The rise of social media has undoubtedly changed the way marketing works. Not only does it give companies easy access to massive numbers of potential customers, it also lets them segment the market, find out what their current customers like or dislike about their products, and spot competitor threats, opportunities, and gaps in the market.
In past years, the main focus of social NLP research was extracting this information from social media and presenting it in a simple and actionable format; you can see our previous post about types of social NLP services. In this piece, however, we will explore a different approach: a service that many marketers dream of possessing.
Marketers usually start a campaign based on their view of the market and the customers. This view might be guided by statistics and previous experience, but it remains the marketers' own view. After launching the campaign, the company usually decides on a set of metrics to measure its ROI; for social media marketing, these metrics mostly focus on growth in user engagement and company reach. However, marketing remains an art, and it is extremely hard to accurately estimate the ROI of a campaign before it is launched.
In this article we discuss the feasibility of estimating the social impact of a marketing post before it is posted.
What Is The Task
The easiest way to define this task is to separate it into the following steps. Note that all of these steps are carried out in the realm of social media:
- First, the marketer needs to define the customer segment the post is targeting. This targeting step can be based on several factors such as age, gender or interests. While it can be done manually, the better way is to use user profiling to discover non-trivial customer segments, such as elderly users on the West Coast who use Android phones.
- After deciding on the target audience, the marketer and the company usually agree on a set of metrics to evaluate the impact of the campaign. Many of these metrics require social NLP and data mining capabilities to calculate.
- The next step is designing the logo and posts of the campaign. In this step, the ideal system we are describing should be able to predict the impact of a post on the selected user group, as measured by the agreed-on metrics, so the output would be something like “this image would increase the reach of the company among Canadian customers by 3%”.
- Furthermore, it would be extremely helpful if the system could suggest tweaks that improve the performance of the post.
The first step of the process is out of the scope of this article. However, there are many online services that provide insights into customers and markets using social media analysis; you can check our article on types of NLP services in social media. For the second step, you can review our piece on different metrics to assess your social media campaign performance. The main focus of this article is the third step, and we will briefly discuss approaches to implementing the fourth step.
Challenges In Estimating Social Engagement
For a service as important as this, there is a shockingly small amount of research in this area. As you will see next, we have explored several approaches that were designed for other tasks but can be reused for estimating social engagement.
Furthermore, the only company we know of that addresses this task in the domain of NLP is Pi from Post Intelligence.
However, this system has not been revealed yet, so we cannot accurately assess its performance. This gives you a hint of the complexity of the task.
The main challenge of applying machine learning to estimate post engagement from text is that textual posts usually vary along 2 different axes:
- The message of the company, which can be an offer, a special deal, or a simple marketing message or announcement.
- The style in which this message is presented.
The main problem is that it is hard to separate these 2 axes in a post, even though their impact on its popularity can vary greatly.
For example, a post that presents a generous offer in a mediocre way can have much more impact than a simple announcement presented in a perfect style.
This can introduce ambiguity into the training process of machine learning systems.
How Can We Predict The Social Engagement
In the following sections, we list the different approaches to estimating social engagement that we reviewed.
One thing to note is that the implementation of these approaches might vary slightly based on the targeted social platform, the targeted audience, and the selected metrics.
Social Engagement Regression
In this task, the system is trained to predict the exact metric value (e.g. the exact number of likes a post might get).
The only work we found that tackles this task is Sam et al., in which the authors used Support Vector Regression (SVR), an Echo State Network (ESN) and an Adaptive Neuro-Fuzzy Inference System (ANFIS) to predict the number of likes, shares and comments a Facebook post would receive. However, the paper is extremely ambiguous, since the authors detail neither their data creation method nor their evaluation.
Social Engagement Classification
In this task, the goal is not to directly estimate the engagement metrics (the number of likes, for example) but rather to predict the range of engagement (high, mid, low).
Classification is usually far simpler than direct regression.
Siyam et al. tried to predict the level of citizens' engagement (low/high) with UAE government tweets. They used textual and social features and tested 3 different models (naive Bayes, random forests and AdaBoost). However, the reported results on the prediction task are low, with the best F-score being 0.673. It is noteworthy that this is the only approach we found for Arabic.
Rosas-Quezada et al. follow a similar approach to predict whether a Facebook post will receive above-average or below-average engagement. They used a very similar set of features and models to Siyam et al., and their results are also similar: their highest F1 score, 0.69, is reported for the task of total reactions prediction. This result is still arguably low. The authors publish their dataset freely.
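To make the evaluation in these classification papers concrete, here is a minimal sketch of how above/below-average labels and an F1 score can be computed. The like counts and predictions are made-up illustration data, and the helper names are our own; this is not the cited authors' code.

```python
from statistics import mean


def label_above_average(engagement_counts):
    """Label a post 1 if its engagement is above the mean of all posts, else 0."""
    avg = mean(engagement_counts)
    return [1 if count > avg else 0 for count in engagement_counts]


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical like counts for six posts and hypothetical model predictions.
likes = [10, 250, 40, 900, 5, 300]
y_true = label_above_average(likes)
y_pred = [0, 1, 0, 1, 0, 0]
score = f1_score(y_true, y_pred)
```

Note that with a mean-based threshold a few viral posts can push the average up and leave very few positive examples, which is one reason F1 is a more informative metric here than plain accuracy.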
Brand-Wide Performance Prediction
In this task the goal is not to predict the performance of a single post but rather to predict the future performance of the whole social media account. The goals of such an approach include forecasting the account's popularity as well as detecting potential social influencers (content creators that currently have low social media performance but are likely to grow in the future). The latter goal is important for connecting companies with marketers.
An example of this research is Moro et al., in which the authors use the current statistics of the account (number of likes, shares, …) to estimate future metric values. However, they follow a rather trivial approach in their choice of features and models, and they report a mean absolute percentage error (MAPE) of 67% for some metrics; their best metric achieved a MAPE of 27%. This level of performance renders the system unusable.
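For readers unfamiliar with MAPE, the following short sketch shows how it is computed. The numbers are invented for illustration, and skipping zero-valued actuals is a choice made for this sketch, not something taken from the cited paper.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Pairs whose actual value is 0 are skipped (an assumption of this sketch),
    because the percentage error is undefined for them.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


# Hypothetical monthly like counts for an account versus a model's forecasts.
actual_likes = [100, 200, 400]
forecast_likes = [150, 180, 300]
error = mape(actual_likes, forecast_likes)
```

A MAPE of 67% means the forecast is, on average, off by about two thirds of the true value, which explains why such a system is of little practical use.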
Engagement In News Articles
Tsagkias et al. discussed a simpler version of social engagement prediction. They address the prediction task as a two-step classification problem: first predict whether a news article will receive comments and, if it does, whether the number of comments will be high or low. They trained a random forest classifier on both textual and social features using a sizable dataset of Dutch news articles.
Bandari et al. took a step further by trying to estimate ranges of popularity. They report an accuracy of 84% when identifying articles that would receive a small, medium, or large number of tweets.
Social Engagement In The Vision Domain
In this variation the input of the system can be the campaign logo, the stock images of the posts, etc., and the system should predict the popularity and impact of using this particular image. Surprisingly, there is some research on implementing this task (for example Gelli et al.), and there are even companies that provide such services, like likelyAI or dataSine. Such a service can be particularly important for specific platforms like Instagram.
The main advantage of using images rather than text as the input signal is that it can sidestep the challenge mentioned earlier of separating the message from the style, mainly because many marketing campaigns use images to convey very simple messages while relying on the visual impact of the image as an attention-grabbing mechanism.
Teradata has a very cool presentation on how such a system can be implemented. This task could also easily be extended to the video domain; however, we did not come across any work discussing this.
Yamaguchi et al. combined visual, textual and social features to estimate the popularity of a post on a fashion site similar to Instagram. They report results both on a regression task, where exact counts are to be estimated, and on top-k% classification, in which the system needs to predict whether the post falls within the top k% of posts in a certain interval; they used two values for k, 25% and 75%. Their reported classification results are decent, reaching up to 88% accuracy, while their regression results are not.
Wu et al. use similar features in Deep Temporal Context Networks. They report a 27% relative improvement on the regression task, with a Pearson correlation of 0.62, and they also publish their data.
Finally, there is a really cool post about engagement prediction from images, in which the author runs a very detailed experiment to measure the impact of different features and models on the performance of such systems.
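The top-k% classification setup mentioned above can be illustrated with a small labelling helper. This is only a sketch under our own assumptions (the rounding rule and the handling of ties are ours, and the like counts are invented), not the procedure from the cited paper.

```python
def top_k_percent_labels(counts, k):
    """Label each post 1 if it is among the top k% most popular posts, else 0.

    Assumption of this sketch: the cut-off is the count of the post ranked
    round(len(counts) * k / 100) (at least 1), and ties at the cut-off are
    all labelled 1, so slightly more than k% can be positive.
    """
    n_top = max(1, round(len(counts) * k / 100))
    threshold = sorted(counts, reverse=True)[n_top - 1]
    return [1 if count >= threshold else 0 for count in counts]


# Hypothetical like counts for eight fashion posts from the same time interval.
likes = [5, 80, 12, 300, 45, 7, 150, 20]
labels_top25 = top_k_percent_labels(likes, 25)
```

Turning raw counts into such labels makes the problem a binary classification task, which is typically easier to learn than predicting the exact count.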
Social Engagement After Publication
The alternative to tackling the challenging task of pre-publication prediction is to include in the prediction model data about the attention an item receives after its publication.
A common approach is to deduce the future popularity of web content from the aggregate attention users give it early after publication.
Due to the relative simplicity of this task, there is a relatively large volume of research on it. We will briefly go through the main approaches:
- Using mathematical models: early works used mathematical models (usually linear) to study the relation between the amount of attention (e.g. the number of comments) a story receives early in its life span and the final amount of attention it will get.
- Other work also trained a modified logistic regression to estimate this relation, while another study achieved comparable performance with a simple linear regression.
- Classifying future popularity into, say, three classes (low, mid, high) is a far simpler problem. Kim et al. show that by looking at the number of page views of a blog post in its first 30 minutes, one can classify articles into three classes of popularity with 86% accuracy. Jamali and Rangwala train different classification methods and find that it is possible to predict the popularity class of a Digg story with an accuracy of 80%, 64%, and 45% when separating stories into 2, 6, and 14 ranges of popularity.
- For web content that captures users' attention for longer periods of time (e.g., certain videos that are viewed over several months or even years), the previous methods can have large prediction errors, mainly because they have no temporal axis, i.e. they only take into consideration how much attention the post attracted, not the period over which it did.
- To improve prediction effectiveness, one solution is to design models that weight users' attention differently based on the recency of the information relative to the prediction moment.
- One approach is to split the overall attention into intervals and create a vector of attention per interval, e.g. comments per day, and then train multivariate models to predict the final attention of the post. For example, Pinto et al. use this representation to train a multivariate linear regression to predict the number of views of a YouTube video.
- However, we believe that using recurrent neural networks for this task is a much simpler and more effective solution. In this formulation, the problem becomes a time-series forecasting task where, given the last N observations (views per day), the model must predict the number of views on the next day (or after M days). Classical time-series forecasting models have also been applied to this formulation: Gürsun et al. use ARMA (Autoregressive Moving Average) models to forecast the popularity of YouTube videos.
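The linear early-attention models from the first bullets above can be sketched as an ordinary least-squares fit. Fitting in log space is an assumption of this sketch (it is a common choice for heavy-tailed popularity counts, not something the text above specifies), and the comment counts are invented.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_early_to_final(early_counts, final_counts):
    """Fit log(final) as a linear function of log(early). Counts must be > 0."""
    log_early = [math.log(c) for c in early_counts]
    log_final = [math.log(c) for c in final_counts]
    return fit_line(log_early, log_final)


def predict_final(early_count, slope, intercept):
    """Map an early attention count to a predicted final count."""
    return math.exp(intercept + slope * math.log(early_count))


# Hypothetical training data: comments after one hour versus after 30 days.
early = [10, 20, 40, 80]
final = [100, 200, 400, 800]
slope, intercept = fit_early_to_final(early, final)
estimate = predict_final(30, slope, intercept)
```

In this toy data the final count is always ten times the early count, so the fitted slope is 1 and a story with 30 early comments is predicted to end at about 300.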
Social Engagement As A Recommendation System
One way to look at social engagement is to view each post as a merchandise item and every user as a potential customer, so that engaging with a post corresponds to the user buying the item. With this formulation, classical recommendation systems can be used to find which users are likely to interact with a specific post.
Most of these approaches follow a collaborative filtering scheme, where the historical interactions between users and items are the main information used to predict future interactions.
Mohammadi et al. apply this to the task of predicting the number of likes on Flickr posts, mainly by predicting which users are more likely to like a post. Their approach uses Point-wise Mutual Information (PMI), which derives users' latent similarities from their interaction logs and exploits them to predict future interacting users. The proposed method is evaluated on a large Flickr dataset including 2.3M users and 11.2M published photos.
Their follow-up work studied the possibility of building user embeddings from historical interactions and then using these embeddings to predict future interactions.
The main disadvantage of this approach is that it can only be applied after publication.
A further variation is cross-domain prediction, in which the system extracts data from one domain (e.g., social media) and transforms it into knowledge for predicting web content popularity in another domain (e.g., the site where the content was published). By definition, this type of task can only predict popularity after publication.
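To give an intuition for PMI-based user similarity, the sketch below computes PMI between pairs of users from a toy like log. The probability estimates (fraction of posts liked) and the data are assumptions made for illustration; the exact formulation in the Flickr paper may differ.

```python
import math
from collections import Counter
from itertools import combinations


def user_pmi(likers_per_post):
    """PMI for every pair of users who liked at least one post in common.

    P(u) is estimated as the fraction of posts liked by u, and P(u, v) as the
    fraction of posts liked by both u and v (an assumption of this sketch).
    """
    n_posts = len(likers_per_post)
    single = Counter()
    pair = Counter()
    for likers in likers_per_post:
        users = sorted(set(likers))
        single.update(users)
        pair.update(combinations(users, 2))
    return {
        (u, v): math.log(
            (count / n_posts) / ((single[u] / n_posts) * (single[v] / n_posts))
        )
        for (u, v), count in pair.items()
    }


# Hypothetical interaction log: each inner list holds the users who liked one post.
log = [["ann", "bob"], ["ann", "bob"], ["cat"], ["ann", "cat"]]
pmi = user_pmi(log)
```

A positive PMI (here for ann and bob) means two users like the same posts more often than chance would suggest, so a like from one is weak evidence that the other will like the post too.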
- Oghina et al. use data from Twitter and YouTube to predict movie ratings on IMDb.
- The Social Transfer algorithm of Roy et al. extracts information from Twitter to detect videos that will experience sudden bursts of popularity on YouTube. The model consists of the following steps:
- Extract popular topics from Twitter,
- Associate these topics to YouTube videos,
- Compare the popularity of videos on Twitter with their popularity on YouTube.
- A disproportionate share of attention on Twitter compared to YouTube is then used as strong evidence that a video will experience a sudden burst of popularity.
- The same approach can be useful for estimating the traffic volume of non-social sites that have a social media presence. For example, Castillo et al. use several features, including the number of Facebook shares, the number of tweets and retweets, the entropy of the tweet vocabulary, and the mean number of followers of the users sharing the article on Twitter, to predict the traffic a news story from Aljazeera will receive. However, extending this behavior to other news outlets might be questionable, since the inherent popularity of the news outlet can influence its social media dynamics. For example, Aljazeera news stories are far more likely to be shared than a small outlet's stories, while small outlets might rely on spammy stories to increase their traffic.
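The comparison step of the Social Transfer idea described above can be sketched as follows. This is a simplified illustration only: the function names, the ratio threshold of 2.0 and the counts are hypothetical, and the actual algorithm in the paper is considerably more involved.

```python
def attention_share(counts):
    """Convert raw counts per video into each video's share of total attention."""
    total = sum(counts.values())
    return {video: count / total for video, count in counts.items()}


def burst_candidates(twitter_mentions, youtube_views, ratio_threshold=2.0):
    """Return videos whose Twitter share is disproportionate to their YouTube share.

    ratio_threshold is a hypothetical tuning parameter of this sketch.
    Videos with no YouTube views at all are always flagged.
    """
    twitter_share = attention_share(twitter_mentions)
    youtube_share = attention_share(youtube_views)
    flagged = []
    for video, tw_share in twitter_share.items():
        yt_share = youtube_share.get(video, 0.0)
        if yt_share == 0.0 or tw_share / yt_share >= ratio_threshold:
            flagged.append(video)
    return sorted(flagged)


# Hypothetical counts for three videos linked to trending Twitter topics.
mentions = {"a": 60, "b": 30, "c": 10}
views = {"a": 1000, "b": 8000, "c": 1000}
candidates = burst_candidates(mentions, views)
```

Here video "a" receives 60% of the Twitter attention but only 10% of the YouTube views, so it is the one flagged as likely to experience a burst.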
The following are tasks that do not directly address the issue at hand but can easily be reused for it.
Churn Detection In Social Media Networks
This is a variation of the churn detection task implemented for social media sites. Churn detection is the task of identifying the users with the highest probability of leaving a merchant. It can also be formulated as scoring users based on their engagement.
The main difference between our proposed task and churn detection is that predicting engagement is usually based on a stimulus (the company post), while churn detection estimates user engagement in general.
The potential applications of such services extend beyond social media sites to cover other products such as MMORPG games, which usually have social elements, and can even be extended to measuring the churn rate for social media pages or company accounts on social platforms.
It is possible to use this task to predict some social metrics such as negative and positive feedback.
There is a surprisingly long line of research in this specific area, for which good surveys exist. The tasks can be split into 4 main categories; we will go very quickly over three of them:
- Engagement as an activity performed over time: The most common form of engagement in this type of research is engagement as an activity performed over a certain period of time. Activities may include posts. These studies observe a user's activities over a period of time and then try to predict the user's activity level during the following period. This line of research includes Dror et al., Sadeque et al., and Milošević et al.
- Engagement as a certain amount of activity performed: Engagement is also commonly defined using the number of activities performed, regardless of time. For example, Chen and Pirolli considered a user to be engaged in the Occupy Wall Street movement based on two different criteria: how much the user retweeted the official @OccupyWallSt account, and how much they posted tweets with an Occupy Wall Street hashtag (#OWS).
- Engagement as loyalty: Hamilton et al. tried to predict user loyalty to a community on Reddit, which they define as a user's preference for one community or forum over others. This was done by attempting to predict whether a user makes the majority or minority of their posts to a certain forum in a multi-forum site at a specific time.
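The majority/minority definition of loyalty in the last bullet can be turned into a training label with a few lines of code. This is a minimal sketch of the labelling only, using an invented posting history; it is not the cited authors' pipeline and says nothing about the features used for prediction.

```python
from collections import Counter


def is_loyal(post_forums, forum):
    """True if the user made a strict majority of their posts in `forum`.

    post_forums lists the forum of each post the user made in the
    observation window (a hypothetical input format for this sketch).
    """
    if not post_forums:
        return False
    counts = Counter(post_forums)
    return counts[forum] > len(post_forums) / 2


# Hypothetical posting history of one user in a multi-forum site.
history = ["cooking", "cooking", "travel", "cooking", "news"]
loyal_to_cooking = is_loyal(history, "cooking")
loyal_to_travel = is_loyal(history, "travel")
```

A classifier would then be trained to predict this label for a future time window from the user's earlier behavior.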
Ads Click-Through Rate Prediction
This task is researched in particular by search services that provide paid ads. In very simple terms, the system takes a search query and an ad as input and tries to predict how many clicks the ad will generate. Other implementations accept additional features such as the user's search history and a user model. The goal is to show ads to the right users for the right queries in order to maximize the click-through rate (CTR) of these ads.
In contrast to social engagement prediction, CTR maximization depends on presenting the ads to the appropriate users rather than on suggesting modifications to the ad.
There is a long line of research on this task which we do not cover in depth in this article, although it contains some interesting leads. Examples of this research include Hudson et al., Edizel et al., and Zhou et al. You will notice that most of the research is done by content providers such as Microsoft, Yahoo and Alibaba.
Another related task is identifying the users with strong social media impact.
This task can help companies achieve 2 main goals:
- Finding content creators who can pitch the company's messages effectively and thus improve its social image.
- Finding customers with larger social media impact and prioritizing customer service based on this.
This task can be framed as finding content providers that present worthy content (i.e. have a real social presence), which can be used to avoid the impact of like farms. The other option is to find potential future influencers: Bhattacharyya et al. built a model using machine learning techniques to classify reviewers into high or low popularity based on their profile characteristics. Based on this work, businesses can identify potentially influential reviewers and ask them for reviews in order to increase the popularity of a product. However, in most cases simple metrics, such as the like and share counts of a user profile, are used to measure its social influence.
There is a full task in NLP devoted to estimating how persuasive a certain piece of text is, which is especially valuable for marketers. For example, Guerini et al. present a “cheap and fast” methodology for measuring the persuasiveness of communication based on Google AdWords. We did not have time to dive into this field, so it is left outside the scope of this article.
E. Sam, S. Yarushev, S. Basterrech, and A. Averkin, “Prediction of Facebook Post Metrics using Machine Learning,” arXiv preprint arXiv:1805.05579, 2018.
 N. Siyam, O. Alqaryouti, and S. Abdallah, “Mining government tweets to identify and predict citizens engagement,” Technol. Soc., p. 101211, 2019.
É. S. Rosas-Quezada, G. Ramírez-de-la-Rosa, and E. Villatoro-Tello, “Predicting consumers engagement on Facebook based on what and how companies write,” arXiv preprint arXiv:1909.09914, 2019.
 S. Moro, P. Rita, and B. Vala, “Predicting social media performance metrics and evaluation of the impact on brand building: A data mining approach,” J. Bus. Res., vol. 69, no. 9, pp. 3341–3351, 2016.
 M. Tsagkias, W. Weerkamp, and M. De Rijke, “Predicting the volume of comments on online news stories,” in Proceedings of the 18th ACM conference on Information and knowledge management, 2009, pp. 1765–1768.
 R. Bandari, S. Asur, and B. A. Huberman, “The pulse of news in social media: Forecasting popularity,” in Sixth International AAAI Conference on Weblogs and Social Media, 2012.
 F. Gelli, T. Uricchio, M. Bertini, A. Del Bimbo, and S.-F. Chang, “Image popularity prediction in social media using sentiment and context features,” in Proceedings of the 23rd ACM international conference on Multimedia, 2015, pp. 907–910.
 K. Yamaguchi, T. L. Berg, and L. E. Ortiz, “Chic or social: Visual popularity analysis in online fashion networks,” in Proceedings of the 22nd ACM international conference on Multimedia, 2014, pp. 773–776.
B. Wu, W.-H. Cheng, Y. Zhang, Q. Huang, J. Li, and T. Mei, “Sequential prediction of social media popularity with deep temporal context networks,” arXiv preprint arXiv:1712.04443, 2017.
 A. Kaltenbrunner, V. Gomez, and V. Lopez, “Description and prediction of slashdot activity,” in 2007 Latin American Web Conference (LA-WEB 2007), 2007, pp. 57–66.
 G. Szabo and B. A. Huberman, “Predicting the popularity of online content,” Available SSRN 1295610, 2008.
 A. Tatar, J. Leguay, P. Antoniadis, A. Limbourg, M. D. de Amorim, and S. Fdida, “Predicting the popularity of online articles based on user comments,” in Proceedings of the International Conference on Web Intelligence, Mining and Semantics, 2011, p. 67.
 S.-D. Kim, S.-H. Kim, and H.-G. Cho, “Predicting the virtual temperature of web-blog articles as a measurement tool for online popularity,” in 2011 IEEE 11th International Conference on Computer and Information Technology, 2011, pp. 449–454.
 S. Jamali and H. Rangwala, “Digging digg: Comment mining, popularity prediction, and social network analysis,” in 2009 International Conference on Web Information Systems and Mining, 2009, pp. 32–38.
 H. Pinto, J. M. Almeida, and M. A. Gonçalves, “Using early view patterns to predict the popularity of youtube videos,” in Proceedings of the sixth ACM international conference on Web search and data mining, 2013, pp. 365–374.
 G. Gürsun, M. Crovella, and I. Matta, “Describing and forecasting video access patterns,” in 2011 Proceedings IEEE INFOCOM, 2011, pp. 16–20.
 S. Mohammadi, R. Farahbakhsh, and N. Crespi, “Who Will Like the Post? A Case Study of Predicting Likers on Flickr,” in 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS), 2018, pp. 35–42.
 S. Mohammadi, R. Farahbakhsh, and N. Crespi, “User Reactions Prediction Using Embedding Features,” in 2018 IEEE Global Communications Conference (GLOBECOM), 2018, pp. 1–6.
 A. Oghina, M. Breuss, M. Tsagkias, and M. De Rijke, “Predicting imdb movie ratings using social media,” in European Conference on Information Retrieval, 2012, pp. 503–507.
 S. D. Roy, T. Mei, W. Zeng, and S. Li, “Towards cross-domain learning for social video popularity prediction,” IEEE Trans. Multimed., vol. 15, no. 6, pp. 1255–1267, 2013.
 C. Castillo, M. El-Haddad, J. Pfeffer, and M. Stempeck, “Characterizing the life cycle of online news stories using social media reactions,” in Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing, 2014, pp. 211–223.
 G. Dror, D. Pelleg, O. Rokhlenko, and I. Szpektor, “Churn prediction in new users of Yahoo! answers,” in Proceedings of the 21st International Conference on World Wide Web, 2012, pp. 829–834.
 F. Sadeque, T. Solorio, T. Pedersen, P. Shrestha, and S. Bethard, “Predicting continued participation in online health forums,” in Proceedings of the Sixth International Workshop on Health Text Mining and Information Analysis, 2015, pp. 12–20.
 M. Milošević, N. Živić, and I. Andjelković, “Early churn prediction with personalized targeting in mobile social games,” Expert Syst. Appl., vol. 83, pp. 326–332, 2017.
 J. Chen and P. Pirolli, “Why you are more engaged: factors influencing twitter engagement in occupy Wall Street,” in Sixth International AAAI Conference on Weblogs and Social Media, 2012.
 W. L. Hamilton, J. Zhang, C. Danescu-Niculescu-Mizil, D. Jurafsky, and J. Leskovec, “Loyalty in online communities,” in Eleventh International AAAI Conference on Web and Social Media, 2017.
N. Hudson, H. Khamfroush, B. Harrison, and A. Craig, “Click Maximization in Online Social Networks Using Optimal Choice of Targeted Interests,” arXiv preprint arXiv:1911.02061, 2019.
 B. Edizel, A. Mantrach, and X. Bai, “Deep character-level click-through rate prediction for sponsored search,” in Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2017, pp. 305–314.
 G. Zhou et al., “Deep interest network for click-through rate prediction,” in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2018, pp. 1059–1068.
 S. Bhattacharyya, S. Banerjee, and I. Bose, “Predicting online reviewer popularity: A comparative analysis of machine learning techniques,” in Workshop on E-Business, 2016, pp. 22–28.
 M. Guerini, C. Strapparava, and O. Stock, “Evaluation Metrics for Persuasive NLP with Google AdWords.,” in LREC, 2010.