# From Sentiment to Political Bias in the Arab World and the Arabic Content

The rise of political bias across news outlets presents a real threat to free and independent journalism and is a major factor in shifting the populace's conception of the world. Several NGOs, research centres, and private organizations are working on monitoring and limiting the spread of this type of content, especially in news outlets, usually in a manual fashion.

Most of the above-mentioned efforts have been centred on detecting bias towards one end of the political spectrum (Left vs Right), and while this fits the politics of Europe and the USA to an accurate degree, the dichotomy falls short when considering the politics of the Arabic-speaking world. In the MENA region, politics are far more convoluted, with different parties appealing to different aspects of people's lives, including religion, nationality, and traditional ethos. Furthermore, the various religious, racial, and national splits in these communities are explicitly exploited by politicians.

Following our mission of building a reliable news source and battling fake news, we at ALMETA have devoted our efforts to building an automated political bias detection system that can process thousands of articles and flag potentially biased content. This article gives a general outline of one of the factors in our algorithm, namely emotional bias.

To better understand the design of the system, we should first look at how the process is done manually. The manual assignment of political bias is usually based on 4 main metrics:

1. Emotional Wording: Does the source use emotionally loaded language to sway the reader?
2. Factual/Sourcing: Does the source report factually and back up claims with well-sourced evidence?
3. Story Choices: Does the source report news from both sides, or does it only publish one side?
4. Political Affiliation: How strongly does the source endorse a particular political ideology? In other words, how extreme are its views?

In this article, we will show how we simulated the first metric.

## Entity-level Sentiment Analysis

It is not enough to find the loaded wording in the article we are processing; it is equally important to find the target of this polarity (hate or praise). We have already developed an entity-level sentiment analysis (ELSA) system; a separate blog post about it is coming. In a nutshell, our system can detect meaningful targets in the text and assign a polarity to each of them. Both the target detection and the polarity assignment models return their confidence for each of the targets.

## How to model bias based on polarity

The simple intuition we used is that an article that excessively praises or criticizes a certain entity is a biased article, mainly because most things in life, most of the time, tend to be just normal. Therefore, we opted to use the confidence of our entity-level sentiment analysis to create a measure of article subjectivity.

### Show me the math!

Based on our assumption, the unnormalized sentiment model confidence (coupled with the unnormalized target detection confidence) can give an estimate of the emotion towards a target $t$ as follows:

$$E_t = S_{conf}(t) + T_{conf}(t)$$

where $S_{conf}$ is the NMLL (marginal log-likelihood) confidence of the sentiment model and $T_{conf}$ is the NMLL confidence of the target model. This metric stays the same regardless of the assigned polarity and can therefore be seen as a subjectivity metric for a certain target $t$. Note that for a Conditional Random Field (CRF) model with input $X$ and output $Y$, the NMLL score of a sub-sequence $K$ of $X$ being tagged as a certain class is calculated as follows:

$$NMLL(K, class) = \sum_{k \in K} \log P(Y_k = class)$$

where $k$ is a token in the sub-sequence $K$ and $P(Y_k = class)$ is the marginal probability of token $k$ being assigned the class; this marginal probability is estimated using the forward–backward routine of the CRF. With the marginals floored at a small $\epsilon$, both $S_{conf}$ and $T_{conf}$ are bounded from above by $0$ and from below by $|K| \cdot \log(\epsilon)$.
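As a minimal sketch, the NMLL confidence above can be computed directly from the per-token marginals. The function name and the example marginal values below are illustrative, not our production code:

```python
import math

def nmll_confidence(marginals, eps=1e-10):
    """Confidence of tagging a sub-sequence as one class: the sum of
    log marginal probabilities per token. Flooring each marginal at eps
    bounds the score below by len(marginals) * log(eps); it is bounded
    above by 0 (all marginals equal to 1)."""
    return sum(math.log(max(p, eps)) for p in marginals)

# Per-token marginals P(Y_k = class) from the CRF forward-backward pass
# (made-up values for illustration).
target_marginals = [0.95, 0.90, 0.97]
score = nmll_confidence(target_marginals)  # close to 0 => high confidence
```

A score near 0 means the model is confident in the tagging; large negative values mean low confidence.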

To create a metric for the whole article, we can use the sum of exponents of the previous subjectivity score across all the targets found by the ELSA model. To account for the variation in the number of targets per article, which correlates with article size, we normalize the sum by the number of words in the article. The article score therefore becomes:

$$articleScore = \frac{1}{N_{words}} \sum_{t \in targets} e^{E_t}$$

where $N_{words}$ is the number of words in the article.
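A minimal sketch of this article-level score, assuming each detected target comes with its two log-scale confidences (the values below are made up):

```python
import math

def article_score(target_confidences, word_count):
    """Sum of exponentiated per-target subjectivity scores
    E_t = S_conf + T_conf, normalized by the article's word count."""
    return sum(math.exp(s + t) for s, t in target_confidences) / word_count

# (sentiment_conf, target_conf) log-scores for each detected target.
targets = [(-0.1, -0.2), (-1.5, -0.4)]
score = article_score(targets, word_count=350)
```

Since each $E_t \le 0$, every exponentiated term lies in $(0, 1]$, so the score stays small for articles with few confident targets.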

However, in this type of analysis, we are assuming a uniform distribution of the scores, and here is where the second assumption comes in handy. The best way to visualize it is to imagine a histogram of the news articles. Ideally, the histogram of the scores will be similar to the following figure, where most of the articles have low subjectivity and the number of articles falls as subjectivity increases.

Such a histogram would most likely be generated from a gamma distribution (red curve in the image). By assuming that the scores of the articles follow a certain probabilistic distribution, we can use its CDF (cumulative distribution function) to get a normalized scale that adheres to the second assumption.
For example, the following graphs illustrate the PDF and CDF of the gamma distribution. Note how many values on the linear scale, especially in the long tail of the distribution, are all mapped to nearly the same probability (in the case of the red PDF [k = 1.0, theta = 2.0], all the values above 8 are mapped to a probability of nearly 1 in the CDF). This shows the difference between using a linear scale and a nonlinear one, as on a linear scale the above-mentioned values would occupy the probability range from 0.4 to 1.
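The saturation the figures illustrate can be checked numerically; a quick sketch with SciPy using the red curve's parameters:

```python
from scipy.stats import gamma

# Gamma with k = 1.0, theta = 2.0 (the red curve in the figures).
k, theta = 1.0, 2.0
cdf_at_8 = gamma.cdf(8.0, a=k, scale=theta)    # already ~0.98
cdf_at_20 = gamma.cdf(20.0, a=k, scale=theta)  # ~1: the tail is compressed
```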

Therefore, the CDF probability of the articleScore represents a good way to normalize the original article score.

Furthermore, the same approach can be used if the scores are not gamma-distributed, since the CDFs of all probabilistic distributions share this favourable feature. Even in the case of a multi-modal histogram, where the data is generated from a combination of distributions, we can approximate the latent distribution using mixture models (for example, GMMs) and then map the scores using the GMM's CDF.
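For the multi-modal case, here is a sketch of the mixture approach using scikit-learn's GaussianMixture on synthetic scores (in one dimension, the mixture CDF is just the weight-averaged CDF of the components):

```python
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

# Fit a two-component GMM to hypothetical bimodal article scores.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.gamma(1.0, 2.0, 500),
                         rng.gamma(9.0, 1.0, 500)]).reshape(-1, 1)
gmm = GaussianMixture(n_components=2, random_state=0).fit(scores)

def gmm_cdf(x):
    """CDF of a 1-D GMM: the weight-averaged CDF of its components."""
    means = gmm.means_.ravel()
    stds = np.sqrt(gmm.covariances_.ravel())
    return float(np.sum(gmm.weights_ * norm.cdf(x, means, stds)))

normalized = gmm_cdf(5.0)  # always in [0, 1]
```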

### How can I implement this?

1. Collect a large set of news articles (we already have this).
2. Run the model and calculate the unnormalized article score for each of them.
3. Plot the histogram of the unnormalized scores and fit a distribution to it (this can be any distribution, or a mixture of them, but ideally it should be gamma).
4. Use the CDF of the fitted PDF to normalize the scores. This can be:
    1. A closed-form solution: after fitting a certain distribution, use that distribution's CDF equation to get the probability.
    2. A numerical solution: suitable for irregular distributions like GMMs, where we store a discretized CDF as an array and use it to calculate the probability.
5. While the app runs online, aggregate the unnormalized scores into new sets and periodically re-adapt the distribution to the new data points (using maximum a posteriori estimation, for example).
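Steps 2–4 above can be sketched with SciPy; the scores here are synthetic stand-ins for real model output:

```python
import numpy as np
from scipy import stats

# Step 2 (simulated): unnormalized article scores.
rng = np.random.default_rng(42)
raw_scores = rng.gamma(shape=1.0, scale=2.0, size=10_000)

# Step 3: fit a gamma distribution (location fixed at 0, since scores >= 0).
shape, loc, scale = stats.gamma.fit(raw_scores, floc=0)

# Step 4a: closed-form normalizer via the fitted CDF.
def normalize(score):
    return stats.gamma.cdf(score, shape, loc=loc, scale=scale)

high = normalize(8.0)  # deep in the tail, so close to 1
```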

### How did we implement it?

At ALMETA we host a plethora of datasets, among them a sizable corpus of Arabic news articles that we aimed to use to create our metric. However, applying the aforementioned methodology to this corpus is unfeasible because of its sheer size, so before we could start calculating probabilities we needed to create a smaller (yet informative enough) dataset.

#### Selecting an informative dataset

The goal of this step is to select a smaller yet informative subset. For this subset, we will calculate the unnormalized article scores and then fit them to a probability distribution.
The following figure shows the histogram of articles based on the word count of their text. Note that nearly no articles have more than 2000 words; furthermore, there is a sizable number of very short articles. These include:

• Video descriptions mainly from BBC
• Articles where the text is the same as the title (Aljadeed)
• Errors in the crawler where only part of the text was retrieved (Alarabia)

To alleviate this, only articles with a word count between 60 and 2000 words were considered; the following figure shows the new distribution.

Next, we tried to cover as many different topics as possible. To do so, we used the genre assigned by the author. Note that although this is a manual tag, the classification is rather fuzzy and not strict, with several articles being erroneously or ambiguously classified; the following figure shows the distribution of articles based on genre. It is easy to see that many genres share the same meaning (journalism, press, inthepress); this is caused by the different naming conventions across the news domains, and these articles should be combined into a single bucket. Furthermore, many articles have a problematic genre:

• Due to errors in crawling (a lot of articles have numerical genres like 4667654321 or even hashes)
• Some articles have weird manual genres (year2013, 1300GMT)

To alleviate these issues, articles with problematic genres were discarded, and a mapping between genres and site topics was created manually. Note that this mapping was not aimed at creating an accurate topic classification set, but rather at diversifying the selected set as much as possible; the assignment of a certain genre to a common topic was based on reviewing a small random sample of that genre's articles. The following is the distribution of articles based on the new set of topics.

In order to select an informative (yet smaller) set from these articles, from every "topic" we sampled N random articles, where N is:

$$N = \min(topicSize, \max(0.7 \cdot topicSize, 100))$$

where $topicSize$ is the number of articles assigned to a certain topic. After this step, the size of the selected dataset is 73294 articles, which, while still big, is computationally attainable.
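The sampling rule can be sketched as follows (the dictionary of articles per topic is hypothetical):

```python
import random

def sample_size(topic_size, floor=100, frac=0.7):
    """N = min(topicSize, max(0.7 * topicSize, 100))."""
    return int(min(topic_size, max(frac * topic_size, floor)))

def subsample(articles_by_topic, seed=0):
    rng = random.Random(seed)
    selected = []
    for articles in articles_by_topic.values():
        selected.extend(rng.sample(articles, sample_size(len(articles))))
    return selected

# Small topics are kept whole, mid-size topics are capped at 100 articles,
# and large topics contribute 70% of their articles.
```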

#### Data Modeling

After selecting the dataset, the articles are passed to the system to calculate the article score. The following figure shows the distribution of article scores; note that, as expected, the distribution resembles a gamma PDF, with most of the articles having a low bias score.

To find the best fit, we looked for the distribution that fits the data with the lowest SSE (sum of squared errors), which turned out to be Gaussian; the following figure shows the PDF and CDF of this distribution. However, using a Gaussian violates the constraint that scores are greater than zero. This can be seen from the fact that the CDF of that distribution gives considerable weight to negative scores (the CDF at a zero score is 5% instead of zero).

Several families were then tested manually, and we settled on a gamma distribution; the final figure shows the PDF and CDF of this distribution. Note how, in comparison with the Gaussian CDF, the gamma CDF gives nearly no weight to negative scores (the CDF at a zero score is 0.04%).
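A sketch of the SSE-based model selection, with synthetic scores standing in for the real ones; each candidate family is fitted by maximum likelihood and its PDF compared against a density histogram:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
scores = rng.gamma(2.0, 1.5, 5000)  # stand-in for real article scores

# Density histogram to compare fitted PDFs against.
counts, edges = np.histogram(scores, bins=50, density=True)
centers = (edges[:-1] + edges[1:]) / 2

def sse(dist):
    """Fit the family by MLE and return its SSE against the histogram."""
    params = dist.fit(scores)
    return float(np.sum((counts - dist.pdf(centers, *params)) ** 2))

candidates = {"norm": stats.norm, "gamma": stats.gamma, "lognorm": stats.lognorm}
best = min(candidates, key=lambda name: sse(candidates[name]))
```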