Automatically Extracting Valuable Content from News Streams.

Almeta News, a content aggregator app drawing on roughly 50 sources at the time of writing, always aims to provide its users with the best-quality pieces to read. Rather than relying on an army of content watchers and editors, Almeta aims to develop the best algorithms to review content automatically, looking for indicators of quality and assessing a piece’s placement. This post is part of our research efforts seeking the best content ranking methodology. In this post, we’re trying to determine the most effective indicators of content quality. Depending on human experts’ points of view, …

An Initial, Failed Solution For The Event Detection Task

In this post, we are trying to validate our initial solution for the event detection task. If you’re not familiar with the task, you can refer to our previous post, “How to” Event Detection in Media using NLP and AI. To apply event detection to news articles, we plan to do the following: (1) represent each article as a vector of expressive features; (2) feed the vectorized articles into a sequential clustering model to aggregate the ones talking about the same event. In this post, we’re solving the problem of event detection in the Arabic language. Features Effectiveness: In this …

Initial Genre Classification Experiments

The ability to filter your news feed by genre is a critical component of any news aggregator: users often want to read only sports or political news, not just the most recent or hottest news. In this post, we will explore in great detail our initial genre classification system. Let’s start with the data. In the following experiments, we used an in-house data set composed of 190307 HTML documents crawled from the following domains: [Aljazira, Alarabia, Aljadeed, RT Arabic, BBC Arabic]. For each document, we tried to extract the following features: …

Building a Test Collection for Event Detection Systems Evaluation

Before we start, if you’re not familiar with the Event Detection task in NLP, you can refer to our previous post on this topic. So you’ve built a system to detect events in the media… now what? While building a system is a key step, how the system performs on real-world data is equally important. We need to know whether it actually works and whether we can trust its decisions, so we need to evaluate our system before putting it to use. Evaluation is a highly important step in the development of any type of system, as it allows the …

News Stream Clustering – Sequential Clustering in Action

In a previous post, we talked about “How to” Event Detection in Media using NLP and AI. In another post, we presented Sequential Clustering. Today we’re introducing an online (sequential) clustering algorithm specialized in aggregating news articles into fine-grained story clusters. Problem Formulation: We focus on the clustering of a stream of documents, where the number of clusters is not fixed and is learned automatically. We denote by D the (potentially infinite) space of documents. We are interested in associating each document with a cluster via the function C(d) ∈ N, which returns the cluster label given a document. For each …
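
The formulation above can be illustrated with a minimal sketch of an online clustering loop. This is not the algorithm from the post, whose details are not shown in this excerpt. It assumes bag-of-words vectors, cosine similarity against running cluster centroids, and a fixed similarity threshold; the names `vectorize`, `SequentialClusterer`, and `threshold` are all hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts; a stand-in for richer article features."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SequentialClusterer:
    """Assigns each incoming document a cluster label C(d), one at a time."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold  # assumed value, would need tuning
        self.centroids = []         # one summed term-count vector per cluster

    def assign(self, text):
        vec = vectorize(text)
        best_label, best_sim = None, 0.0
        for label, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best_label, best_sim = label, sim
        if best_label is not None and best_sim >= self.threshold:
            # Close enough to an existing story: merge into its centroid.
            self.centroids[best_label].update(vec)
            return best_label
        # Otherwise open a new cluster, so the cluster count is not fixed.
        self.centroids.append(vec)
        return len(self.centroids) - 1


stream = [
    "earthquake hits coastal city",
    "coastal city earthquake damage reported",
    "football team wins cup final",
]
clusterer = SequentialClusterer(threshold=0.3)
labels = [clusterer.assign(doc) for doc in stream]
print(labels)
```

Because each document is compared only against the current centroids and then discarded, the loop can run over an unbounded stream, and new clusters appear whenever no existing one is similar enough.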

An Implementation of a News Stream Sequence Clustering Algorithm

In a previous post, we discussed a News Stream Sequential Clustering Algorithm. In this post, we’re discussing the details of implementing this algorithm with minimal tuning, and showing the results produced by this implementation. Along with this post, we’re evaluating …

Contrary view detection based on VODUM

While reading the news, each of us perceives it in a different manner. We have our own biases, and we tend to search for information that confirms our existing beliefs. Thus, different people might have drastically different viewpoints of …

From Sentiment to Political Bias in the Arab World and the Arabic Content

The growing problem of political bias across several news anchors presents a real threat to free and independent journalism and is a major factor in shifting the populace’s conception of the world. Several NGOs, research centres and private organizations are working …

How to Rank Articles Based on How Informative They Are

How to Rank Articles Based on How Informative They Are – Using Snorkel

Let’s start with a simple question: what constitutes an informative article? According to Oxford’s dictionary: informative /ɪnˈfɔːmətɪv/, adjective: providing useful or interesting information. However, this is still an abstract concept. Yes, it is much simpler to flag an article as spammy …

Informativity Detection, Our Research Gist

Informativity Detection – Almeta’s Research Gist

Let’s start with a simple question: what constitutes an informative article? According to Oxford’s dictionary: informative /ɪnˈfɔːmətɪv/, adjective: providing useful or interesting information. However, this is still an abstract concept. The question of measuring how informative a piece of news is …

What Makes an Article Informative and How Computers Can Measure Informativeness

What Makes an Article Informative – And How Computers Can Measure Informativity of a Text Content

The concept of an informative text is really abstract, and it is hard to come up with a definitive formula to measure it. In this article, we will explore some of the features that we believe can make an article …

Analysis of the Readability Metric Results in Almeta News Feed

In this post, we’re analyzing the results returned by the readability metric in our news feed. If you haven’t checked our post about “How to measure the readability of a text?” before, it is worth reading first. How Are We Measuring the Readability? The main part of analyzing a metric is knowing how it works. In the current version, we depend on the AARIBase metric for measuring readability. So, let’s first have a look at how AARIBase works. Here’s the AARIBase formula: AARIBase = (3.28 × NOC) + (1.43 × ACW) + (1.24 × AWS) …
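
As a rough illustration, the three terms visible in the formula above could be computed as follows. The excerpt is truncated and does not define the symbols, so this sketch assumes NOC is the number of characters, ACW the average number of characters per word, and AWS the average number of words per sentence. The function name `aari_base` and the simple tokenization are hypothetical and not taken from the post.

```python
import re


def aari_base(text):
    """Score `text` using only the three terms shown in the excerpt.

    Assumed meanings (not defined in the excerpt):
      NOC = number of characters (counted inside words),
      ACW = average characters per word,
      AWS = average words per sentence.
    """
    # Naive tokenization; diacritized Arabic would need extra handling.
    words = re.findall(r"\w+", text)
    sentences = [s for s in re.split(r"[.!?؟]+", text) if s.strip()]
    if not words or not sentences:
        return 0.0
    noc = sum(len(w) for w in words)
    acw = noc / len(words)
    aws = len(words) / len(sentences)
    return 3.28 * noc + 1.43 * acw + 1.24 * aws


score = aari_base("ab cd. ef gh.")
print(round(score, 2))
```

With these assumed definitions the character-count term dominates the score, so longer texts score higher largely regardless of their word or sentence lengths.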

Clickbait Detection Using Word2Vec Representation

In a previous article, How to Detect Clickbait Headlines using NLP?, we introduced the task of clickbait detection and explored how it can be modeled within the domain of machine learning and NLP. If you are not familiar with the concept of clickbait detection, make sure to review it before continuing. In this post, we’re building a classifier for clickbait detection in news headlines based on a pre-trained Arabic Word2Vec model, and we’re validating this solution. If you are not familiar with the Word2Vec concept, you can refer to the Wikipedia article on it for more information. News Headlines Representation: In …
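
The excerpt stops before describing how headlines are represented. One common baseline, shown here purely as an assumption and not as the post’s method, is to average the word vectors of the tokens in a headline. The toy embedding table, the function name `headline_vector`, and the `dim` parameter are hypothetical; a real setup would look words up in the pre-trained Arabic Word2Vec model instead.

```python
def headline_vector(headline, embeddings, dim):
    """Average the vectors of known tokens; zero vector if none are known."""
    tokens = headline.lower().split()
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    # zip(*vectors) groups the values of each dimension together.
    return [sum(column) / len(vectors) for column in zip(*vectors)]


# Toy two-dimensional embeddings standing in for a pre-trained model.
toy_embeddings = {
    "shocking": [1.0, 0.0],
    "news": [0.0, 1.0],
}

vec = headline_vector("Shocking news today", toy_embeddings, dim=2)
print(vec)
```

The resulting fixed-length vector can then be fed to any standard classifier. Out-of-vocabulary tokens such as "today" are simply skipped here.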

Google’s AutoML Overview

In this post, we are exploring how Google’s AutoML can help us at Almeta in developing automatic Arabic language processing tools. Before we start, if you are not familiar with the term AutoML, you can refer to our previous post on this topic. Who is Google AutoML for, and When to Use It? The target audience of Google’s Cloud AutoML is people who have limited knowledge of machine learning. The main goal of this cloud service is to let users build their own AI models tailored to their business needs, if the services provided by Google’s AI API …

How to Fact-Check using Natural Language Processing Techniques? A Literature Review

In this article, we present a summary of our research in the field of fact-checking. We grouped the work into two categories: closed-source published applications, and research projects done in this field. Closed Source. Snobs: their methodology depends on human annotators to fact-check a piece of news and present a detailed report regarding the inaccuracies in the article. Reporters’ Lab: their methodology depends on human annotators as well, and the dataset can be found at https://www.politifact.com/texas/ and https://factnameh.com/ Fullfact: their methodology builds a fully automated fact checker, but no details are provided …