Initial Genre Classification Experiments

The ability to filter a news feed by genre is a critical component of any news aggregator: users often want to read only sports or political news, not just the most recent or hottest stories. In this post, we explore our initial genre classification system in detail. Let’s start with the data. In the following experiments, we used an in-house data set composed of 190307 HTML documents crawled from the following domains: [Aljazira, Alarabia, Aljadeed, RT Arabic, BBC Arabic]. For each document we tried to extract the following features: … Continue reading Initial Genre Classification Experiments

Initial Experiments on Measuring Informativity of Arabic Content – Data Collection

In one of our previous articles, we suggested a method for building an initial informativeness detection system. The system takes a small set of manually annotated pairwise comparisons, uses Snorkel to expand these annotations automatically into a larger training set, and then trains a model on that set to estimate article informativeness. In this article, we go into the details of implementing this plan. Data Annotation: As noted above, Snorkel needs 3 types of training data: a small manually annotated test set to evaluate the results of the model; a smaller manually annotated … Continue reading Initial Experiments on Measuring Informativity of Arabic Content – Data Collection

Git Submodules in the Python World: Why and How

What makes many professional tech companies professional is the simple principle of domain engineering: working for a long period of time on a small set of domains, in the hope that your codebase grows and you become more efficient and successful at developing projects in these domains. The main component in this formula is code reuse. Sooner or later you will have a piece of code that you use constantly across all your projects. In NLP, these might be your text normalizers, your feature extractors or … Continue reading Git Submodules in the Python World: Why and How

What is Political Bias? – In Technical Terms

In this article, we review the research done on discovering political bias. Understanding Characteristics of Biased Sentences in News Articles. Methodology. Bias Labeling via Crowd-Sourcing: The authors used the “Figure Eight” platform to crowdsource bias labels, letting workers make judgements on each target news article (also using the reference news article). Analysis of Perceived News Bias: To see what kinds of words workers tag as bias triggers, they analyze the phrases annotated as biased in terms of word length (4 words in a sentence have been annotated). … Continue reading What is Political Bias? – In Technical Terms

Available Visualization Libraries to Handle Stream of Data

Google Analytics: Google Analytics generates detailed statistics and fresh insights into your website’s traffic and traffic sources. With Google Analytics users can track visitors from all referrers, including search engines and social networks, direct visits and referring sites. It also tracks and monitors display advertising, PPC networks, email marketing and other digital collateral. You can not only measure sales and conversions, but also gain fresh insights into how visitors use your site and how you can keep them coming back. Segment: From startups to the Fortune 500, thousands of companies use Segment as their customer data hub. We believe that … Continue reading Available Visualization Libraries to Handle Stream of Data

AWS Batch Jobs — An Overview

AWS Batch enables you to run batch computing workloads on the AWS Cloud. This service can automatically provision compute resources and optimize the workload distribution based on the quantity and scale of the workloads. Related Definitions. Jobs: A unit of work (such as a shell script, a Linux executable, or a Docker container image) that you submit to AWS Batch. It runs as a containerized application on an Amazon EC2 instance in your computing environment, using parameters that you specify in a job definition. Container images are stored in and pulled from container registries. Job Definitions: specifies how jobs are … Continue reading AWS Batch Jobs — An Overview

How to Fact-Check using Natural Language Processing Techniques? A Literature Review

In this article, we summarize our research into fact-checking. We group the work into two categories: published closed-source applications, and research projects in the field. Closed Source. Snobs: Their methodology depends on human annotators who fact-check a piece of news and present a detailed report on the inaccuracies in the article. Reporters’ Lab: Their methodology depends on human annotators as well (dataset: https://www.politifact.com/texas/ and https://factnameh.com/). Fullfact: Their methodology builds a fully automated fact checker, but no details are provided … Continue reading How to Fact-Check using Natural Language Processing Techniques? A Literature Review