You might understandably be wondering: is this related to Rick and Morty? Unfortunately, no. But you should really keep reading, because multidimensional topic modelling is really cool. In this short piece we will explore the fundamental idea behind … Continue reading Multidimensional Topic Modelling. The What? and The How?
In one of our previous articles, we discussed the idea of multi-dimensional topic modelling, and no, it is not related to Star Wars. If you thought it was, go here and give it a good read. Back from Alderaan. … Continue reading Viewpoint, Topic and Opinion Discovery in an Opinionated Document
One of the services we provide at Almeta is estimating the political bias of a piece of news. In other pieces we have discussed the technical details of this feature, but in this piece we will go through the … Continue reading How to Visualize a Political Bias Data Metric
The rise of political bias across several news anchors presents a real threat to free and independent journalism and is a major factor in shifting the public’s conception of the world. Several NGOs, research centres and private organizations are working … Continue reading From Sentiment to Political Bias in the Arab World and the Arabic Content
First, a motivating example: many products on the internet allow the user to leave some feedback. This feedback is usually reviewed manually to figure out what users like or dislike in the product, which features they … Continue reading Aspect-level Vs Entity-level Sentiment Analysis
This article is part of our series on political bias detection. We hope to introduce you to the various aspects of our political bias detection system, and you can learn about: How can we predict the political orientation behind … Continue reading Stance Detection – State of the Art
If you don’t know what stance detection is, make sure to check our article on it. Are we on the same page? Cool, let’s go. First, a motivating example: many products on the internet allow the user to leave some … Continue reading Subjective Stance Detection What is it? and How to do it?
While some news anchors try to stay professional and objective in all of their articles, most of the news we consume is published to push a specific agenda, especially when it comes to politics. In our effort to battle news … Continue reading Political Orientation Detection – AI and NLP Approach
This is the first article in our series on political bias detection. We hope to introduce you to the various aspects of our political bias detection system, and you can learn about: How can we predict the political orientation behind … Continue reading What Constitutes a “Bad” News Article?
In this task, the goal is to assign a given piece of text a tag (or number) representing the level of informativeness or detail it holds, usually by training a model to do so. Here, we rely on the … Continue reading Automatically Tagging Data for Content Informativity Scoring
Let’s start with a simple question: what constitutes an informative article? According to the Oxford dictionary: informative /ɪnˈfɔːmətɪv/ (adjective): providing useful or interesting information. However, this is still an abstract concept. Yes, it is much simpler to flag an article as spammy … Continue reading How to Rank Articles Based on How Informative They Are – Using Snorkel
In a previous article (see next paragraph) we explored how to approximate an article’s informativeness in a supervised fashion. Such a method requires training data; in this article we will explore one way to get this data, one very … Continue reading Can you measure a text Informativeness using its summary?
Let’s start with a question: given two articles, A and B, that talk about the exact same thing, what makes one of them more informative than the other? Is it the ease of reading? The amount of detail? Or is it … Continue reading Supervised Article Informativeness Prediction – The What and the How
In our effort at Almeta to provide Arabic readers with the articles of highest informative value, we have employed several methods to measure the informativeness of a piece of news. In this article we will shed light on … Continue reading Term Informativeness Estimation in the Arabic Language
In a previous article, we talked about the various factors that make an article more informative; using cliches was not one of them. This article is part of our research on measuring text informativeness; if you are interested, jump … Continue reading How to Detect Cliches in Text
Let’s start with a simple question: what constitutes an informative article? According to the Oxford dictionary: informative /ɪnˈfɔːmətɪv/ (adjective): providing useful or interesting information. However, this is still an abstract concept. The question of measuring how informative a piece of news is … Continue reading Informativity Detection – Almeta’s Research Gist
The concept of an informative text is really abstract, and it is hard to come up with a definitive formula to measure it. In this article we will explore some of the features that we believe can make an article … Continue reading What Makes an Article Informative – And How Computers Can Measure Informativity of a Text Content
In our effort to provide the best news feed out there, one of the goals we are trying to achieve here at Almeta is to capture the interaction between different news outlets and how the coverage of the same event … Continue reading Aspect Detection and Named Entity Linking (NEL): Using SPARQL and DBpedia
Events possess a rich structure that is important for intelligent information access systems (information retrieval, question answering, summarization, etc.). Without information about what happened, where, and to whom, temporal information about an event may not be very useful. In light … Continue reading An Overview of The Event Extraction Task in NLP
Search queries, passport scans, barcode scans, your online shopping history, your photos on Instagram, your tweets on Twitter, voice messages, everyday news articles, and more… All of these contain a huge amount of data… Data generation is … Continue reading Sequential Clustering
Search engines use indexing to store information about web pages, enabling them to quickly return relevant, high-quality results. Indexing is the process by which search engines organize information before a search to enable super-fast responses to queries. Searching through individual pages … Continue reading Search Service Frameworks Evaluation
Text-to-speech (TTS) is a type of assistive technology that reads digital text aloud. It’s sometimes called “read aloud” technology. Text-to-speech applications offer an innovative way for users to interact with content by taking it out of books and computer screens and … Continue reading Comparison of Available TTS Services
In this post, we’re analyzing the results returned by the readability metric in our news feed. If you haven’t checked our post about “How to measure the readability of a text?” before, you can read about it here. How Are We Measuring the Readability? The main part of analyzing a metric is knowing how it works. In the current version, we depend on the AARIBase metric for measuring readability. So, let’s first have a look at how AARIBase works. Here’s the AARIBase formula: AARIBase = (3.28 × NOC) + (1.43 × ACW) + (1.24 × AWS) … Continue reading Analysis of the Readability Metric Results in Almeta News Feed
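The three terms of the AARIBase formula visible in the excerpt above can be sketched in Python. This is a rough illustration under stated assumptions, not Almeta’s implementation: the excerpt is cut off after the third term, so only those three terms are summed. Reading NOC as the number of characters, ACW as the average characters per word, and AWS as the average words per sentence is an assumption, as are the helper name `aari_base_visible_terms` and the regex-based word and sentence splitting.

```python
import re

# Coefficients copied from the formula as quoted in the excerpt.
NOC_WEIGHT, ACW_WEIGHT, AWS_WEIGHT = 3.28, 1.43, 1.24


def aari_base_visible_terms(text: str) -> float:
    """Sum the three AARIBase terms shown in the excerpt (hypothetical helper)."""
    # Assumption: a word is a run of Unicode word characters.
    words = re.findall(r"\w+", text)
    # Assumption: sentences end at '.', '!', '?' or the Arabic question mark.
    sentences = [s for s in re.split(r"[.!?؟]+", text) if re.search(r"\w", s)]
    if not words or not sentences:
        return 0.0
    noc = sum(len(w) for w in words)   # assumed meaning: number of characters
    acw = noc / len(words)             # assumed meaning: average characters per word
    aws = len(words) / len(sentences)  # assumed meaning: average words per sentence
    return NOC_WEIGHT * noc + ACW_WEIGHT * acw + AWS_WEIGHT * aws


# Two short sentences of two words each.
score = aari_base_visible_terms("ab cd. ef gh.")
print(round(score, 2))
```

Because the NOC term grows with the total character count, longer texts score higher in this sketch regardless of their word or sentence lengths.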
Readability is the ease with which a reader can understand a written text, which accordingly indicates how effectively the text will reach the target audience. The readability of text depends on its content (the complexity of its vocabulary and syntax), … Continue reading How to Measure Text Readability?
In a previous article, How to Detect Clickbait Headlines using NLP?, we introduced the task of clickbait detection and explored how it can be modeled within the domain of machine learning and NLP. If you are not familiar with the concept of clickbait detection, make sure to review it before continuing. In this post, we’re building a classifier for clickbait detection in news headlines based on a pre-trained Arabic Word2Vec model, and we’re validating this solution. If you are not familiar with the Word2Vec concept, you can refer to this Wikipedia article for more information. News Headlines Representation In … Continue reading Clickbait Detection Using Word2Vec Representation
Clickbait is a type of hyperlink on a web page with a catchy or provocative headline that most users find difficult to resist. Such headlines tell you exactly what you’re about to see, with just enough of a tease at the end … Continue reading How to Detect Clickbait Headlines using NLP?
In this post, we are exploring how Google’s AutoML can help us at Almeta in developing automatic Arabic language processing tools. Before we start, if you are not familiar with the term AutoML, you can refer to our previous post on this topic. Who is Google AutoML for? and When to Use It? The target audience of Google’s Cloud AutoML is people who have limited knowledge of machine learning. The main goal of this cloud service is to let users build their own AI model tailored to their business needs, if the services provided by Google’s AI API … Continue reading Google’s AutoML Overview
When applying machine learning models, we’d usually do data pre-processing, feature engineering, feature extraction, and feature selection. After this, we’d select the best algorithm and tune our parameters in order to obtain the best results. AutoML is a series of … Continue reading Automated Machine Learning (AutoML)
In the process of designing a good system, you must have heard of the term Microservices, where, in short, each microservice is responsible for a specific task or a group of heavily related tasks, and they communicate with each other … Continue reading Microservices & B2B Authentication – With AWS and Serverless (sls)
While re-designing our backend and moving to a better architecture, we got to the point of thinking about our database and moving it from an EC2 instance to a fully managed database, going Serverless all the way. The fact that our backend will … Continue reading AWS Databases that works with Serverless Architecture
“Something that is untested is broken.” This is our first article on software deployment in Python, and it covers testing frameworks. We hope you enjoy this series, and make sure to check out our other articles: git-submodules and how they work in Python … Continue reading Testing Frameworks for Python
What are git-submodules? The basic principle that makes many professional tech companies professional is the simple principle of domain engineering: basically, working for a long period of time on a small set of domains with the hope that you will … Continue reading Git Submodules in the Python World
Before we talk about dependency management: Python is awesome, it simply is. But in my opinion, what truly makes Python a great language is not the syntax or the dynamic nature or any other of these features, but rather … Continue reading Packaging and Dependency Management in Python
The CI/CD pipeline is one of the best practices for DevOps teams to implement, for delivering code changes more frequently and reliably. This is our third article on Software Deployment; we advise you to check the first two articles in … Continue reading Continuous Integration & Continuous Delivery: CI/CD
What are Docker Images? Docker is a platform for developers and system admins to develop, deploy, and run applications with containers. A Docker image, which is what we are going to talk about, contains everything needed to run an application as a container. … Continue reading Docker Images
Software deployment is the final stage of every software project, when all the hard work you have put in over time goes live to be used by the target audience. It includes all the processes required for … Continue reading Software Deployment
You may not be very familiar with natural language processing, but you certainly know Siri or Alexa! “I didn’t understand what you just said.” This is what Siri or Alexa may answer you over and over again. When was the last time you asked … Continue reading What Are Natural Language Processing Techniques
Before we start talking about how informative and rich a text is, let’s begin with a simple question: what constitutes an informative article? Being informative means providing useful or interesting information. However, this is still an abstract concept. The question of determining how informative and rich a text is … Continue reading How We Determine How Informative and Rich a Text Is
Have you ever watched a science fiction film that shows something about artificial intelligence and machine learning and told yourself that this could never happen? You must have seen machines that talk and machines that think in one of these films. Over the past years, you must have … Continue reading What Are Artificial Intelligence and Machine Learning, and How Are They Related
1- Thousands of news articles are published every day by many different news agencies. As a reader, you may receive the same news item from multiple sources, many times over, within the continuous news stream, so it is useful to have an intelligent system for identifying articles with … Continue reading Almeta’s Monthly Tech Efforts Newsletter – December 2019
Intro In Almeta you have to write a lot for those research tickets you have in a Sprint. You have to read tons of research, academic, and sometimes boring papers. But when you write your proposal, you don’t have to write like them. As a matter of fact, we want to be as close to non-techies as possible when writing our tech blogs. So, you’re an engineer and you love to code. You are a machine learning engineer and you love to read. You’re both and here comes a research/investigation ticket. You read, read, and read some more and now comes … Continue reading A Guideline for Writing Research/Tech Blogs
Intro We’re currently experimenting with different styles, between Agile/Scrum and Kanban. This is the latest we’re doing. We’re going to keep this post updated. The Team in Almeta We are a remote, cross-functional team. We try to balance the skills we have. We favor T-shaped employees. We <3 Valve. Skin in the Game: In a startup you have to eat your own food. And you have to take extra responsibility for any code you develop. We don’t have researchers and engineers. We have research-engineers. Those who learned to do research, develop ideas, write their code and also bring them … Continue reading Our Agile/Scrum Setup in Almeta
What is A/B Testing? A/B testing (also known as split testing or bucket testing) is a method of comparing two versions of a webpage or app against each other to determine which one performs better. A/B testing is essentially an … Continue reading A/B Testing
News stories are created every day at many news agencies. Users may receive news streams from multiple sources. Browsing in large-scale information spaces without guidance is not effective. Suppose, for example, a person who has returned from a long vacation … Continue reading Event Detection in Media using NLP and AI
Well, I hope we’re not like this. Here in Almeta, we’re always trying to hire the best people. We want them to be happy working with us. We celebrate honesty, directness and openness. This post is our push to make … Continue reading Our Open Contracts
Before we talk about the problems of natural language processing, let’s start with an example familiar to everyone. When was the last time you asked Siri or Alexa to do something and they did not understand what you were saying? Or they answered with something completely unrelated to your question? Siri … Continue reading The Four Biggest Open Problems in Natural Language Processing
Before we start talking about Arabic language processing: in previous blogs we mentioned the importance of natural language processing and the wide range of applications in which it is used. Since the goal of natural language processing (NLP) is to ease and simplify communication between … Continue reading The Biggest Challenges in Arabic Language Processing
“Natural language processing and machine learning are the foundation of any AI system, as their importance lies in the ability to communicate with us in a human way and to automate the learning process, regardless of what you want to achieve, whether it is predictive or prescriptive analytics, forecasting, optimization … Continue reading The Three Most Exciting Ideas in Natural Language Processing (NLP)
Our story and team We are three friends since our first year at university, and techies, who founded Almeta because we love our Arabic language. Our team has grown to include a group of AI engineers who are determined to advance natural language processing for the Arabic language. Why … Continue reading What Is Almeta?
“Machine learning and natural language are the foundation to any AI system, just in the ability to communicate with us in a human way and to automate that learning process, what you build on top of that, whether it’s predictive, … Continue reading Top 3 Exciting Ideas in NLP in 2018
We have mentioned in previous blogs the significance of NLP and the wide range of applications where NLP is used. As the basic goal of NLP is to ease and simplify the communication between machines and humans, it is highly … Continue reading Biggest Challenges in Arabic Natural Language Processing
Our story and team Founding Almeta.io, we’re 3 friends (since our first college year) and techies who love their Arabic Language. Growing with a bigger team: Our team now consists of self-motivated machine learning engineers who are determined to help … Continue reading Why Almeta.io?
When was the last time you asked Siri or Alexa to do something and they did not understand what you were saying, or they answered with something totally unrelated? Siri and Alexa are speech bots that rely basically … Continue reading 4 Biggest Open Problems in NLP