4 Biggest Open Problems in NLP

When was the last time you asked Siri or Alexa to do something and they did not understand what you were saying? Or answered with something totally unrelated? Siri and Alexa are speech bots that rely on an artificial intelligence technology called NLP. If you want to find out more about NLP and what it can and can't do, continue reading this article.

NLP stands for natural language processing, a branch of computer science and artificial intelligence concerned with helping computers understand human languages by analyzing huge amounts of natural language data. NLP problems range from simple ones, such as answering a query on the web, to very complex ones that require terabytes of data for training. But how well can NLP really understand what humans say? And how long will it take until we can have a normal conversation with a computer? In this article we discuss four of the most challenging NLP problems:

1. Natural language ambiguity

In natural language, a word can have different meanings, and the intended meaning must be extracted from the context. For example, the phrase "a piece of cake" might refer to a small portion of a birthday cake; on the other hand, it might mean that something is very easy to do. Humans don't only use their knowledge of a language to decide the meaning of a piece of text; they also weigh several other factors, such as desires, goals, and beliefs, to understand what they are reading or hearing. For example, the sentence "I experienced a feeling I have never had before" might mean the person experienced a very pleasant feeling or a very bad one; the meaning depends on the speaker's emotions at that moment.

2. The lack of training data

One of the biggest challenges in NLP is the shortage of training data, as an NLP model needs to be trained on terabytes of data in order to understand a specific language. (Model training is a complex topic that will be covered in a separate article.) The lack of training data has several causes. The first is that the language is a minority language, spoken by a small population, such as Kurdish or Afrikaans. The second is the small amount of resources and text available on the web, as with the Zulu language. Another reason is the lack of incentive to work on low-resource languages, whether because the required skills are unavailable or because of the difficulty of the language, as is the case with Arabic.

3. Spelling mistakes and entity extraction

Correcting misspelled words is an essential process in NLP, as misspellings are very frequent in human-computer interactions, and it is very hard to identify a misspelled entity (a noun such as a name or place) in a text. For example, if a user asks a chatbot "Is it going to rain today in amestedam?", it would be hard to identify Amsterdam as a location.
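As a minimal sketch of this idea (not the method any particular assistant uses), a fuzzy string match against a small, hypothetical list of known place names can often recover the intended entity from a misspelled one:

```python
import difflib

# A tiny, hypothetical gazetteer of known locations.
KNOWN_LOCATIONS = ["Amsterdam", "Rotterdam", "The Hague", "Utrecht"]

def correct_location(token, cutoff=0.6):
    """Return the closest known location name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(token.title(), KNOWN_LOCATIONS,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(correct_location("amestedam"))  # → Amsterdam
print(correct_location("xyzqq"))     # → None
```

Real systems combine such similarity scores with context (e.g., that a location is expected after "in"), but even this simple matching shows how a misspelled entity can be mapped back to a known one.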

4. Semantic meaning extraction (this can be seen as part of the ambiguity problem)

A computer should understand not only the vocabulary of a text but also its semantics. For example, in the sentence "John called his wife, and so did Sam," we don't know whether Sam called John's wife or his own.

Did you know that we use these and other AI technologies in our app? See what you're reading about now applied in action: try our Almeta News app. You can download it from Google Play or Apple's App Store.


