In previous posts we discussed the significance of NLP and the wide range of applications where it is used. Since the basic goal of NLP is to simplify communication between machines and humans, it is important to see how it affects the people who speak, communicate and work in Arabic, the 6th most spoken language in the world. Arabic is a Semitic language spoken by approximately 420 million people. It is an official language in 26 countries and one of the six official languages of the United Nations. Arabic is morphologically rich and has many varieties. Classical Arabic is the language of the Quran (the Muslim holy book) and is considered the most perfect form of Arabic. Modern Standard Arabic is the official language today, used in literature, education, books, media and other formal settings. Finally, there are the Arabic dialects, which are the everyday speech and
1. Arabic orthography
The Arabic alphabet consists of 28 letters, only three of which are long vowels: (ا), pronounced Alef; (و), pronounced Waw; and (ي), pronounced Ya’a. In addition, short vowels and other pronunciation marks are written as diacritics (َ ُ ِ ً ٌ ٍ ّ ْ). Arabic is also one of the languages in which the shape of a letter changes according to how it connects to the neighbouring letters. For example, the letter (ت) (the letter ‘T’ in English) has three written forms: (ت) at the end of a word, (ـتـ) in the middle of a word and (تـ) at the beginning of a word. Arabic orthography is very important to consider in all NLP tasks and applications such
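To make the orthography points concrete, here is a minimal Python sketch of two common preprocessing steps: removing diacritics and mapping position-dependent letter shapes back to a single base letter. It uses only the standard `unicodedata` module. The function names `strip_diacritics` and `unify_letter_forms` are made up for this illustration, and the diacritic range used (U+064B to U+0652) is an assumption that covers only the marks listed above.

```python
import unicodedata

# Assumption: the diacritics of interest are U+064B..U+0652
# (tanwin marks, fatha, damma, kasra, shadda, sukun).
DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_diacritics(text: str) -> str:
    """Remove the diacritic marks so vocalized and plain spellings match."""
    return "".join(ch for ch in text if ch not in DIACRITICS)


def unify_letter_forms(text: str) -> str:
    """Map contextual presentation forms back to their base letters.

    Unicode stores positional shapes as separate compatibility characters;
    NFKC normalization folds them into the base letter.
    """
    return unicodedata.normalize("NFKC", text)


# The four presentation forms of the letter Teh (base letter U+062A).
teh_forms = {
    "isolated": "\uFE95",
    "final": "\uFE96",
    "initial": "\uFE97",
    "medial": "\uFE98",
}
base_forms = {name: unify_letter_forms(ch) for name, ch in teh_forms.items()}

# The root k-t-b written with a fatha on every letter.
vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"
plain = strip_diacritics(vocalized)

print(base_forms)
print(plain)
```

Most Arabic text is stored with base letters only and shaped by the rendering engine, so the second step mainly matters for text extracted from PDFs or legacy encodings.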
2. Arabic morphology
Every Arabic verb has a root of three or four letters, which makes Arabic a highly derivational language. Verbs are usually derived from a template, which we can write as verb = Root + pattern. The following table shows some examples of verbs in their past, present/future and imperative forms derived from three- and four-letter roots.
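| Root | Added letter | Derivation | Transliteration | Meaning |
|---|---|---|---|---|
| كتب | ي | ي + كتب = يكتب | yaktub | Present/future form of "write" |
| كتب | ا | ا + كتب = اكتب | uktub | Imperative form of "write" |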
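The verb = Root + pattern idea can be sketched in a few lines of Python. This sketch assumes the traditional grammar convention of writing patterns with the placeholder letters ف, ع and ل for the first, second and third root letters. The function name `apply_pattern` is hypothetical, and the sketch handles only unvocalized three-letter roots.

```python
def apply_pattern(root: str, pattern: str) -> str:
    """Fill the placeholder letters of a pattern with the letters of a root.

    Simplification: any ف, ع or ل in the pattern is treated as a slot,
    so patterns that use these letters literally are not supported.
    """
    if len(root) != 3:
        raise ValueError("this sketch only handles three-letter roots")
    slots = {"ف": root[0], "ع": root[1], "ل": root[2]}
    # Substitute in a single pass so root letters are never re-substituted.
    return "".join(slots.get(ch, ch) for ch in pattern)


root = "كتب"
present = apply_pattern(root, "يفعل")     # present/future form
imperative = apply_pattern(root, "افعل")  # imperative form

print(present)
print(imperative)
```

A real morphological generator would also need vowel marks and special handling for weak roots, which this sketch leaves out.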
It is also very common in Arabic to attach prefixes and suffixes to verbs, which we can express as New_Verb = Prefix(es) + Verb + Suffix(es). The following table shows an example of inflection in Arabic.
| Verb | Inflected form | Meaning |
|---|---|---|
| يكتب | س + يكتب = سيكتب | He will write |
| يكتب | س + يكتب + ه = سيكتبه | He will write it |
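The inflection formula works in both directions: gluing affixes onto a verb, and splitting an inflected word back into its parts. The following Python sketch shows both. The names `inflect` and `segment`, the tiny affix lists and the one-word verb lexicon are assumptions for illustration only. Real analyzers rely on much larger lexicons and rules.

```python
from typing import Iterable, Optional, Set, Tuple


def inflect(verb: str, prefixes: Iterable[str] = (), suffixes: Iterable[str] = ()) -> str:
    """Build New_Verb = Prefix(es) + Verb + Suffix(es)."""
    return "".join(prefixes) + verb + "".join(suffixes)


def segment(
    word: str,
    known_verbs: Set[str],
    prefixes: Iterable[str],
    suffixes: Iterable[str],
) -> Optional[Tuple[str, str, str]]:
    """Strip at most one prefix and one suffix, looking for a known verb.

    Returns (prefix, verb, suffix) or None when nothing matches.
    """
    for prefix in ["", *prefixes]:
        if not word.startswith(prefix):
            continue
        rest = word[len(prefix):]
        for suffix in ["", *suffixes]:
            if suffix and not rest.endswith(suffix):
                continue
            stem = rest[: len(rest) - len(suffix)] if suffix else rest
            if stem in known_verbs:
                return prefix, stem, suffix
    return None


verb = "يكتب"
future = inflect(verb, prefixes=["س"])                        # he will write
future_with_object = inflect(verb, prefixes=["س"], suffixes=["ه"])  # he will write it

# Hypothetical toy lexicon and affix lists.
parts = segment(future_with_object, {verb}, prefixes=["س"], suffixes=["ه"])

print(future, future_with_object, parts)
```

Checking the stem against a lexicon is what keeps the splitter from stripping letters that only happen to look like affixes.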
Studying Arabic morphology is very important for NLP tasks such as morphological analysis and POS tagging.
3. Complex syntax