Natural Language Processing

The core of the IntuView technology belongs to the discipline of Natural Language Processing (NLP). Processing a text entails a complex suite of actions that leverage advanced learning algorithms to understand it. These actions include:

 

  • Language and language register identification – IntuView technology employs statistical models to identify not only the language but also the language register (e.g. mainstream English vs. Twitter English or professional technical English). This is particularly important in languages such as Arabic, which is characterized by extreme diglossia (a situation in which two distinct varieties of a language are spoken or written within the same speech community and may differ from each other almost to the degree of separate dialects or languages). For example, radical Islamists tend to use a register that we call “Neo-classical” Arabic, whilst social media users in the Maghreb and Lebanon tend to use a “hybrid” register that mixes their local dialect with French (“Frarabe”, analogous to “Spanglish” in the US) and to write in “Arabeezi” (colloquial Arabic in Romanized form).

  • Morphological analysis and part-of-speech analysis – IntuView employs its own proprietary morphological tools to generate all possible alternative morphologies of each lexeme and to identify gender, number and other properties; to tokenize and parse the words so that clitics (both prefixes and suffixes) can be removed and interpreted; and finally to apply part-of-speech analysis to decide which of the alternative morphologies is correct.

  • Named Entity Recognition – this process includes: entity extraction to find the different entities in the text (persons, organizations, places, events, ideas, etc.); entity aggregation and disambiguation to merge entities that appear under variant names (e.g. with titles, with given name only, family name only or nicknames) and to differentiate between distinct entities with similar names; and anaphora detection to find references to an entity expressed by pronouns or other anaphora mechanisms.

  • Semantic disambiguation of polysemes such as “book” (to book a flight), “book” (a written volume) and “book” (to indict), or “court” (of law), “court” (royal court), “court” (basketball court) and “court” (to court a lover).

  • Relationship Extraction – this process extracts explicit and implicit relationships between entities from the text: family, workplace and academic ties, interactions, communications and more.
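
To make the first step in the list above more concrete, here is a minimal sketch of statistical register identification. It is not IntuView's model: it assumes a simple naive Bayes classifier over character trigrams, and the class name `RegisterIdentifier`, the register labels and the tiny training samples are all invented for illustration. A real system would be trained on large corpora for each language and register.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Return the overlapping character n-grams of a lower-cased, padded text."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class RegisterIdentifier:
    """Toy multinomial naive Bayes over character trigrams (illustrative only)."""

    def __init__(self):
        self.counts = {}    # register label -> Counter of trigrams
        self.totals = {}    # register label -> total number of trigrams seen
        self.vocab = set()  # all trigrams seen in any register

    def train(self, label, samples):
        counter = self.counts.setdefault(label, Counter())
        for sample in samples:
            counter.update(char_ngrams(sample))
        self.totals[label] = sum(counter.values())
        self.vocab.update(counter)

    def score(self, label, text):
        # Log-likelihood of the text under the register, with add-one smoothing.
        counter = self.counts[label]
        denom = self.totals[label] + len(self.vocab)
        return sum(math.log((counter[g] + 1) / denom) for g in char_ngrams(text))

    def identify(self, text):
        return max(self.counts, key=lambda label: self.score(label, text))


# Hypothetical, hand-written training samples; far too small for real use.
model = RegisterIdentifier()
model.train("mainstream_english", [
    "The committee will publish its report on Thursday.",
    "She said the meeting was postponed until next week.",
])
model.train("twitter_english", [
    "omg lol cant wait 4 2nite!!! #hyped",
    "u r so right lmao smh #truth",
])
model.train("romanized_arabic", [
    "ana mabsout ktir 3am shoufak",
    "keefak ya habibi shu 3am ta3mel",
])

print(model.identify("lol omg u r 2 funny #lmao"))
print(model.identify("shu 3am ta3mel ya habibi"))
```

Character n-grams are a common choice for this kind of task because they pick up spelling habits, abbreviations and digit-for-letter substitutions without needing a tokenizer or dictionary for each register.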

 

The IntuView technology is unique in that it uses the ontological features of the words in the text to facilitate the NLP. In doing so, it achieves a higher level of accuracy and extracts both more information and more nuanced information than other tools.
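
As a rough illustration of how ontological features can help with a task such as disambiguating “court” or “book”, the sketch below picks the sense whose ontology category is best supported by the surrounding words. This is only an assumed, simplified scheme and not a description of IntuView's method: the `ONTOLOGY` and `SENSES` tables and the `disambiguate` function are hypothetical and hand-built for the example.

```python
from collections import Counter

# Hypothetical toy ontology: context word -> ontology categories it belongs to.
ONTOLOGY = {
    "judge": {"law"}, "verdict": {"law"}, "lawyer": {"law"},
    "police": {"law"}, "suspect": {"law"},
    "king": {"royalty"}, "queen": {"royalty"}, "palace": {"royalty"},
    "player": {"sport"}, "basketball": {"sport"}, "referee": {"sport"},
    "flight": {"travel"}, "hotel": {"travel"}, "ticket": {"travel"},
    "author": {"publishing"}, "chapter": {"publishing"}, "novel": {"publishing"},
}

# Hypothetical sense inventory: polyseme -> {sense label: supporting categories}.
SENSES = {
    "court": {
        "court of law": {"law"},
        "royal court": {"royalty"},
        "basketball court": {"sport"},
    },
    "book": {
        "reserve": {"travel"},
        "written volume": {"publishing"},
        "charge": {"law"},
    },
}


def disambiguate(word, sentence):
    """Return the sense of `word` best supported by the context, or None."""
    tokens = [t.strip(".,;:!?\"'").lower() for t in sentence.split()]
    context = Counter()
    for token in tokens:
        if token != word:
            for category in ONTOLOGY.get(token, ()):
                context[category] += 1
    scores = {
        sense: sum(context[c] for c in categories)
        for sense, categories in SENSES[word].items()
    }
    best = max(scores, key=scores.get)
    # Without any supporting context, decline to choose a sense.
    return best if scores[best] > 0 else None


print(disambiguate("court", "The judge left the court after the verdict."))
print(disambiguate("book", "I need to book a flight and a hotel."))
```

In this toy version, a sentence with no words known to the ontology yields `None` rather than a guess, which keeps low-confidence readings out of later steps such as entity and relationship extraction.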