How to make Chatbots Intelligent using Natural Language Processing (NLP)
Adoption of chatbots is growing at an extremely fast pace across verticals. The success of a chatbot is measured by a great conversation or interaction with the end user, or by the conversion of a prospect into a lead. This can only happen if the chatbot intelligently responds with the right details to engage end users.
It is important to understand that developing the ‘mind’ of a chatbot is not a quick task. The human mind develops memory (data) and logical thinking over the years. Similarly, chatbots require data and algorithms / libraries that run on pre-defined data and keep learning whenever something new arrives in the dataset, i.e. training data.
Maintaining and understanding the context of an interaction / conversation is extremely important. It enables better engagement with the user by giving relevant responses and by not asking for information that has already been provided, e.g.
User: I am looking for a flight to Singapore
Bot: Sure sir, Please confirm your departure city
User: I will be travelling from Delhi
Bot: Noted, confirm your travel date
User: On next Saturday from Mumbai
Bot: MyJet Flight No. MJ 404 from Mumbai to Singapore will cost you Rs 4500. Please confirm for booking
User: Thanks. I will get back to you shortly.
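One common way to maintain such context is slot filling: the bot keeps a running record of the details it has collected so far and only overwrites a slot when the user supplies a new value. A minimal sketch in Python (the slot names `destination`, `origin`, and `date` are illustrative, not from any particular framework):

```python
def update_context(context, new_info):
    """Merge newly extracted details into the conversation context,
    overwriting a slot only when the user actually supplies a value."""
    context.update({k: v for k, v in new_info.items() if v is not None})
    return context

context = {}
update_context(context, {"destination": "Singapore"})
update_context(context, {"origin": "Delhi"})
# The user later changes the departure city; the destination is kept,
# so the bot never needs to ask for it again.
update_context(context, {"origin": "Mumbai", "date": "next Saturday"})
print(context)
```

Because the destination slot is never cleared, the bot can quote a Mumbai-to-Singapore flight without re-asking where the user wants to go.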
In this blog, we will share thoughts and research on a path to building intelligent chatbots through NLP / ML, based on experience gained during the implementation of Surbo, a chatbots-for-business platform.
Here are the steps and concepts to follow:
Pre-processing
Pre-processing removes words from the end-user query that are not required and that, if kept, would distort the extraction of the keywords that carry the context of the query.
Pre-processing involves:
- Text cleaning, i.e. throwing away unwanted content such as smileys / emojis.
- Stemming / lemmatization: English words like ‘look’ can be inflected with a morphological suffix to produce ‘looks’, ‘looking’, ‘looked’. They share the same stem, ‘look’. Often (but not always) it is beneficial to map all inflected forms to the stem.
- Stopword removal: the most frequent words often do not carry much meaning, e.g. “the, a, of, for, in”; removing them improves accuracy.
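The three steps above can be sketched in a few lines of Python. This is a deliberately minimal version: the stopword list is a toy subset, and the suffix-stripping “stemmer” is a crude stand-in for a real algorithm such as Porter’s.

```python
import re

# Toy stopword list; a real system would use a full list (e.g. from NLTK).
STOPWORDS = {"the", "a", "an", "of", "for", "in", "to", "is", "was", "i"}

def clean(text):
    # Text cleaning: lowercase, then drop emojis, punctuation and digits.
    return re.sub(r"[^a-z\s]", " ", text.lower())

def stem(word):
    # Naive stemming: strip common inflectional suffixes.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    tokens = clean(text).split()
    return [stem(t) for t in tokens if t not in STOPWORDS]

print(preprocess("I was looking for flights to Singapore 🙂"))
# → ['look', 'flight', 'singapore']
```

The query has been reduced to the keywords that actually carry its context, which is exactly what the later extraction steps need.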
Entity Detection
An entity is a concept or piece of information that the program must identify in the user query. Entities could be companies, people, places, products, dates, etc.; in simple terms, an entity is often a noun or noun phrase. NLP / ML adds intelligence by automatically detecting entities in a user query. For example:
User: Offers on hotels in Gurgaon
Result: location = Gurgaon
Two ways to implement:
- The Stanford NER tagger
- A custom implementation of entities, for enhanced data control and flexibility
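A custom implementation often starts as a simple gazetteer lookup: a dictionary mapping entity types to known values. The contents below are illustrative examples, not a real dataset.

```python
# Illustrative gazetteer; a production system would load this from curated data.
GAZETTEER = {
    "location": {"gurgaon", "delhi", "mumbai", "singapore"},
    "product": {"hotel", "hotels", "flight", "flights"},
}

def extract_entities(query):
    """Tag each query token with the entity type it belongs to, if any."""
    entities = {}
    for token in query.lower().split():
        for entity_type, values in GAZETTEER.items():
            if token in values:
                entities.setdefault(entity_type, []).append(token)
    return entities

print(extract_entities("Offers on hotels in Gurgaon"))
# → {'product': ['hotels'], 'location': ['gurgaon']}
```

This mirrors the example above: the query “Offers on hotels in Gurgaon” yields location = Gurgaon. An ML-based tagger generalises beyond the dictionary, but the lookup is a useful baseline.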
Phrase / Keyword Extraction
Keyword extraction, sometimes called phrase extraction, is used to extract the important keywords / phrases that define the context of a sentence, or to extract the keywords that define the relation between two extracted keywords / phrases.
Here is a quick overview of keyword-extraction algorithms:
A. KMP (Knuth Morris Pratt) algorithm
The KMP algorithm is famous for searching for a particular pattern in a target string. It is one of the best pattern-searching algorithms, with a worst-case complexity of O(n + m), where n is the length of the pattern and m is the length of the target string.
KMP searches for the exact pattern defined in the entity; any mismatch fails the match. For example, if the entity has the keyword ‘Indira Gandhi international airport’ and the user types a partial or reworded query, KMP would fail to find it.
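A standard KMP implementation makes both the strength and the weakness concrete: exact occurrences are found in O(n + m), but any variation in wording returns no match.

```python
def build_failure(pattern):
    """Prefix-function table: fail[i] = length of the longest proper prefix
    of pattern[:i+1] that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def kmp_search(text, pattern):
    """Return the index of the first exact occurrence of pattern in text, or -1."""
    if not pattern:
        return 0
    fail = build_failure(pattern)
    k = 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1

# Exact pattern is found...
print(kmp_search("flights from indira gandhi international airport",
                 "international airport"))
# ...but a partially worded query fails the match entirely.
print(kmp_search("flights from indira gandhi airport",
                 "international airport"))  # → -1
```

The second call illustrates the drawback described above: a single missing word means no match at all.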
B. Tries
A trie is a data structure for retrieving information from a user query very efficiently. Being a tree-like structure, it gives a search complexity of O(m), where m is the length of the query string.
This is useful in cases where the user enters a partial query and doesn’t input the whole keyword. But this data structure comes with a trade-off in space complexity.
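A minimal trie sketch shows the partial-query behaviour that KMP lacks: walking the tree by prefix costs one step per character of the query, and everything stored under that prefix can then be enumerated.

```python
class Trie:
    """Minimal trie supporting insert and prefix lookup."""

    def __init__(self):
        self.children = {}
        self.is_end = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_end = True

    def starts_with(self, prefix):
        """Return all stored keywords beginning with the given prefix."""
        node = self
        for ch in prefix:          # O(m) walk, m = length of the prefix
            if ch not in node.children:
                return []
            node = node.children[ch]
        results = []

        def collect(n, path):
            if n.is_end:
                results.append(prefix + path)
            for c, child in n.children.items():
                collect(child, path + c)

        collect(node, "")
        return results

trie = Trie()
for keyword in ("singapore", "sydney", "shimla"):
    trie.insert(keyword)
print(trie.starts_with("si"))  # partial query still finds 'singapore'
```

The space trade-off is visible in the code: every character of every keyword gets its own node, so large keyword sets grow the structure quickly.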
C. Collocation
A collocation is a combination of words that are commonly used together, and the task of finding them is called collocation extraction.
The biggest drawback of this approach is that it requires a huge set of meaningful data, which must contain the phrases a user query might contain. Building such a set is cumbersome and sometimes requires humans to manually input the keywords that occur together more often than alone.
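One common scoring function for collocation extraction is pointwise mutual information (PMI), which rewards word pairs that co-occur far more often than chance. The sketch below is a simplified version over a toy corpus; real systems (e.g. NLTK’s collocation finders) add smoothing and larger frequency filters.

```python
import math
from collections import Counter

def collocations(tokens, min_count=2, top_n=2):
    """Rank adjacent word pairs by PMI. Bigrams below min_count are
    dropped, since raw PMI over-rewards pairs of rare words."""
    total = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    scores = {
        pair: math.log2((count / total) /
                        ((unigrams[pair[0]] / total) *
                         (unigrams[pair[1]] / total)))
        for pair, count in bigrams.items() if count >= min_count
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

corpus = "book a flight to new york book a hotel in new york".split()
print(collocations(corpus))
```

On this tiny corpus, (“new”, “york”) surfaces as a collocation, which is exactly the kind of multi-word phrase a chatbot needs to treat as one unit. The drawback from the text is visible too: with so little data, incidental pairs like (“book”, “a”) score just as highly, which is why large, meaningful corpora are required.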
Classification / Text Classification
Text classification can be defined as a technique to classify texts into different predefined categories based on the semantic closeness of the text to the classifying category. Text classification is used for various tasks, such as topic classification, sentiment classification, and others.
There are various classification algorithms that are trained on data; based on that training data, new data is classified into the appropriate category. This intelligence greatly helps a chatbot classify user queries into the appropriate domain without human intervention. To achieve this, we turn to machine learning in the form of Natural Language Processing.
Our aim is to classify phrases into one or more entities that the user wants to capture. Classifying a phrase is basically about assigning the most relevant entity to that particular phrase.
Machine learning is teaching a machine to automatically learn from input data, or to infer logic from that data itself. There are two classes of ML techniques:
- Supervised learning methods
- Unsupervised learning methods
In supervised learning, a model is created from a training set. A model here means a set of rules inferred from the input data.
Text classification using vector model:
In text classification, a sentence is first converted into a computer-understandable format, which can be thought of as a vector (array) of 0s and 1s, with each index representing a word in the training data.
Let’s suppose our training data consists of three sentences, or, in ML language, three separate documents. From these documents we extract the important features used to build the vectors. There are various methods to extract these features, and they vary by use case, but the most commonly used is TF-IDF.
TF-IDF method (Term Frequency – Inverse Document Frequency)
The TF-IDF weight of a word increases with the number of times the word appears in a document, but decreases with how frequently the word appears across the document set. Note: here each document is a one-line sentence, but in an actual implementation a document can run to hundreds of lines. After extracting the features, the next task is to create a feature vector, or one-hot vector, for each training sentence. Here are three sample sentences:
sen1: I want to travel to New York – (category: travel)
sen2: The movie was a sci-fi one. – (category: movie)
sen3: The cricket match was awesome today. – (category: sports)
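Using these three sentences as the document set, the TF-IDF weighting described above can be sketched directly from its definition (this simple variant uses a raw log inverse document frequency; libraries such as scikit-learn add smoothing and normalisation):

```python
import math

def tf_idf(term, doc, docs):
    """tf-idf = (frequency of term in doc / doc length)
               * log(total docs / docs containing term)."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in docs if term in d)
    return tf * math.log(len(docs) / df)

docs = [
    "i want to travel to new york".split(),        # sen1
    "the movie was a sci-fi one".split(),          # sen2
    "the cricket match was awesome today".split(), # sen3
]
# "travel" appears in only one document, so it gets a high weight there;
# "the" appears in two documents, so its weight is much lower.
print(tf_idf("travel", docs[0], docs))
print(tf_idf("the", docs[1], docs))
```

Distinctive words like “travel” therefore end up as features, while near-stopwords like “the” are down-weighted.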
Let’s see how these vectors are created:
The feature words in our case are: (want, travel, new, york, movie, sci-fi, cricket match, awesome, today)
We mark one (1) if the feature word exists in the sentence and zero (0) if it doesn’t:

| | want | travel | new | york | movie | sci-fi | cricket match | awesome | today |
|---|---|---|---|---|---|---|---|---|---|
| sen1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
| sen2 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 |
| sen3 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 |

Each sentence is now converted into a vector, and these vectors act as the model in our case.
Now, when a user query comes in, a vector is created for it from the same feature keywords.
E.g. I want to travel to New Jersey today
Matching this against the vectors of the three categories above, it can clearly be seen that the user query falls into the travel domain. This was a very basic example of implementing NLP using ML; that is how classification happens and how a model, or data, is trained.
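The whole worked example fits in a few lines of Python. The feature list and sentences are the ones above; the “model” is just the three one-hot vectors, and classification picks the category whose vector overlaps most with the query’s.

```python
FEATURES = ["want", "travel", "new", "york", "movie", "sci-fi",
            "cricket match", "awesome", "today"]

def to_vector(sentence):
    """One-hot feature vector: 1 if the feature word/phrase occurs."""
    s = sentence.lower()
    return [1 if f in s else 0 for f in FEATURES]

# Vectors of the three training sentences act as the model.
MODEL = {
    "travel": to_vector("I want to travel to New York"),
    "movie":  to_vector("The movie was a sci-fi one."),
    "sports": to_vector("The cricket match was awesome today."),
}

def classify(query):
    """Pick the category whose vector shares the most features
    with the query vector."""
    q = to_vector(query)
    return max(MODEL, key=lambda cat: sum(a and b
                                          for a, b in zip(q, MODEL[cat])))

print(classify("I want to travel to New Jersey today"))  # → travel
```

The query shares “want”, “travel” and “new” with the travel vector but only “today” with the sports vector, so it is classified as travel, exactly as reasoned above.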
We hope the steps for building intelligence into chatbots are now clear. In the next blog, we will share more specific use cases, code, algorithms, challenges, and unsupervised learning methods, including topic modelling.