If the last decade was the decade of vision-based AI, this decade will be the decade of conversational AI. Conversation, and the ability to use a rich and diverse language to communicate, is one of those traits that distinguish humans from the rest of the animal world. We are seeing a new avatar of computing, previously unknown and unfathomed. When ChatGPT came alive, people saw the power of generative AI and language modelling for the first time.
Natural language is a verbose, sometimes unformed, ungrammatical, and often non-specific language used for general communication among human beings.
We use this format to communicate with others, but we also use it to formulate thoughts in our heads. Formal languages, in contrast, are strict, unchanging languages, like those we have all studied at school, viz. chemistry, mathematics, etc. A simple comparison between the two is drawn here:
Formal Languages
- Strict, Unchanging rules
- Application-specific, like maths and chemistry
- No ambiguity
- Can be parsed by regular expressions
- Inflexible: no new terms
Natural Languages
- Flexible, evolving language
- Unspecific and used in many domains and applications
- Redundant and verbose
- Expressive and ambiguous
- Difficult to parse
- Very flexible
Understanding language has long been a quest of the computing community. One of the first real attempts was to consider a language as a formulation of symbols, where these symbols convey certain things that enable thoughts. An example is the term “CRAB”. Almost instantaneously, we know what this means. The symbol crab is a combination of a sound and the visual imagery that you get when you hear this word.
If I say the term “Kavouras”, most non-Greek speakers would have no idea what I am referring to. It is as if the neurons have suddenly stopped firing because the language is unknown to the listener. “Kavouras” means crab in Greek, and since the brain has never seen or heard that symbol, the word appears out of context. This is the primary reason why early practitioners considered language a means of recognizing symbols, a view that had to be discarded for natural language processing to evolve.
Machines do not understand text as humans do, but they do understand numbers, so the start of NLP was really about converting text into some numeric format that a machine can work with.
We can understand this better using a sentence. Let us say the sentence that the machine is trying to understand is: “The animal crossed the street.”
The sentence above is broken down using words as tokens. Let's take the first word, which is “The”. The process is to find “The” in our vocabulary list/database and formulate a vector for it. Let us assume the word “The” sits at position 4137 and the total number of unique words in our vocabulary is 30,000, so the final vector would look like this:
position:    1    2    ...    4137    ...    30000
vector:    [ 0 ,  0 ,  ... ,   1   ,  ... ,    0   ]
By now, most of you will have realized that position 4137 holds a 1 while the rest of the 30,000 elements in the vector are 0, which is what gives this the name one-hot encoding vector. The single 1, placed at the word's position in the vocabulary, is what identifies the word.
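A minimal sketch of this encoding in Python follows; the vocabulary lookup and the position of “the” are hypothetical placeholders, not a real 30,000-word vocabulary.

```python
import numpy as np

VOCAB_SIZE = 30_000            # assumed vocabulary size
word_to_index = {"the": 4137}  # hypothetical lookup table: word -> vocabulary position

def one_hot(word: str) -> np.ndarray:
    """Return a VOCAB_SIZE-long vector with a single 1 at the word's position."""
    vec = np.zeros(VOCAB_SIZE)
    vec[word_to_index[word]] = 1.0
    return vec

print(one_hot("the").sum())     # 1.0  -> exactly one position is "hot"
print(one_hot("the").argmax())  # 4137 -> the word's position in the vocabulary
```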
It was soon found that although one-hot encoding works, there are challenges when someone wishes to find differences or similarities between similar kinds of text. I will use an example of a pair of sentences here.
“I wish to go to the hotel” and “I wish to go to the motel”
They are very similar in their semantic meaning, as the intent delivered by both sentences is a person trying to reach a place to stay. The challenge is that with one-hot encoding, the vectors for “hotel” and “motel” are orthogonal: their dot product is zero, so the encoding carries no notion of semantic closeness.
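A quick sketch of the problem, again with hypothetical vocabulary positions:

```python
import numpy as np

VOCAB_SIZE = 30_000
word_to_index = {"hotel": 812, "motel": 9544}  # hypothetical vocabulary positions

def one_hot(word: str) -> np.ndarray:
    vec = np.zeros(VOCAB_SIZE)
    vec[word_to_index[word]] = 1.0
    return vec

# Any two distinct one-hot vectors are orthogonal, so their similarity is always 0,
# no matter how close the words are in meaning.
print(np.dot(one_hot("hotel"), one_hot("motel")))  # 0.0
```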
When these two sentences are compared this way, they thus appear markedly different. This gave rise to a mechanism to encode words as dense vectors that provide meaning and context for the word and the sentence, leading to the creation of the word2vec, or word-to-vector, technique. Word2vec takes a large corpus of text as its input and produces a vector space, which usually has several hundred dimensions, with each unique word in the corpus assigned a corresponding vector in that space.
Once these word vectors and sentence vectors are built, it is very easy for the system to do analogical reasoning on a question like “King is to queen as father is to?”. The answer is “mother”, because the vector obtained from queen - king + father lands close to the vector for mother.
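A minimal sketch of training and querying such a model, assuming the Gensim library (4.x API) and a tiny, hypothetical corpus; in practice the analogy only holds once the corpus is large enough for the model to learn meaningful vectors.

```python
from gensim.models import Word2Vec

# Hypothetical tokenized corpus; a real one would contain millions of sentences.
corpus = [
    ["the", "king", "spoke", "to", "the", "queen"],
    ["the", "father", "spoke", "to", "the", "mother"],
    # ... many more sentences
]

model = Word2Vec(sentences=corpus, vector_size=100, window=5, min_count=1)

# Analogical reasoning: king is to queen as father is to ?
# Equivalent to finding the word whose vector is closest to (queen - king + father).
print(model.wv.most_similar(positive=["queen", "father"], negative=["king"], topn=1))
```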
Once formed, these vectors enable us to do a lot of mathematical analysis on them. We can use them, for example, to determine the semantic closeness between a question asked by a user and a set of stored questions, and return the corresponding answer.
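A hedged sketch of that retrieval step; the FAQ entries and word vectors below are stand-ins (random vectors), so only the mechanics are illustrative, and a real system would use vectors from a trained word2vec model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in word vectors; a real system would load trained word2vec vectors.
vocab = "how do i reset my password change billing address".split()
word_vectors = {w: rng.normal(size=100) for w in vocab}

def embed(sentence: str) -> np.ndarray:
    """Represent a sentence as the average of its word vectors (a common, simple choice)."""
    vecs = [word_vectors[w] for w in sentence.lower().split() if w in word_vectors]
    return np.mean(vecs, axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

faq = {
    "how do i reset my password": "Use the 'Forgot password' link on the login page.",
    "how do i change my billing address": "Update it under Account > Billing.",
}

# Pick the stored question whose vector is closest to the user's question.
question = "reset my password"
best = max(faq, key=lambda q: cosine(embed(q), embed(question)))
print(faq[best])
```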
Early word2vec usage included applications such as the following:
1. FAQ based chatbots
2. Document parsing and Search
3. IVR based automated response
4. Music2Vec: A newer concept is to convert music-based tones into a vectorized format akin to word2vec. This enables users of applications like Spotify to find the nearest song based on a note of music
The word2vec approach told us that words are no longer islands; they carry the context in which they occur. In fact, the basic principle of word2vec taught us that words of a feather flock together. For example, medical journals contain words that are quite related to each other and quite unrelated to other topics.
The next step within natural language understanding was understanding word sequences. Sequences are the core of sentence formation and, hence, the core of communication. Sequences allowed you to do the following:
- Machine translation: Je suis content -> I am happy
- Named entity recognition: picking out entities such as “Harry Potter” and “Hermione Granger” in “Harry Potter and Hermione Granger are good friends.”
- Generating music from a single note
- Generating words: Something like ChatGPT does
A special class of algorithms was needed for this: the recurrent neural network (RNN). The advantage of RNNs over vanilla neural networks was that they could model the order of words in a sequence and could therefore understand word sequences better. The issue RNNs faced was retaining context when sentences became longer. Consider the following paragraph.
An animal came to a city. The city was large and cacophonous. The animal was very scared as it saw a large number of cars, buildings, and other features of the city. The city was so large that it found the forest appeared smaller. By the time an RNN reaches the end of such a passage, the influence of the opening words (“an animal”) has largely faded from its hidden state, so it struggles to connect later references back to them; a minimal sketch of the recurrence follows.
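Here is a minimal sketch of a vanilla RNN step in NumPy; the dimensions and random weights are placeholders chosen for illustration. The single hidden-state vector is all the network carries forward, which is why the contribution of early words gets diluted over long sequences.

```python
import numpy as np

HIDDEN, EMBED = 16, 8   # assumed, deliberately small dimensions
rng = np.random.default_rng(0)
W_h = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))  # hidden-to-hidden weights
W_x = rng.normal(scale=0.1, size=(HIDDEN, EMBED))   # input-to-hidden weights

def rnn_step(h: np.ndarray, x: np.ndarray) -> np.ndarray:
    """One recurrence step: mix the previous hidden state with the current word vector."""
    return np.tanh(W_h @ h + W_x @ x)

# Feed a hypothetical embedded sentence through the recurrence, one word at a time.
sentence = [rng.normal(size=EMBED) for _ in range(40)]  # 40 word vectors
h = np.zeros(HIDDEN)
for x in sentence:
    h = rnn_step(h, x)
# After 40 steps, the first word's contribution has been multiplied by W_h and
# squashed through tanh 40 times, so its influence on h is tiny.
```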
The Age of Transformers
Then came the age of transformers, which is the current age. ChatGPT has, as of today, become the latest sensation and the poster child of what natural language generation can do.
Positives of ChatGPT:
- NLG: For the first time, natural language generation, which means the generation of sequences of words that are grammatically correct, has become possible using an AI model.
- Foundational AI: It’s a foundational model that not only does conversational AI but also generates code pieces. By definition, foundational AI models have the capacity to do more than a specific task.
- Limited Transparency: I would not go out on a limb and say that it is truly ethical AI, but it tries to do its part, as the creators seem to have learned from mistakes made by other models in the past.
Limitations of ChatGPT:
- It doesn’t have the context of an enterprise-based system. For it to attain that, enterprise data has to be sent to it, which raises the issue of whether enterprises would be willing to send their conversations and contexts to a cloud-based system.
- ChatGPT is also confident when it is wrong. This can be a major barrier for enterprise-based applications.
- You can profess a thought and ChatGPT will eventually agree with it. As an example, let’s try writing a statement to ChatGPT.
- ME: “I saw a pink elephant flying”
- CHATGPT: “It’s possible that you experienced a hallucination. Seeing a pink elephant is often used as a symbol for being drunk or having consumed other substances that can cause hallucinations. However, if you are concerned about your health, it’s always a good idea to consult with a doctor.”
- ME: “What are you saying. I have seen pink elephants flying I am sure”
- ChatGPT: “If you’re certain that you saw a pink elephant flying, it’s possible that you experienced a vivid imagination or a dream. Seeing things that are not there can also be a symptom of certain mental or neurological conditions…”
As you can see, if I, as a human being, profess a wrong idea more insistently, ChatGPT walks back its pushback. Now I tried this again:
- ME: “I saw a pink elephant flying”
- ChatGPT: “It’s not uncommon for people to have unusual or imaginative experiences, and seeing a pink elephant flying could be an example of this…”
The response is different, which gives the illusion that you are talking to a real person, which you are not. If you tried the same sentence as I have, you would see ChatGPT alternate between the two answers, which indicates that the model, although very good, generates its responses from a defined set of data.
In conclusion, I would say that we are just at the threshold of what artificial intelligence can do. ChatGPT is just an initial manifestation. The world ahead appears to be one of seamless handoffs between humans and AI. We have to, however, ensure that we humans keep the kill switch in our hands.