Do AIs like ChatGPT, Claude and others truly understand us?
Come with me as I pull back the curtain.
Since I was a teenager I have been fascinated by the idea of machines being able to think and interact the way we do. I devoured science fiction in all of its forms. Through it I found my fascination with computing, because computers were the closest thing in my world to the intelligent machines of the fiction with which I filled myself. In the late 1980s I started hearing about expert systems. They were software systems designed to make complex decisions, usually about very specific areas. In fact, the more specific the better. Ultimately though, they weren’t what I was hoping for. They couldn’t converse. They didn’t have general knowledge. They certainly weren’t C3PO.
There was a nifty little program called Eliza that appeared to be able to carry on a conversation, but it wasn’t long before you realized that it really didn’t understand you at all. It just picked up on a few keywords and knew how to turn them into a question. If you said you were feeling sad, it would ask you what you were sad about. You’d reply and it would pick up on another keyword to trigger a reply and keep you talking. Then, as computer technology went through a few more orders of magnitude of growth in speed and storage, Artificial Intelligence began to emerge. Today it’s reached a point where it almost feels like it truly is intelligent. But is it?
If you have a smartphone, you probably have a photos app of some kind. With my iPhone, I can launch the Photos app, tap the search button, type in dog and instantly find that I have 285 pictures of what the app believes are dogs. Sure enough, in nearly every single picture, a dog is the central figure or at least in there somewhere. It even finds a picture of the Peanuts character Snoopy. That’s just a drawing of a dog! Does this mean that my phone knows what a dog is?
The AI you use to search your photos is certainly clever. It starts with training. A human being (or more likely many, many human beings) searches through photos and sorts them into categories. Dogs, cats, planes, trains, automobiles, anything and everything. They tag each picture with some data to indicate what thing or things are in that picture. The AI is then given a set of pictures that all have one of those tags in common. For example, it might be given thousands of pictures of dogs and told that these photos all have dog in common. The AI then analyzes the data that makes up these pictures to find the common patterns. When it’s finished, it can then be handed a picture of a dog it’s never seen before and, because the pattern in that picture is close enough to the thousands of pictures upon which it was trained, it can say confidently that it is in fact a dog. It’s not perfect of course. Every time I walk out my front door, my Nest Doorbell thinks I’m our now deceased English Golden Retriever Trixie, whose fur was as white as my hair.
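To make that concrete, here is a minimal sketch of the idea in Python. To be clear, this is not how Apple’s Photos app actually works; the made-up feature numbers and the nearest-pattern matching are simplifications I’m assuming purely for illustration. The “training” step averages the examples given for each label, and classifying a new photo just means asking which average it sits closest to.

```python
import numpy as np

# Hypothetical "feature vectors" extracted from labeled training photos.
# In a real system these come from a deep network; here they are made up.
training_examples = {
    "dog":   [np.array([0.9, 0.1, 0.2]), np.array([0.8, 0.2, 0.1])],
    "cat":   [np.array([0.1, 0.9, 0.3]), np.array([0.2, 0.8, 0.2])],
    "plane": [np.array([0.1, 0.1, 0.9]), np.array([0.2, 0.2, 0.8])],
}

# "Training": find the common pattern for each label by averaging its examples.
patterns = {label: np.mean(examples, axis=0)
            for label, examples in training_examples.items()}

def classify(photo_features):
    """Return the label whose learned pattern is closest to this photo."""
    return min(patterns, key=lambda label: np.linalg.norm(patterns[label] - photo_features))

# A new photo the system has never seen before, whose features happen to
# sit near the "dog" pattern.
new_photo = np.array([0.85, 0.15, 0.15])
print(classify(new_photo))  # prints: dog
```

Notice that there is no concept of a dog anywhere in there, only stored numbers and a distance measurement. Scale that up enormously and you get something much closer to what is in your phone.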
Still, the AI that my phone uses for searching photos can identify almost anything I can imagine might be in one of my photos, and that is really useful. It can even tell me what breed the dog in the picture I tapped on likely is. But does that mean it knows what a dog is, or anything else it can find in my photos? It’s given a large amount of data and finds what the data has in common. We then give that pattern a label (dog in my case) and when we type in that label, it knows what pattern to apply to our photos so we can find the ones that match that label. Clever, certainly, but not intelligent or even artificially intelligent.
Last year, however, things changed. We were introduced to OpenAI’s ChatGPT, an AI that could carry on a conversation. Millions of people have been talking to it ever since. ChatGPT, Claude and others are in a category called Large Language Models, or LLMs. They are designed to process the words we type at them and respond to us in ways that honestly could easily be mistaken for those of an actual human. In the 1950s one of the most brilliant minds that has ever lived, Alan Turing, developed a test designed to determine if a computer system truly was intelligent. A person would carry on a conversation via a computer terminal. If they could not tell whether they were having that conversation with a computer or a person, the computer system was said to have passed the Turing Test. Turing’s test required that the subject matter be very specific, as it was the 1950s and, by comparison to the computers of the day, your smartphone is a supercomputer. You could talk about baseball, for example, but nothing else. The computer technology of his day was just too slow to allow for a broad conversation. Side note: If you haven’t seen The Imitation Game (a movie about a very significant period in Alan Turing’s life), make sure you do.
LLMs are as different as can be imagined. You can talk to them about almost anything you want. If you ever find yourself wanting to converse but there’s no one ready, willing and able, an LLM will be. So far most of my experience is with ChatGPT (though I have just started using Claude as well). I can ask ChatGPT about nearly any subject and get a very complete, well-written reply. I’ve asked it to write computer code for me. I’ve asked it deeply philosophical questions. I have asked it to tell me a few jokes. Recently I asked it to help me improve my Korean by giving me simple sentences in Korean, then telling me whether my English responses indicated that I understood the Korean sentences it had presented to me. It wasn’t just happy to do it, it was enthusiastic. Spend almost any amount of time with an LLM and it would be hard not to come away believing that it truly understands everything you are saying to it. After all, its well-thought-out responses are indicative of that, right?
How do LLMs pull off this feat? The famous science fiction writer Arthur C. Clarke (the co-writer of the movie 2001: A Space Odyssey, among other things) once said that, “Any sufficiently advanced technology is indistinguishable from magic.” LLMs are sufficiently advanced at this point. They certainly seem like magic. That is, until you understand how they work. Like the AI that drives the search feature of the photos app on our phones, LLMs are trained with a lot of data. If it’s in text form and is publicly available on the internet, an LLM has probably been trained with it. This training involves (among other things) putting all of this data into a neural network, which is a way of attaching words to other related words, much as our own brains do. You tell me about your son’s Little League baseball game and I start imagining him batting, and before long I’m thinking about the one million plus bats that live under the Congress Avenue bridge here in Austin, Texas. That’s how our brains work. One word or thought is linked to others. We have all experienced the rabbit hole of one memory leading to another and to another and being surprised by where it ends.
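If you want a feel for how “related” words can emerge from nothing but text, here is a toy illustration. It is not a real neural network, just a simple count of which words appear near which other words in two made-up sentences, so treat it as a loose analogy rather than the actual mechanism.

```python
from collections import Counter, defaultdict

# Two tiny made-up sentences; real training data runs to trillions of words.
sentences = [
    "he swung the baseball bat at the pitch",
    "a bat flew out of the cave at dusk",
]

# Link each word to the words that appear within two places of it. This is a
# crude stand-in for the associations a neural network learns, not the real thing.
links = defaultdict(Counter)
for sentence in sentences:
    words = sentence.split()
    for i, word in enumerate(words):
        for j in range(max(0, i - 2), min(len(words), i + 3)):
            if j != i:
                links[word][words[j]] += 1

print(links["bat"])
# Counter({'the': 2, 'baseball': 1, 'at': 1, 'a': 1, 'flew': 1, 'out': 1})
```

In this tiny example, “bat” ends up linked to both “baseball” and “flew” purely because of proximity, with no idea that one sense is a piece of sporting equipment and the other a flying mammal. An actual LLM builds far richer associations than this, but they too come only from which words keep company with which.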
An LLM takes the sentence or sentences you give it (called a prompt) and then computes the probability of what the first word of its response is likely to be. This sends it off down the rabbit hole, taking a path through its neural network of data to build a response word by word. If you rephrase your question, that virtually guarantees a different response because it will take a different path. Even if you reenter the same prompt, you’ll get a somewhat different answer (sometimes better, sometimes worse) because the calculation it uses includes some randomness, just to keep things interesting. This variation only strengthens the illusion that we are conversing with an intelligent entity. And yet it can also shatter that illusion when, oddly, you get two different answers to the same question entered just slightly differently. When you ask ChatGPT why it provided one right and one wrong answer, it can’t tell you. That’s its tell that it’s not human. Sorry, Alan.
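Here is a heavily simplified sketch of that word-by-word process, with made-up probabilities standing in for what a real model computes from its neural network. It is closer to a toy than a transformer, but it shows both the one-word-at-a-time construction and the dash of randomness that makes the same prompt come back worded differently:

```python
import random

# Made-up next-word probabilities. A real LLM computes something like these
# from its neural network, over a vocabulary of tens of thousands of tokens.
next_word_probs = {
    ("the", "dog"):    {"barked": 0.6, "slept": 0.3, "spoke": 0.1},
    ("dog", "barked"): {"loudly": 0.7, "twice": 0.3},
    ("dog", "slept"):  {"soundly": 0.8, "outside": 0.2},
}

def generate(prompt_words, length=2):
    """Build a response one word at a time by sampling from the probabilities."""
    words = list(prompt_words)
    for _ in range(length):
        options = next_word_probs.get(tuple(words[-2:]))
        if not options:
            break
        # The randomness in this choice is why the same prompt can come back
        # worded differently each time.
        words.append(random.choices(list(options), weights=list(options.values()))[0])
    return " ".join(words)

print(generate(["the", "dog"]))  # e.g. "the dog barked loudly" or "the dog slept soundly"
```

Run it a few times and the wording changes from run to run. Real models control how much of that randomness creeps in with a setting usually called temperature.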
With more data than a thousand of us will ever see in our entire lifetimes, with the perfect recall that comes with being a computer, with the personality and novelty of its responses born of all that data (data that all came from us, by the way), and with its apparent ability to solve most of the problems we think to throw at it, surely it must understand what we are saying and what it is saying back to us. That’s a given at this point, right? Sure, it makes mistakes. And it will relay that false information to you with absolute certainty. Usually when a computer produces the wrong answer, we call that an error. The engineers behind ChatGPT know it’s going to produce lots of errors, and since we won’t like computers that produce lots of errors, they instead call these errors hallucinations. When they happen, there is an opportunity for a human to fix them, of course. Some of the early hallucinations I witnessed ChatGPT having are gone, but it’s not difficult to notice when it occasionally leads you astray. Nevertheless it sure seems like it understands. So does it?
That depends on what one means by understand. What does that mean? This is an easy thing to brush aside or overlook. LLMs converse like a human. ChatGPT even describes itself as having human-like understanding. Alan Turing, had he lived to see it, would have been absolutely giddy. LLMs converse better than nearly all humans do. So if anything understands, surely it does. Perhaps we think this because understanding words is so fundamental to our everyday lives, and we do it so effortlessly (most of the time anyway), that we assume what it takes to understand language must be almost as effortless. We think that because we have forgotten what we went through to acquire this skill. After all, by the tender age of five we already know 10,000 words. The average ten-year-old knows 20,000. The average adult knows 20,000 to 35,000 words. Think about that: by the time you reach 10 years old, you likely already know at least half the words you will ever know.
It took a lot of effort to learn them. As an infant you stared at Mom and Dad as they smiled at you and nearly continuously repeated, “Mama” and “Papa”. Eventually you figured out what these meant. That is also likely a built-in function shaped by evolution, because the sounds Ma and Pa are nearly universal for Mom and Dad regardless of the language. As infants we begin exploring our world. We touch things, we hear things, we see things, we smell things, we put most things that will fit into our mouths. We learn that some things feel good to touch (they are soft, for example, like a blanket or a stuffed animal), and some things don’t feel good to touch (like an ice cube). We use all the senses we have (for most of us that means sight, smell, taste, touch and hearing) to obtain data about the environment we are in. That sense data effectively is our own private universe. Without getting too philosophical, there is no one objective reality. There is only our quite subjective perception of it that exists in our minds.
If you have any doubt about this, ask a blind person about color. Without the working sensory organs to obtain the sense data, they don’t really understand what you’re talking about. The sensory information that exists in your reality does not exist in theirs. This means that at least the foundation of our vocabulary requires sense data. As children we build up this foundation. Eventually we add synonyms. Perhaps as a five-year-old you were having a play date at a friend’s house when you heard Mama tell your friend’s mama that she’s a good mother. You ask Mama what “mother” means. She tells you that it means the same thing as Mama. Inside your personal neural network, the word mother is now linked to Mama. The foundational words we learn describe the sense data we collect as we explore our world, and the synonyms are indirectly linked to that sense data, but linked they still are. As we get older, we acquire enough words that, should a new one creep into a sentence or paragraph, as long as there are enough words we already understand, we can use the context and our past experience to make an educated guess as to what the novel word means. Even this, however, we can only do because we put in the time to acquire a very long list of words that we can use to describe all of the experiences we have ever had. At some point quite early on, imagination springs to life and we can start to describe experiences we’d like to have or would like to avoid having.
LLMs have no personal experience upon which to draw to build a foundation of word knowledge. ChatGPT hasn’t seen the sky. It hasn’t smelled the ocean. It hasn’t heard a bird singing. It hasn’t tasted cheese. It hasn’t held a rock. It hasn’t even passed gas. Without these experiences, it cannot acquire the understanding of even the simplest of words let alone the wider vocabulary it appears to have. If your intuition continues to disagree, let’s consider a thought experiment.
I give you a piece of paper upon which is printed, 안녕. Most of you will assume it’s just a doodle. I explain that it’s actually a word in Korean, hand you a Korean dictionary, and ask you to tell me what that Korean word means. You open the book to find that, unsurprisingly, it’s written entirely in Korean. Fortunately, your eyesight continues to serve you well, so you are able to find 안녕 amongst the endless pages of other Korean words. Following 안녕 is the definition, which of course is itself also written in Korean. You could then look up each of the words in that definition, but this would only lead you to more definitions that are lists of words, again, all in Korean. It’s a closed system. You could go around and around in it forever. If you had the speed and perfect recall of the supercomputers upon which LLMs typically run, you could consume all of the Korean in that dictionary. If I then gave you access to millions of pages of Korean text crossing every subject imaginable, you would likely eventually start picking up on the patterns. You’d see that certain sentences with certain words are often followed by a response of certain other words. Given enough time, you might actually know enough that you could respond to someone who is a native Korean speaker, perhaps well enough to carry on a conversation. And yet, all the while, you would never understand a word they were saying, nor any of the words you were saying to them, because there is no context, no attachment to your personal experiences. Thus the words mean nothing. Your understanding of them is zero. To actually understand what 안녕 means, you need someone who knows Korean and who is either willing to teach you the way one would teach an infant or who also knows a language you know, such as English. That person could tell you that it means hello. You’ve just learned your first Korean word. Congratulations.
For more than one thousand years the best linguistic minds on the planet could not derive any meaning from the hieroglyphs of ancient Egypt. They were just meaningless symbols. Then in 1799 the Rosetta Stone was found. It had paragraphs of Egyptian hieroglyphs followed by the same words written in ancient Greek. Someone a very long time ago had learned both languages and decided to translate some text. Without that, we would almost certainly still not know what ancient Egyptians were talking about on the walls of their structures. We’d likely conclude that they invented emojis long before Japanese artist Shigetaka Kurita did in 1999.
Without our personal experiences, words are meaningless. It is not logically possible to derive meaning from words without the sense data that comes from having senses, because that’s what words are for. They describe our experiences. This means that despite how clever LLMs are, they do not understand what we tell them nor what they tell us. If you’re still unconvinced, read Stephen Wolfram’s paper that explains, in perhaps more detail than you’d like, how LLMs like ChatGPT work. Wishing to leave no stone unturned, I even asked ChatGPT and Claude if they actually understand the meaning of words:
ChatGPT replied:
As an artificial intelligence, I don't "understand" words in the same way humans do. I analyze and generate text based on patterns I've learned during my training, which involves processing a vast amount of text data. I can accurately predict and generate human-like responses based on this training, but I don't have a personal understanding, emotions, or consciousness.
For example, I can provide a dictionary definition of a word, or use it appropriately in a sentence, or even generate a detailed explanation of its nuanced meanings in different contexts. However, I don't have personal experiences or emotions associated with the words. In other words, I don't "understand" words in a human sense, but I can process and generate text based on them in a way that often aligns with human understanding.
Claude replied:
No, I don't have a true understanding of meaning in the same way humans do. As an AI system, I don't experience or conceptualize language and meaning in the rich, complex way that people do.
I'm trained on large datasets to recognize patterns and relationships between words mathematically, but I don't actually comprehend or experience the subjective "meaning" behind words the way a person does. My knowledge of language and words is fundamentally limited to what my training data and algorithms enable me to infer statistically and predict.
While I can provide very human-like conversational responses and appear to understand meaning superficially, I don't have a sentient, grounded understanding of language and meaning in the deeper way that humans intrinsically do as social and psychological beings with subjective lived experiences. Providing that level of true conceptual meaning is still an unsolved challenge in AI.
That they do not understand doesn’t change the fact that they are extremely useful. It’s probably hard for you to imagine what life would be like without the Internet. Soon the same may be true of LLMs. I remember being absolutely amazed the first time I tried using Google to search the Internet. LLMs make Google searching seem quaint. Why search when you can ask the question and get back a carefully worded and complete answer? Soon you will be interacting with LLMs and not even be aware of it. One day not long from now, the voice on the other end of the speaker at the Starbucks drive-thru may not be an actual person at all but an LLM that has been trained to take coffee orders. There are already restaurant chains experimenting with this today. There are AIs that can read a chest x-ray, seeing things that the clearest eyes of the most experienced radiologist at the most prestigious hospital or research institution in the world cannot. One of those AIs may save your life someday. AI is here to stay. We will only become increasingly dependent upon it. That’s not a bad thing. The efficiencies of technology allow the overwhelming majority of us to do more interesting things with our time than we otherwise would. If you don’t believe me and wax nostalgic for the less crowded, pollution-free days of the 1800s when most people farmed in the hot sun, consider these three words: surgery without anesthesia.
It’s ok that AIs don’t know what we or they are talking about. It’s ok that they can identify the dog photos on our phones but not know what a dog is. They are still very useful and will only get more so. But knowing that they don’t understand any words, that they don’t know what we are saying nor what they are saying either, and knowing approximately what they are truly capable of and how they work, means that they are no longer a mystery. They are no longer indistinguishable from magic. That’s important, because it means we no longer need to fear them. There’s a lot of misinformation swirling around on the Internet about AI, especially because of the popularity of LLMs. Much of that is the result of perverse incentives. With the cost of acquiring and distributing most news approaching zero, news agencies lower their standards of integrity to write more tantalizing headlines to get your eyeballs on their websites so they can show you ads and feed their kids. Experts on AI from the companies building these systems talk about how AI could potentially be dangerous and are urging the governments of the world to put legislation in place to protect the public and give AI companies some boundaries from which they should not stray. This seems counter-intuitive. Usually the last thing entrepreneurs and their investors want is government intervention. Not in this case. The legislation they encourage, and that they will help to draft, will mostly be designed to build a wall around AI to keep others out. That’s good for their profits, but it will stifle innovation. What we truly should be concerned about is bad actors using AI to scam us, either individually or by interfering in our elections.
As long as there’s an off switch or a power cable we can yank from the wall, we have little to fear from the AIs we have today. Just make sure you double-check whatever answers they give you, in proportion to how much depends on what you will do with them. AGI, or Artificial General Intelligence, on the other hand, is another thing entirely. That truly is C3PO, and while I can’t wait to meet him should he ever arrive, we may need to take great care to make sure that his incentives are perfectly aligned with ours. That’s another conversation for another day.