
The Fallacy of the Chinese Room

If you spend much time reading about Artificial Intelligence and the philosophy of mind, you will hear about John Searle’s Chinese Room thought experiment. Published in 1980, it argues that however well a computer imitates a mind, it cannot actually think.

The argument goes something like this. In a room you have a man who speaks fluent English but no Chinese. There is an input slot through which people can pass him pieces of paper with Chinese writing on them. He has with him a book of rules (written in English) that, for a given input, tells him which piece of paper, also with Chinese writing on it, to return through another slot. The rules are good enough that, to a native Chinese speaker, the output appears to be a response written by another native Chinese speaker.

Basically, the room is passing the Turing Test for whether or not someone understands Chinese.

But despite the fact that the room can fool people into thinking the person inside understands Chinese, Searle maintains that the man in the room does not. There is more to understanding Chinese than being able to fool people into thinking that you are a native speaker.

Searle claims that an artificial intelligence that can pass a Turing Test operates in a similar fashion. When it answers a question posed to it, it is not understanding either the question or the answer. It is just mechanically looking up an answer according to some algorithm. He claims there is more to thinking than being able to fool people into believing that you are a human being.

A common objection to this argument is the systems view. Sure, the guy in the room doesn’t speak Chinese, but he is only part of the system. Similarly, individual brain cells, or even entire lobes of your brain, do not on their own think, yet the entire system, you, can think. And likewise the entire system, the Chinese Room as a whole, can understand Chinese.

Searle does not buy this. He maintains there is still something about the room that doesn’t understand Chinese. The symbols that come in are still just lines on paper; they don’t correspond to actual things. If you asked it, “What color is the sky?” the symbol for “sky” doesn’t refer to the thing above us when we are outside. It is just a few symbols which, when combined with some rules, are mapped to another few symbols that happen to be the Chinese word for blue. Does being able to map those together indicate an ability to understand what the sky is? Or is it nothing more than a simple computation?

A stronger objection to the thought experiment is that this is not how modern-day natural language processing programs work. They don’t just map one string of characters to an output, even if that is how it looks at a high level. Artificial neural networks typically learn embeddings, which map words into internal representations, and then processing is done on those internal representations to produce the output.
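As a rough sketch of what that looks like in practice (using PyTorch purely as an illustration; the post doesn’t name a framework, and the tiny vocabulary and mean-pooling step here are made up), words are first turned into vectors by an embedding layer, and everything downstream operates on those vectors rather than on the raw characters:

```python
import torch
import torch.nn as nn

# A toy vocabulary; real systems learn embeddings for tens of thousands of tokens.
vocab = {"what": 0, "color": 1, "is": 2, "the": 3, "sky": 4}
embed = nn.Embedding(num_embeddings=len(vocab), embedding_dim=8)

token_ids = torch.tensor([vocab[w] for w in "what color is the sky".split()])
vectors = embed(token_ids)           # shape (5, 8): one internal representation per word
sentence_repr = vectors.mean(dim=0)  # crude stand-in for the "processing" step
print(sentence_repr.shape)           # torch.Size([8])
```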

To a human, those internal representations are pretty opaque; they are really just vectors stored in a network. But you can make the argument that those representations carry some abstract meaning. Even simple models like Word2Vec have been shown to have emergent properties. For instance, if you subtract the vector for female from the vector for mother, you end up with a vector similar to parent, and if you then add the vector for male, you will often get a vector similar to father.
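This kind of vector arithmetic is easy to try with gensim’s pretrained Word2Vec vectors (the model name here is an assumption on my part, and the exact neighbors returned will vary with the model and its training data):

```python
import gensim.downloader as api

# Downloads the pretrained Google News Word2Vec vectors on first use (large file).
wv = api.load("word2vec-google-news-300")

# mother - female + male: the nearest neighbors should include something like "father".
print(wv.most_similar(positive=["mother", "male"], negative=["female"], topn=5))
```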

Does this mean the network understands what those words mean? You could object that they are still just vectors in a matrix, but that is just an exercise in reductionism. Our “understanding” of what the word “sky” refers to may well just be some combination of synapse strengths within our brain, but that doesn’t stop us from saying we understand what the sky is.

To his credit, Searle has admitted that neural network models come closer to beating his thought experiment than the kind of knowledge system he was describing. But is that really relevant to his argument? Even if some AI systems are not Chinese Rooms, if we had a knowledge system that was one and it passed the Turing Test, surely it would not be thinking, would it? Does that render the fabled Turing Test obsolete?

Well, there is a reason modern-day NLP frameworks don’t work this way: such an architecture would be incredibly inefficient. Just think for a moment how many different inputs it would need to handle. There is an infinite number of possible English sentences, and even if we restrict it to only probable sentences, there are still far too many to enumerate. Then you have to take into account that a conversation involves many sentences in sequence. The answer to the question “What color is it?” has to be different if the subject of the previous sentence was the sky versus the Statue of Liberty.
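To make the problem concrete, here is a deliberately naive toy sketch (not any real NLP system) of a Chinese-Room-style lookup table. Every input has to be enumerated by hand, and a context-dependent question like “What color is it?” has no single correct entry:

```python
# Toy Chinese-Room-style responder: a stateless lookup from input string to output string.
rules = {
    "What color is the sky?": "Blue.",
    "What color is the Statue of Liberty?": "Green.",
    # "What color is it?" cannot have one fixed entry: the right answer depends on
    # what the previous sentence was about, which a stateless table cannot represent.
}

def respond(sentence: str) -> str:
    return rules.get(sentence, "???")

print(respond("What color is the sky?"))  # Blue.
print(respond("What color is it?"))       # ??? -- no context, no answer
```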

NLP systems don’t work the way Searle described in the Chinese Room because such a system would be impossible to build. Realistically, no such system could pass a Turing Test the way a system that works with internal representations of abstract concepts can. Any real NLP system would have the very mechanism that Searle claims is missing from his Chinese Room.

Ok, but does this render the Chinese Room useless? Can it illustrate anything other than why you shouldn’t let a philosophy professor architect software?

I would argue there is still a useful concept here: the distinction between imitating and understanding. ChatGPT is a good example. It is designed to imitate human-written text, not to understand it. It is trained to return a plausible answer, not the right answer.

However, this is not a limitation of its architecture, nor is it due to some fundamental difference between it and the human brain. It is simply a consequence of how it was trained. A system prioritizing correctness could conceivably do a better job here (though such a system would be harder to build, since it would require training data that can be evaluated for correctness, whereas LLMs like ChatGPT can often be trained by just imitating any available data).

In fact the exact same flaw can be seen in many humans. Students are especially guilty of this, often learning just enough of a subject by rote to pass the test, but still lacking understanding.

So the difference between understanding and imitating is a useful distinction. But it is not one that is inherent when comparing humans and machines.
