Debunking myths around AI: what ChatGPT is, and what it isn’t

The more I read popular science articles and talk with people outside my AI bubble, the more I realize that the media is not painting a clear enough picture of AI, especially in terms of what ChatGPT is capable of. There are a lot of misconceptions about how intelligent ChatGPT really is. As an AI Engineer, I’d like to clear things up a bit.

“ChatGPT continuously learns as you chat with it.”

False.

Well, technically false. But it’s a bit more complicated, so allow me to explain.

Let’s clear one thing up first: ChatGPT is the name of a web application created by OpenAI, where you can chat with an AI language model. ChatGPT itself is not a model – it uses a model. Depending on when you were using ChatGPT, the model you were interacting with would have been GPT-3.5, GPT-4, GPT-4o-mini, et cetera. Forget the exact names though; just know that over time, you will be talking to a more and more “intelligent” AI model. This is achieved by retraining the model, partly on the ever-growing amount of user data that OpenAI collects over time.

Contrary to common belief, however, this retraining is not done truly continuously. OpenAI decides when to retrain a model, which takes a long time and a lot of processing power (and thus money). In this sense, the model does not learn as you chat with it, though the next version may well have been trained on the data you’ve typed into ChatGPT.

I say “may”, because there is a setting in ChatGPT where you can specify that you don’t want OpenAI to train on your input data. If you select this, we have to assume that your data won’t be trained on. That is admittedly very hard to verify, but it is also hard to argue with, given that they have specifically added an option not to train on your data.

Lastly, even if you set it so that OpenAI won’t train on your data, you might still see ChatGPT recalling something from a past conversation. That’s because there is another setting called “memory”: when enabled, ChatGPT can consider parts of conversations important enough to store in memory. This is effectively a personal text file that the model isn’t trained on, but will “see” as context for all new conversations you have with it. If something in this memory text is relevant to the question you ask, it will likely use that information to help answer your question.
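To make that distinction concrete, here is a minimal sketch of how such a memory could work in principle. All names in it are made up, and the real implementation inside ChatGPT is not public – the point is only that memory is extra context, not extra training.

```python
# Minimal sketch: "memory" as extra context, not extra training.
# call_language_model is a hypothetical stand-in for the real model call.

def call_language_model(prompt: str) -> str:
    return f"(imagine a model reply to: {prompt!r})"  # placeholder reply

memory = [
    "The user is a vegetarian.",
    "The user lives in Amsterdam.",
]

def answer(user_message: str) -> str:
    # Memory notes are prepended to every new conversation;
    # the model's weights are never changed by this.
    context = "Things to remember about this user:\n" + "\n".join(memory)
    prompt = f"{context}\n\nUser: {user_message}\nAssistant:"
    return call_language_model(prompt)

print(answer("Suggest a dinner recipe."))
```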

Remember how I wrote that OpenAI decides when to retrain a model? In the field of AI Engineering, there is a paradigm called Machine Learning Operations (often written as MLOps), which does aim to automatically and continuously retrain and improve AI models without human intervention. As far as I am aware, due to the sheer cost of retraining language models, this is not yet done at such a large scale. Considering how costly it is to retrain a model for ChatGPT, it makes sense to have a human decide what the new setup – in terms of model architecture and input data – should be … for now.
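For the curious, here is a toy sketch of what such an automated retraining trigger could look like. Every name and number in it is made up for illustration; it is emphatically not how OpenAI operates.

```python
# Toy MLOps-style trigger: retrain automatically when a monitored quality
# metric drops below a threshold. All names and numbers are illustrative.

def evaluate_current_model() -> float:
    return 0.78  # stand-in for running an evaluation suite and returning a score

def retrain_model(new_examples: list[str]) -> None:
    # Stand-in for an expensive training job; for large language models this
    # step is exactly what makes full automation so costly.
    print(f"Retraining on {len(new_examples)} new examples...")

QUALITY_THRESHOLD = 0.80
collected_user_data = ["conversation 1", "conversation 2", "conversation 3"]

if evaluate_current_model() < QUALITY_THRESHOLD:
    retrain_model(collected_user_data)
```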

“ChatGPT is intelligent.”

Neither true nor false.

That is, in my personal opinion. Scientists are still debating this very question. In the end, it all depends on how you define “intelligence” (which, funnily enough, is the single question that prompted me to study Artificial Intelligence back in 2013).

What OpenAI did really well is abstract away all of the technical aspects of AI models and let the average user think they’re interacting with just a single model. Have you ever asked ChatGPT to generate an image of, say, a beautiful landscape? What you don’t get to see is that ChatGPT’s answer will then contain a sentence along the lines of <generate_image>a beautiful landscape</generate_image>[1]. That’s because their servers check whether the model’s output contains this <generate_image> tag; if it does, a separate image-generation AI model is run on the image description inside it. The ChatGPT app then substitutes the resulting image into the conversation at that location. The user is none the wiser, as it looks like ChatGPT generated this image all by itself (read: by its own AI language model).
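A rough sketch of that tag-detection step might look something like the following. The tag name and the generate_image function are assumptions on my part (see footnote [1]); OpenAI’s actual pipeline is not public.

```python
import re

# Rough sketch: scan the language model's output for an image tag, hand the
# description to a separate image-generation model, and splice the result back
# into the reply. The tag name and generate_image are assumptions, not OpenAI's
# actual implementation.

IMAGE_TAG = re.compile(r"<generate_image>(.*?)</generate_image>", re.DOTALL)

def generate_image(description: str) -> str:
    # Stand-in for a separate image model; imagine it returns an image URL.
    return f"[image of: {description}]"

def postprocess(model_output: str) -> str:
    # Replace every image tag with whatever the image model produced.
    return IMAGE_TAG.sub(lambda m: generate_image(m.group(1)), model_output)

print(postprocess("Here you go! <generate_image>a beautiful landscape</generate_image>"))
# -> Here you go! [image of: a beautiful landscape]
```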

In the end, the AI language model that ChatGPT uses is trained simply to be a next-word predictor[2]. It does so via vector embeddings: numerical representations of the meaning of words, i.e. their semantics. These vector embeddings – the meaning of words as numbers – happen to be an incredibly useful way of linking concepts together, just like we humans do. An apple and an orange are pretty close in semantics: they’re both fruits. But ‘apple’ can also refer to the technology company, so its vector embedding will be somewhat close to e.g. ‘google’ as well. Training a language model effectively lets it predict which semantics follow the previous semantics. In the case of the language models behind ChatGPT, they have learned that, when a question appears, they should output – word by word[2] – whatever semantics typically show up in similar data that they were trained on.
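To give a feel for what “closeness in semantics” means, here is a toy example with made-up three-dimensional vectors. Real embeddings have hundreds or thousands of dimensions and are produced by a trained model, but the idea of measuring similarity between them is the same.

```python
import numpy as np

# Made-up three-dimensional "embeddings"; real ones are much larger and learned.
embeddings = {
    "apple":  np.array([0.9, 0.8, 0.3]),  # fruit-ish, but also a bit tech-ish
    "orange": np.array([1.0, 0.7, 0.0]),  # clearly a fruit
    "google": np.array([0.1, 0.6, 1.0]),  # clearly a tech company
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # 1.0 means identical direction (same meaning), 0.0 means unrelated.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["apple"], embeddings["orange"]))  # ~0.96: both fruits
print(cosine_similarity(embeddings["apple"], embeddings["google"]))  # ~0.60: 'apple' is also a company
print(cosine_similarity(embeddings["orange"], embeddings["google"]))  # ~0.36: little overlap
```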

The surprising thing is that language models, when large enough and trained for long enough, can be seen to generalize: they can answer questions that do not literally appear in their training data. This is where the debate about intelligence comes from: they do seem somewhat generally intelligent (which we call AGI), in the sense that they can generalize beyond what they’ve literally been trained on as input data. At the same time, they can only do so when the semantics of a new answer are similar to the semantics of something they were trained on. Pretty vague, I know. But it means that they aren’t truly generally intelligent yet. True general intelligence would be required for the next step of AI: Artificial Superintelligence (ASI), where the capacity for AI reasoning far exceeds that of humans – which is what people tend to be so afraid of. If you’ve seen the movie Her, you’ve seen a depiction of ASI. I believe we’re still far from that point. Our language models have only just started to exhibit some signs of general intelligence, and are far from exhibiting it fully. But that’s a debate for another time!

[1] Note that it’s not actually <generate_image> but likely something similar.
[2] Technically not ‘word’, but ‘token’ – which is often a portion of a word.

“We didn’t have AI when I was in college.”

Likely false.

You have to understand that Artificial Intelligence is a field of study that long predates ChatGPT – this chatbot is simply the current culmination of prior AI research. This statement is more of a pet peeve of mine. I believe it’s important to realise that current language models are only a small portion of what AI can do. If we treat ChatGPT as the start of AI, we’re ignoring decades of AI research, from the first computer program that beat the world chess champion to AI systems that were already aiding medical research back in the ’90s.