Why LLMs don't know what a "tomato" is, and how not to be deceived

Recently, both scientific and popular-science publications have begun covering how large language models can reproduce conspiracy narratives and reinforce irrational, sometimes mystical belief systems. Moreover, for some users, interacting with such models can significantly distort their perception of reality. These observations got me thinking about the reasons for such effects and about possible ways to protect against them. One of the key steps, in my opinion, is to give the general audience a basic understanding of how language models work and where the limits of their applicability lie. That is what this article is about.

Any neural network is an algorithm that receives data as input and produces a transformed result as output. The distinguishing feature of LLMs (large language models) is that they work with textual representations of information. So how do models like ChatGPT or DeepSeek generate answers to user queries? Take, for example, the word "tomato." For most of us, it is a round, edible object. For a language model, it is just a vector: a set of numbers that formally describes the word's position in an abstract multidimensional space. The vector can have different dimensionality, meaning it can contain a different number of features, for example 2 features, or 700, or even 4000.

If a word has 700 features, what does that mean? A feature (a component of the vector) is a hidden characteristic of the word. Features can be of different kinds, for example: semantic (is it a fruit or a berry?), linguistic (the likelihood of appearing near adjectives like "red" or "sweet"; syntactic role, such as subject; associations with particular verbs, and so on), ontological (naturalness, organicity, wholeness), physical (round, hard), and others. Each feature in the word's vector representation stores a value, for example: "red" 0.90, ..., "adverb" -0.64. (In real models these dimensions are learned automatically and are rarely this neatly interpretable, but the simplification is useful here.) The vector for the word "tomato" might look like this: [0.90, -0.23, -0.01, ..., 0.55]. As mentioned earlier, each word in vector form occupies its own place in vector space; let's look at the simplest possible example for clarity:

To make the diagram easy to read, I used just two dimensions, "hardness" and "edibility", whereas in reality there are as many dimensions as there are features in the vectors. In the diagram above I drew the space as a coordinate plane, but in fact it is an abstract shape. The dots are vector representations of words. Five of them are shown as colored circles and labeled: tomato, tomato (a synonym), red, cucumber, fence. Notice that the word "red" is close to "tomato", but not right next to it. Vectors can be compared with each other: their semantic proximity is usually measured with cosine similarity, the cosine of the angle between two vectors drawn from the same point (cosine distance is simply 1 minus that value). The higher the similarity, the closer the words' meanings and the closer they sit in vector space. But how does the model know the vector for the word "tomato" or the word "red"? And, even more puzzling, how does it know that "tomato" and its synonym mean almost the same thing?
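Before we answer that, here is what the comparison just described looks like in code. This is a minimal sketch with invented three-dimensional toy vectors (real models use hundreds or thousands of dimensions, and the values are learned rather than written by hand); the only point is to show that cosine similarity puts synonyms close to 1 and unrelated words much lower.

```python
import numpy as np

# Invented toy "word vectors" in 3 dimensions, purely for illustration.
vectors = {
    "tomato":         np.array([0.90, 0.10, 0.85]),
    "tomato_synonym": np.array([0.88, 0.12, 0.83]),  # the "alternate word" from the diagram
    "red":            np.array([0.95, -0.30, 0.10]),
    "cucumber":       np.array([0.35, 0.20, 0.80]),
    "fence":          np.array([-0.20, 0.90, -0.40]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; cosine distance is 1 minus this value."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

for word in ("tomato_synonym", "cucumber", "red", "fence"):
    print(f"tomato vs {word}: {cosine_similarity(vectors['tomato'], vectors[word]):.2f}")
# The synonym scores close to 1.0, "cucumber" and "red" come out lower,
# and "fence" is lower still (here it is even negative).
```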

Before language models get to arranging words into sentences as we are used to, they go through several stages of training:

1. Tokenization. The input text is segmented into discrete units called tokens. For example, the word "waterfall" might be split into two tokens, "water" and "fall", or it might stay whole; this depends on the model's tokenizer. Each unique token is assigned an integer from a fixed vocabulary. For example, if we are training on the single sentence "Yulia eats porridge", the vocabulary will contain three numbers, since there are three tokens (here each word is its own token, because the words are simple), and those numbers are assigned to the tokens. Tokenization lets the algorithm work with text in a convenient numeric form (and serves a few other purposes as well); a minimal sketch of word-level tokenization follows right after this list.

2. Vector initialization. For each token in the vocabulary, an initial vector representation is created in n-dimensional space. Since the model knows nothing yet, the vector components are initialized with random values, usually drawn from a normal or uniform distribution with low variance. At this stage the vectors already live in a vector space, but that space is unordered: semantically close words, like "tomato" and its synonym, may end up far apart, while words with completely different meanings may sit side by side (see the second sketch after this list).

3. Model training. During training, the model reads texts and phrases and observes which words most often appear in which surroundings. Matching words to one another on the basis of what it has seen, it refines the vectors. This is an involved process that is beyond the scope of this article.
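As promised in step 1, here is a minimal word-level tokenization sketch in Python. Real LLMs use subword algorithms such as BPE or WordPiece, so actual token boundaries and ids would look different; the example only illustrates the idea of a fixed vocabulary of integer ids.

```python
# Word-level tokenization of the single training sentence from step 1.
# Real tokenizers work with subword units, so this is a simplification.
sentence = "Yulia eats porridge"

# Build the fixed vocabulary: each unique token gets an integer id.
vocab = {token: idx for idx, token in enumerate(sentence.split())}
print(vocab)    # {'Yulia': 0, 'eats': 1, 'porridge': 2}

# Encoding a text means replacing its tokens with their ids.
encoded = [vocab[token] for token in sentence.split()]
print(encoded)  # [0, 1, 2]
```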
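And a sketch for step 2: every token id receives a random vector. The dimensionality of 4 and the small-variance normal distribution are illustrative assumptions here, not values taken from any particular model.

```python
import numpy as np

# The toy vocabulary from the previous sketch.
vocab = {"Yulia": 0, "eats": 1, "porridge": 2}

# One random n-dimensional vector per token; real models use hundreds
# or thousands of dimensions instead of 4.
rng = np.random.default_rng(seed=0)
embeddings = rng.normal(loc=0.0, scale=0.02, size=(len(vocab), 4))

# Before training these vectors carry no meaning yet: "tomato" and its
# synonym could land in completely unrelated corners of the space.
print(embeddings[vocab["porridge"]])
```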

Back to the title: why, even after training is complete, does the model still not know what a "tomato" is? As you may have guessed by now, language models do not actually understand the meaning of a word when they use it to generate an answer for the user.

The actual generation process goes like this: you ask a question → the LLM splits it into tokens and represents them as vectors → it determines the function of each word (for example, "how" asks for an explanation, "why" for a reason, "what for" for a purpose) and its relation to the others. Then, word by word, the neural network generates the answer. For example, how would it answer the question "How many legs does a dog have?"

Step 1) After the model has split the question into tokens and done contextual processing, it begins predicting the first token of the answer:

Input: "How many legs does a dog have?"
Output: "A" (probability 0.95)

Step 2) Now the model sees the question + already generated word:

Input: "How many legs does a dog have? A"
Output: "dog" (probability 0.92)

Step 3) The process is repeated:

Input: "How many legs does a dog have? A dog"
Output: "4" (probability 0.87)

Step 4) The loop runs again (on the next iteration it produces "4"), and then the model predicts the last word. How does it know that it should stop at this word? Because on the following step the most probable prediction is a special end-of-sequence token rather than any ordinary word, and that special token is the model's signal to stop generating.

Input: "How many legs does a dog have? A dog has 4"
Output: "legs" (probability 0.94)

Now imagine you're learning to speak a new language (for example, Hindi) guided only by context and gut feeling. You don't know what a particular set of characters means, but you've often seen it followed by a certain construction in texts, so you start using it the same way. You don't understand what people are saying to you, but you've heard others respond to those same words, so you repeat someone else's reply. You have no idea whether you're greeting someone or they're asking you about the weather, but you answer “it's raining” each time, gradually picking increasingly acceptable and expected words/sentences based on people's reactions. This is how language models work.

Now that we've broken down the mechanics of language models, it becomes clear why they seem so convincing in any area of knowledge. Their seemingly limitless competence is not the result of deep understanding but a consequence of the statistical processing of billions of texts. Whether an LLM is confidently explaining what a tomato is or backing a conspiracy theory, giving advice on personal relationships or medical recommendations, the same mechanism is at work: a search for the most probable sequence of words based on previously encountered texts.

This means that every time a language model answers that nobody understands you and that it is obvious you just need to stop taking the pills your doctor prescribed in order to feel better, you should ask yourself: did it really "understand" my query and give a justified answer, or did it merely reproduce a snippet from a novel in which the protagonist received exactly that response in a similar situation? Are you sure your question cannot be interpreted differently? Are all the punctuation marks in place, and will anyone else read your question the way it sounds in your head (word stress included)? The model does not "understand" your question in the usual sense, does not weigh the ethical consequences of its answers, and does not take your individual circumstances into account. It simply finds the statistically most probable continuation. Of course, an LLM's generative abilities have their limits, but many recommendations become harmful purely because of their context.

Understanding this principle is key to safe interaction with LLMs. Instead of trusting their answers unconditionally, it pays to maintain critical thinking and bear in mind: behind every phrase stands not wisdom, but mathematics. The model can generate a brilliant analysis or a dangerous piece of advice with equal linguistic persuasiveness simply because, for it, these are just different combinations of vectors in a multidimensional space. The responsibility for evaluating the reliability, applicability, and safety of the received information always remains with us.
