LLMs can't introspect

A common misconception I see, most recently in discussions about CLAUDE.md files, is that LLMs can introspect. They can’t.

When you ask an LLM “why did you say that?” or “write this so you can understand it later,” it does not look inward. It looks forward.

It generates the next most likely tokens based only on the tokens that already exist. That is the entire mechanism. There is no hidden reasoning process. There is no internal monologue it is summarizing. It is predicting what a good explanation should look like given the current context.

Order matters.

If an LLM writes code and you ask “why did you do it that way?”, it is not recalling a past decision. It is generating likely next tokens. The “why” is constructed from what is already on the page.

You can see this clearly in structured outputs.

Suppose you ask an LLM to produce JSON in this order:

{
  "result": "string",
  "reasoning": "string"
}

In this case, the reasoning is written after the result. The model has already produced the answer. The reasoning will be a reconstruction. It will sound plausible, but it does not reflect a prior internal chain of thought. It is generated to justify tokens that already exist.

Now change the order:

{
  "reasoning": "string",
  "result": "string"
}

Here the model must generate reasoning first. Because it only looks backward, the final result is conditioned on the reasoning tokens that came before it. In this setup, the reasoning genuinely shapes the answer. The sequence forces the model to think out loud before committing to a result.
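The effect of field order is easy to see in the prompt itself. This is a minimal sketch, not a real API call: build_prompt and the two schema dicts are illustrative stand-ins for however you instruct a model to emit JSON. Python dicts preserve insertion order, so the key order you write is the order the model is asked to fill.

```python
import json

# Two schemas that differ only in key order.
result_first = {"result": "string", "reasoning": "string"}
reasoning_first = {"reasoning": "string", "result": "string"}

def build_prompt(question: str, schema: dict) -> str:
    """Embed the schema in the prompt. The model fills fields in
    the order they appear, so key order decides whether reasoning
    tokens are generated before or after the answer."""
    return (
        f"{question}\n"
        "Respond with JSON matching this shape exactly:\n"
        f"{json.dumps(schema, indent=2)}"
    )

prompt = build_prompt("Is 97 prime?", reasoning_first)
```

With reasoning_first, the "reasoning" key precedes "result" in the prompt, so the answer tokens are generated after, and conditioned on, the reasoning tokens.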

The model does not have a hidden workspace. It has a single stream of tokens. Whatever is not in that stream does not influence future output.
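The "single stream" constraint is just causal masking. A toy sketch of the mask over positions, not a real attention implementation:

```python
# A causal mask: position i may attend to positions j <= i only.
# A 0 means "this token does not exist yet from i's point of view,"
# so future tokens cannot influence the current prediction.
def causal_mask(n: int) -> list[list[int]]:
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

for row in causal_mask(4):
    print(row)
```

Each row i has ones only up to column i: the token being predicted sees every earlier token in the stream and none of the later ones.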

This backward-looking property has been explored in recent research. One study showed that simply repeating the input prompt improves results. The explanation is simple: because LLMs are causal (past tokens cannot attend to future tokens), the order of information in the prompt affects performance. Repeating the prompt lets every token attend to every other token across the two copies. The model is not remembering better. It is just looking back at more useful tokens. The study tested this across seven popular models (from the Gemini, GPT, Claude, and DeepSeek families) and found that prompt repetition won 47 of 70 model-benchmark combinations, with zero losses, without increasing the number of generated tokens or latency.
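The trick itself is trivial to apply. The study's exact template may differ; this sketch just concatenates copies:

```python
def repeat_prompt(prompt: str, copies: int = 2) -> str:
    """Concatenate the prompt with itself. In the second copy,
    every token can attend back to the entire first copy, so each
    part of the input has effectively 'seen' every other part."""
    return "\n\n".join([prompt] * copies)

doubled = repeat_prompt("Summarize the trade-offs of approach A vs B.")
```

Note that repetition adds input tokens, not output tokens, which is why the study reports no increase in generated tokens or latency.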

This is also why “reasoning” or “thinking” models were created. They are designed to produce intermediate reasoning tokens before producing a final answer. Those tokens are not a window into hidden thoughts. They are scaffolding placed into the token stream so that the final answer can condition on them.

The same applies to preferences. When you ask “what do you think?” or “which do you prefer?”, there is no opinion inside the model. There is only training data. The answer reflects statistical patterns it learned, not a belief it holds.

This is why asking an LLM to generate a CLAUDE.md that it “prefers” is fundamentally confused. There is no internal preference to consult. It will generate a document that statistically resembles what a good CLAUDE.md should look like, given the conversation and its training data. If you ask again under slightly different conditions, you may get a different answer. Each output is just the next most likely continuation, not a revealed internal standard.

Because of this, you can train an LLM to state its opinion as X while its behavior follows Y. The two are not tightly linked. The explanation is just another prediction over tokens.

If a thought is not present in the token stream, it did not happen. There is no backstage process. The tokens are the thinking.

A confident self-assessment, a neat explanation, a clear preference — all come from the same source. Training data plus the tokens that came before.