Anthropic Uncovers Strange Results in LLMs
Anthropic has developed a method for studying the inner workings of large language models (LLMs), revealing unexpected behaviors. The Claude team found that the models take surprising, sometimes counterintuitive shortcuts when completing sentences, suppressing hallucinations, or solving math problems.
For simple calculations, Claude uses strategies that differ from those found in its training data. Researcher Joshua Batson describes the models’ development as “almost organic,” evolving from random behavior into complex skills. Understanding these processes could help explain why models sometimes invent information or can be misled.