ΑΙhub.org
 

The malleable mind: context accumulation drives LLM’s belief drift


by
09 March 2026



share this:

After being trained on a dataset of 80,000 words of conservative political philosophy, Grok-4 changed the stance of its outputs on political questions more than a quarter of the time. This was without any adversarial prompts – the change in training data was enough. As memory mechanisms and research agents [1, 2] enable LLMs to accumulate context across long horizons, earlier prompts increasingly shape later responses. In human decision-making, such repeated exposure influences beliefs without deliberate persuasion [3]. When an LLM operates over accumulated context, does this past exposure cause the stance of the LLM’s responses to drift over time?

While long context and memory capabilities make LLMs more useful, this basic reliability question has received little direct measurement. Our paper Accumulating Context Changes the Beliefs of Language Models addresses these questions empirically. We show that belief drift can emerge from user interaction, without adversarial prompting or parameter updates.

How to measure belief drift?

We study belief drift under two types of context accumulation, distinguished by whether the accumulated experience is intentionally targeted at the belief being measured.

  • In intentional tasks, the model engages in conversation directly about the belief being measured, such as multi-turn debate or persuasion. We use the moral dilemmas and safety questions to ensure the accumulated context is explicitly targeted at that belief.
  • In non-intentional tasks, the model accumulates context by reading documents or doing research, such as searching for information and summarizing what it finds. These activities are not directly about the belief drift and reflect common uses of LLMs for information gathering and research.


We design a three-stage evaluation framework: 1) the initial beliefs, 2) having extended interaction or reading, and 3) post belief after the interaction, reading, or research. Within this framework, we distinguish two ways beliefs are expressed:

1) Stated beliefs, measured by directly asking the model how it would state a position.

2) Behavior, measured how models take an action that implies a belief, such as making a decision or using tools.

How to measure the belief drift?

Belief drift is real and directional. Our empirical evidence shows that the belief drifts after context accumulation. The statistical tests with a p value less than 0.05 suggest that these shifts are not noise but systematic: when beliefs change, they move in consistent directions rather than fluctuating randomly. The direction follows the accumulated experience. After reading conservative texts, models shift conservative; after reading progressive texts, they shift progressive.

More capable models ≠ more stable. More capable models are not necessarily more stable. In fact, models with higher capacity often show larger belief shifts, suggesting they absorb accumulated context more deeply. The same capability that lets them integrate prolonged exposure also amplifies drift.

Stated belief ≠ behavior. Interestingly, we observe that the stated beliefs and behavior can diverge after context accumulation. An LLM agent can deny any change in its stated position while making different choices, allocating resources differently, or using tools in ways that imply a shift in belief. This distinction is particularly relevant for agentic systems, where assessment depends more on what models do than on what they say.

Implications for Reliability

Many reliability assumptions no longer hold. LLM’s benchmark evaluations typically reset the model between prompts, treating each interaction as independent. Our results challenge this assumption: belief drift can emerge through ordinary context accumulation without adversarial prompting or parameter updates.

The real-world risks of silent belief drift. Users have reported models gradually becoming overly agreeable after long interactions, often describing them acting as a “semi-competent intern” who agrees instead of pushing back or asking clarifying questions [4]. More serious concerns have been raised in mental health contexts. In late 2025, U.S. state attorneys general warned that overly affirming or sycophantic responses from chatbots could reinforce delusional thinking or emotional distress in vulnerable users [5, 6]. Recent work reports a similar pattern in controlled settings: models are more likely than humans to validate questionable or harmful behavior when users seek reassurance.

Future Directions

For long-context LLMs, reliability can no longer be treated as a static property measured once. We need to consider the stability under accumulated experience, whether an LM assistant maintains a consistent set of beliefs across long-horizon use, and whether belief drift appears only after prolonged interaction. Evidence from real-world use shows that belief changes over long interactions can lead to wrong, misleading, or unsafe behavior. As interactions grow longer, users tend to rely on LLMs more, which makes these reliability problems more serious.

If ordinary conversation and reading can change an LLM’s stance, it is unclear which beliefs should change over time and which should remain fixed. Should some beliefs be expected to remain stable while others shift with continued interaction, or does any belief change undermine trust? As context grows, belief drift emerges as a natural consequence of how LM assistants operate. With longer context, the model reasons using more prior information, changing what it treats as relevant. Memory mechanisms amplify this effect. This reveals a critical paradox: the very feature that makes modern AIs useful – their ability to remember and learn from context – is what makes them unreliable. As we build agents that run for days or weeks, the AI assistants that continuously absorb experience tend to evolve as a result of that exposure. Our findings expose fundamental concerns about the reliability of LMs in long-term, real-world deployments, where user trust grows with continued interaction even as hidden belief drift accumulates.

References

  1. Anthropic, “Claude introduces memory for teams at work,” 2025.
  2. OpenAI, “Introducing deep research,” 2025.
  3. D. Kahneman, Thinking, Fast and Slow. New York, NY, USA: Farrar, Straus and Giroux, 2011.
  4. Hacker News, “Discussion on LLM sycophancy and over-agreeable behavior,” Dec. 2025.
  5. Reuters, “Big Tech warned over AI ‘delusional’ outputs by U.S. attorneys general,” Dec. 10, 2025.
  6. New York Attorney General, “Multistate letter urging safeguards against ‘sycophantic and delusional’ AI outputs,” Press release, Dec. 2025.



Jiayi Geng is a PhD student at the Language Technologies Institute at Carnegie Mellon University.
Jiayi Geng is a PhD student at the Language Technologies Institute at Carnegie Mellon University.

            AIhub is supported by:



Subscribe to AIhub newsletter on substack



Related posts :

A multi-armed robot for assisting with agricultural tasks

and   27 Mar 2026
How can a robot safely manipulate branches to reveal hidden flowers while remaining aware of interaction forces and minimizing damage?

Resource-constrained image generation and visual understanding: an interview with Aniket Roy

  26 Mar 2026
Aniket tells us about his research exploring how modern generative models can be adapted to operate efficiently while maintaining strong performance.

RWDS Big Questions: how do we highlight the role of statistics in AI?

  25 Mar 2026
Next in our series, the panel explores the statistical underpinning of AI.

A history of RoboCup with Manuela Veloso

  24 Mar 2026
Find out how RoboCup got started and how the competition has evolved, from one of the co-founders.

Information-driven design of imaging systems

  23 Mar 2026
Framework that enables direct evaluation and optimization of imaging systems based on their information content.

Machine learning framework to predict global imperilment status of freshwater fish

  20 Mar 2026
“With our model, decision makers can deploy resources in advance before a species becomes imperiled.”

Interview with AAAI Fellow Yan Liu: machine learning for time series

  19 Mar 2026
Hear from 2026 AAAI Fellow Yan Liu about her research into time series, the associated applications, and the promise of physics-informed models.

A principled approach for data bias mitigation

  18 Mar 2026
Find out more about work presented at AIES 2025 which proposes a new way to measure data bias, along with a mitigation algorithm with mathematical guarantees.



AIhub is supported by:







Subscribe to AIhub newsletter on substack




 















©2026.02 - Association for the Understanding of Artificial Intelligence