Context Rot: Why AI Forgets and How to Fix It

TL;DR

Your AI starts every project sharp and ends it sloppy. By hour three of a long session, it's contradicting decisions it made earlier; by hour five, it's confidently breaking working code. The model didn't get worse — the context did. Researchers call this "context rot" — the steady degradation of output quality as a conversation grows, even when the context window is technically nowhere near full. The instinct to fix it by using a million-token model is wrong; bigger context windows degrade catastrophically and silently, with the model still answering confidently while averaging across so much noise that the signal is gone. Three patterns cause it: decision drift (old decisions become buried and lose salience), goal blur (the original goal gets diluted by recent tangents), and stale assumptions (the model's mental model of your codebase doesn't update when you change files). Four fixes compound: re-anchor explicitly every 30-60 minutes, end conversations earlier instead of treating length as a virtue, externalize memory into files the model reads each session, and trim aggressively. The reframe that helps most: context is not memory — it's a noisy room. The longer it goes, the harder it is to hear the original goal. Start treating context as a limited resource you spend deliberately, and the rot stops.

The Pattern You've Already Seen

You start a coding session with Claude. The first hour is brilliant. The model understands your intent, picks up your patterns, makes good suggestions, catches things you would have missed. You ship more in one hour than you would have in three on your own.

By hour three, something has changed. The model is making suggestions that contradict decisions it agreed to two hours ago. It's proposing patterns you explicitly rejected at the start. It's quietly re-introducing complexity you spent an hour stripping out.

By hour five, it's confidently breaking working code. It removes an import that was load-bearing. It suggests an architectural change that violates a constraint you explained at the start of the session. It cites assumptions you never gave it. The model's tone is the same — calm, helpful, confident — but its output has decayed.

The model didn't get worse. The context did. Researchers started calling this "context rot" — the steady degradation of output quality as a conversation grows, even when the model technically still has room in its context window. It's not a hallucination problem in the traditional sense; the model isn't making up facts. It's an averaging problem. When the context grows large enough, the model's attention spreads across too much material, and the recent and the old get blurred into a kind of confident mush.

Once you've recognized this pattern, you can't unsee it. Every long AI session you ever had — every project that started promising and ended with you frustrated, blaming "the model getting dumb" — was context rot.

Why Bigger Context Windows Aren't the Fix

The instinct is reasonable: if the problem is that the model is losing things in a busy context, just use the model with the biggest window. One million tokens! Why would you ever need to manage context if the window is so vast that nothing falls out of it?

That intuition is wrong, and it's wrong in a way that's actively harmful.

Bigger context windows don't degrade gracefully. They degrade catastrophically and silently. The model still answers — and answers confidently — but it's averaging across so much noise that the signal is gone. You get plausible-sounding code that violates constraints you established two hours ago. You get suggestions that contradict each other within the same response. You get explanations of "what we decided" that are reasonable-sounding but factually wrong about your actual decisions.

The silent part is the dangerous part. A model with a small context window gives you an error or a clearly confused response when it runs out of room. A model with a huge context window gives you a smooth wrong answer. The wrongness is hidden inside fluent prose, which is exactly the failure mode hardest to catch in code review.

This is the counter-intuitive lesson: more context is often worse than less context. Precision beats volume. The right answer is not "feed the model everything"; it's "feed the model exactly what it needs to answer this specific question." Curation, not capacity.

What Actually Causes the Rot

Three patterns I see over and over, in my own work and in the work of every developer I've talked to who works heavily with AI:

1. Decision drift. Early in the session you decide "we're using React Router, not Next.js." Two hours later, mid-task, the model proposes a Next.js pattern. Not because it forgot — the decision is still technically in the context window. It's because the original decision is buried under thousands of tokens of code, discussion, and tangents. It's still in context, but it's no longer salient. The model's attention has shifted to the recent material, and the foundational constraints have dimmed into background noise.

Decision drift gets worse with every decision you add. A session with 20 important decisions made over five hours becomes nearly impossible to keep coherent because the model can no longer tell which decisions are still active and which were superseded.

2. Goal blur. You started the session with "build the auth flow." An hour in, you noticed a button alignment issue and asked the model to fix it. Then you got into the weeds of CSS specificity. Then you fixed three related styling bugs. Then you went back to auth — but the model has lost the thread. Is it still trying to ship the auth flow, or is it trying to perfect this section's CSS? Without an explicit re-anchor, it defaults to whatever's most recent, which is the styling work, not the original goal.

Goal blur is what makes long sessions feel productive in the moment and unproductive in retrospect. You did a lot. You shipped less than you thought, because the AI was helping you with whatever was directly in front of you, not with what you actually came to accomplish.

3. Stale assumptions. Early in the session, the model formed a mental model of your codebase based on what it read and what you told it. You've since changed three files. The model is still reasoning against the old version because it hasn't been told to re-read. Its mental model of the codebase is now several edits behind reality, which produces suggestions that look right in theory and break in practice.

Stale assumptions are especially insidious because they correlate with confident-sounding output. The model is reasoning carefully from a model of your codebase — it's just the wrong model.

The Fixes That Actually Work

Four things I do that compound over time. None of them is a silver bullet alone, but together they make long sessions tractable and short sessions much sharper.

1. Re-anchor explicitly. Every 30 to 60 minutes, paste back the core context: "We're building X. Stack is Y. Key decisions so far: A, B, C. Current task: D. What's blocking us is E." Treat the model like a person who just walked into the meeting and needs to be brought up to speed. The re-anchor flushes the noise out of the salience layer and makes the original goal and decisions live again.

I usually keep a short "re-anchor template" in my notes that I paste with minor edits. The exact wording matters less than the act of doing it on a fixed cadence. Even when it feels unnecessary, do it. The 90 seconds you spend re-anchoring saves 30 minutes of subtle wrong-answer debugging later.

2. End conversations earlier. Long conversations are not a virtue. There is no medal for a four-hour AI session. When a major chunk of work is done, start a new chat. Carry forward only the artifacts that matter (file paths, key decisions, the next concrete task) and skip the entire chat history. The new conversation will be sharper than the old one would have been at the same point.

The hardest part of this rule is psychological. It feels wasteful to abandon a conversation that "knows" the project. But the conversation doesn't actually know the project — the codebase does, the decision log does, the CLAUDE.md file does. The conversation just has the noise. Letting it go and starting clean is almost always the right move.

3. Externalize memory. The single highest-leverage thing you can do is move decisions and context out of chat history and into files the model reads at the start of each session. CLAUDE.md files at every level of your codebase. A decisions log that captures every meaningful technical choice. A plan document for the current sprint or project. Anything that lives outside the conversation can survive across conversations.

The discipline this requires — writing things down instead of relying on the model to remember them — pays off everywhere. Even when AI gets dramatically better at context retention, the externalized memory is still valuable for human collaborators, for your future self, and as documentation.

4. Trim aggressively. If a 50-message debugging thread ended in a one-line fix, don't keep the 50 messages around. Summarize the fix in a sentence, paste that sentence as a new message, and let the rest decay out of the conversation. Most chat interfaces will keep all 50 messages in context regardless of how irrelevant they are; you have to actively manage this.

The same applies in Claude Code — when you've solved something hard, write down what you solved and how, then start a fresh session rather than letting the entire debugging trail bloat the next task's context.

The Reframe That Helped Most

I used to think of a conversation as a person remembering more as it grew. The longer the conversation, the more the model knew, the smarter its answers should be. This is wrong, and it's exactly the wrong framing to act on.

The accurate model is: a conversation is a noisy room. The longer it goes, the harder it is to hear the original goal. Adding more context is like adding more people to the room. Sometimes a new person brings exactly the insight you need; usually they just make the room louder. The question isn't "how do I add the right thing?" — it's "how do I keep the room quiet enough that the model can still hear the original goal?"

Once I started treating context as a limited resource I had to spend deliberately, the rot stopped. Not because the model got better — because I stopped feeding it noise. Every message I add to a conversation has a cost. Every file I attach has a cost. Every "explore this" tangent has a cost. The benefit has to justify the noise, and most of the time, it doesn't.

This isn't a constraint on what AI can do; it's a constraint on how to use it well. The same shift in mindset that makes humans more focused — single-tasking, removing distractions, explicit prioritization — makes AI sessions more focused too.

A Quick Test

Next time you feel the AI getting dumber mid-session, don't switch models or restart. Instead, paste this back to it:

Re-anchor check: what is the goal of this conversation, what have we decided so far, and what's the next concrete step?

If its answer surprises you — if it gets the goal subtly wrong, omits a decision you remember establishing, or names a "next step" that doesn't match what you thought you were working on — that's context rot. The conversation has drifted, and continuing it will only produce more wrong-answer-with-fluent-prose responses.

When the test fails, the right move is almost always to start a fresh conversation with a curated handoff: the file paths that matter, the decisions that survived, the next concrete task, and nothing else. The new conversation will be sharper than continuing the old one would be.

Why This Matters More Now

Context rot used to be a niche problem. AI sessions were short, the model was used for one-shot tasks, and the longest conversation you'd have with it was maybe twenty messages. Decision drift had no time to accumulate. Goal blur didn't happen because there was rarely a goal in the first place.

Then agentic workflows became real. Claude Code, autonomous agents, multi-hour sessions, projects that span days. Suddenly people are running AI sessions that are 200 messages deep, 50,000 tokens in, with a dozen decisions stacked on top of each other. Context rot went from a curiosity to the single biggest determinant of AI productivity.

The developers who get the most out of AI in 2026 are the ones who treat context management as a first-class skill — as important as prompt engineering, as important as choosing the right model. The ones who don't end up confused about why their AI sessions feel productive in the moment but produce sloppier work than they expected.

Closing Discipline

The discipline I recommend, in order of leverage:

Externalize memory into files. Stop relying on the conversation to remember anything that matters.

Re-anchor explicitly on a fixed cadence. Set a timer if you have to.

End conversations when a major task is done, not when you feel like stopping.

Trim aggressively. Less context, sharper signal.

Use the re-anchor test whenever the AI feels off. Treat the result as diagnostic, not insulting to the model.

None of this is glamorous. All of it works. And once you internalize that context is a resource you spend, not a pile that grows, the entire AI workflow gets more reliable.