Claude Agents Can Now Dream. Here's What That Actually Means.

Anthropic announced something quietly consequential at Code with Claude in San Francisco on May 6: a feature called Dreaming that lets Claude agents improve between sessions by reviewing their own past performance. Not retraining. Not fine-tuning. A background process that works on the agent's memory while it's not in active use.

The name is marketing. The mechanism is worth understanding.

The Problem With Stateless Agents

When you deploy an AI agent to handle a recurring task, it starts each session largely clean. Whatever it learned yesterday (how your file system is organized, which API quirks cause timeouts, what response format your team prefers) mostly does not carry forward. The agent works from its original system prompt plus whatever context it's handed at session start.

This is fine for one-shot tasks. For agents doing the same category of work across hundreds of sessions, it means repeatedly making the same recoverable mistakes and never building the kind of institutional knowledge that makes a skilled human worker valuable.

Dreaming is Anthropic's answer to that gap.

How the Memory Loop Works

Dreaming is a scheduled background process. Anthropic CPO Ami Vora introduced it at Code with Claude San Francisco on May 6, 2026.

The system periodically feeds an agent's recent session transcripts back through Claude, which analyzes them for three categories of pattern:

Recurring mistakes: errors the agent makes repeatedly across different sessions
Converging workflows: approaches the agent has refined through trial and error that consistently perform well
Team preferences: patterns in how specific users or teams want tasks handled

Claude then produces an updated version of the agent's memory store. Stale information gets condensed or removed. Insights that proved useful across multiple sessions get promoted so they are more prominent.
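To make the shape of that update concrete, here is a minimal sketch in Python. It is not Anthropic's implementation: in Dreaming, Claude reads the transcripts and decides what matters, while this sketch swaps in simple counting. Every name here (MemoryEntry, consolidate, promote_at, stale_after) is hypothetical, and the sketch assumes observations have already been extracted from each session as short strings.

```python
from dataclasses import dataclass


@dataclass
class MemoryEntry:
    note: str
    sessions_seen: int = 0  # number of sessions that supported this note
    last_seen: int = 0      # index of the most recent supporting session


def consolidate(memory, sessions, current_index, promote_at=2, stale_after=5):
    """Return an updated memory list without modifying the input.

    memory:        list of MemoryEntry (the existing store)
    sessions:      list of lists of note strings, oldest session first
    current_index: index of the newest session in `sessions`
                   (assumed to be >= len(sessions) - 1)
    """
    # Copy existing entries so the caller's store is left untouched.
    by_note = {
        e.note: MemoryEntry(e.note, e.sessions_seen, e.last_seen) for e in memory
    }

    # Count each note at most once per session.
    for offset, notes in enumerate(sessions):
        session_index = current_index - len(sessions) + 1 + offset
        for note in set(notes):
            entry = by_note.setdefault(note, MemoryEntry(note))
            entry.sessions_seen += 1
            entry.last_seen = max(entry.last_seen, session_index)

    # Drop entries that no recent session has supported.
    kept = [e for e in by_note.values() if current_index - e.last_seen < stale_after]

    # Notes seen in at least `promote_at` sessions sort to the top.
    kept.sort(key=lambda e: (e.sessions_seen < promote_at, -e.sessions_seen, e.note))
    return kept
```

Calling consolidate with the existing store and the last few sessions' notes returns a new list in which repeatedly confirmed notes come first and long-unconfirmed ones are gone. The thresholds are arbitrary illustration values, not anything Anthropic has published.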

Developers can configure this to run automatically, or require human review before any memory changes deploy. That second option matters for regulated industries: you see exactly what Claude intends to remember before it affects future sessions.
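The two modes can be pictured as a gate in front of the memory store. The sketch below is an assumption about the general pattern, not the Managed Agents API: diff_memory, apply_update, and the "auto" and "review" mode names are all hypothetical. It shows a reviewer seeing what would be added and removed, with nothing applied until approval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemoryDiff:
    added: tuple    # notes the update would introduce
    removed: tuple  # notes the update would drop


def diff_memory(current, proposed):
    """Summarize a proposed memory update for a human reviewer."""
    cur, new = set(current), set(proposed)
    return MemoryDiff(
        added=tuple(sorted(new - cur)),
        removed=tuple(sorted(cur - new)),
    )


def apply_update(current, proposed, mode="review", approved=False):
    """Return the memory that future sessions should see.

    mode="auto":   the proposed memory is applied immediately.
    mode="review": the proposed memory is applied only if approved.
    """
    if mode not in ("auto", "review"):
        raise ValueError(f"unknown mode: {mode!r}")
    if mode == "auto" or approved:
        return list(proposed)
    return list(current)  # unchanged until a reviewer signs off
```

In review mode, a team would show the output of diff_memory to a reviewer and pass approved=True only after sign-off. That produces the audit trail the paragraph above describes.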

None of this changes model weights. The underlying Claude model is identical after Dreaming runs. What changes is the context surfaced to the agent at the start of each new session: a curated set of notes from its own operating history.
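In other words, the effect shows up as text in the prompt, not as a change in the model. A rough sketch of that assembly step, assuming the memory is a ranked list of short notes (the function name, the section heading, and the max_notes cap are all hypothetical):

```python
def build_session_context(system_prompt, notes, max_notes=20):
    """Combine the fixed system prompt with curated notes from past sessions.

    The model is the same on every call; only this assembled text differs.
    """
    if not notes:
        return system_prompt
    # Keep only the highest-ranked notes so the context stays small.
    lines = [f"- {note}" for note in notes[:max_notes]]
    return system_prompt + "\n\nNotes from previous sessions:\n" + "\n".join(lines)
```

A new session would start from build_session_context(base_prompt, ranked_notes) instead of the bare system prompt. How Anthropic actually formats or injects the memory is not described in the announcement.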

What the Harvey Numbers Mean

Harvey, the legal AI platform, was one of the early pilots. Task completion rates for their agents rose roughly 6x in internal testing. Vora described the specific problem: agents were forgetting file-type quirks and tool workarounds between sessions. Each time an agent encountered a familiar edge case, it was encountering it fresh.

With Dreaming, those workarounds persist. The agent builds up a working knowledge of how Harvey's documents are structured, which tools to reach for in which circumstances, and where the sharp edges in their workflow are.

Wisedocs, which does medical document review, saw processing time drop by 50% using a companion feature Anthropic calls "outcomes." Where Dreaming focuses on session-level memory curation, outcomes captures task-level feedback: which responses led to successful task completion, and which did not.

These numbers come from real pilot workloads, with the caveat that Harvey's figure is from internal testing. They also point to something important: the failure mode Dreaming addresses is not hypothetical. Agents doing repetitive, context-rich work lose meaningful efficiency because they cannot accumulate operational experience across sessions.

What It Isn't

Dreaming is not reinforcement learning from human feedback. It's not fine-tuning on task-specific data. It does not make Claude smarter or alter the model's capabilities in any way.

It's closer to what a thorough onboarding document does for a human employee: it surfaces the accumulated knowledge of prior sessions in a form the agent can actually use at the start of a new one. The mechanism is more dynamic than a static system prompt, but the category of operation is the same: context management, not capability improvement.

That distinction matters if you're thinking about where Dreaming fits in agent architecture. It does not compensate for a weak base model. It does not address hallucination problems or reasoning failures. It solves the specific problem of information not persisting appropriately between sessions in long-running, high-repetition agent deployments.

Availability and the Managed Agents Requirement

Dreaming is in research preview, available through Claude Managed Agents. Developer access requires a request to Anthropic. It is not available on standard API configurations or for developers running custom agent loops directly on the Messages API.

That managed platform requirement has a product logic behind it. Dreaming depends on Anthropic controlling the session boundaries and the memory store. The mechanism requires visibility into the full session lifecycle, which Anthropic has on Managed Agents and does not have on a custom stack.

Google has been pursuing a different architectural answer to the same underlying problem. Gemini Spark, announced at I/O 2026, achieves persistence through server-side VM execution: agents stay active between interactions rather than spinning down at session end. The two approaches are fundamentally different, continuous execution versus curated memory, and which performs better at scale in production environments is not yet established.

What This Adds Up To

Agents that work on recurring tasks need a way to accumulate operational experience without requiring model retraining. Dreaming, still in research preview, is Anthropic's answer to that problem: a systematic process for curating what past sessions have demonstrated is worth remembering.

The implementation gives developers enough control to audit what the agent learns, which is the right design for enterprise deployments. The Harvey and Wisedocs results indicate the problem being solved is real and the mechanism is working in early deployments.

Whether this approach scales to general-purpose, long-horizon agents is the open question. For now, it is a well-scoped solution to a concrete problem in multi-session, high-repetition agent work, and the early numbers make it worth watching.
