Claude Dreaming: memory consolidation between agent sessions

Harvey's legal-coordination agents started completing roughly six times more tasks after Anthropic switched on Claude Dreaming. Wisedocs cut document-review time in half. The model did not change. What changed is what the agents remembered.

That shift is the point of Claude Dreaming, the research-preview feature Anthropic announced at Code with Claude on May 6, 2026. Dreaming runs between agent sessions, reviews what the agent has done lately, and rewrites its memory store so the next session starts smarter than the last.

This article covers what Dreaming actually does, the architecture decisions it asks you to make, where it pays off in practice, and where you are better off skipping it.

What Claude Dreaming actually does

Memory in Claude Managed Agents is durable storage your agent can read and write while it works. A coding agent jots down which test suite to run for which directory. A support agent learns that customers on plan X always need a CSAT follow-up. None of that survives unless the agent writes it somewhere persistent.

The problem with raw memory is that it gets messy fast. Duplicates accumulate. Stale entries linger. Patterns that span dozens of sessions stay invisible because no single session has the whole picture.

Dreaming is a scheduled background job that fixes that. It reads up to 100 of the agent's past sessions plus its current memory store, finds patterns and contradictions, then writes a curated memory back. Anthropic supports it on Claude Opus 4.7 and Claude Sonnet 4.6. A typical run takes minutes, not hours, so off-peak scheduling is practical.
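To make "finds patterns and contradictions, then writes a curated memory back" concrete, here is a toy consolidation pass. It is not Anthropic's implementation and assumes nothing about how Dreaming works internally. The `Note` shape, the key scheme, and the "newest value wins" rule are all hypothetical. The only number taken from the article is the 100-session cap.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical shape of one memory note written during a session.
@dataclass(frozen=True)
class Note:
    session_id: int  # monotonically increasing session number
    key: str         # what the note is about, e.g. "test_cmd:/api"
    value: str       # what the agent wrote down

MAX_SESSIONS = 100  # the article's stated cap on sessions read per dream

def consolidate(notes: list[Note]) -> tuple[dict[str, str], dict[str, set[str]]]:
    """Return (curated memory, contradictions) from the most recent sessions."""
    # Keep only notes from the newest MAX_SESSIONS sessions.
    window = set(sorted({n.session_id for n in notes})[-MAX_SESSIONS:])
    by_key: dict[str, list[Note]] = defaultdict(list)
    for n in notes:
        if n.session_id in window:
            by_key[n.key].append(n)

    curated: dict[str, str] = {}
    contradictions: dict[str, set[str]] = {}
    for key, group in by_key.items():
        values = {n.value for n in group}
        if len(values) > 1:
            contradictions[key] = values  # sessions disagree: surface it
        # Duplicates collapse to one entry; the newest session's value wins.
        curated[key] = max(group, key=lambda n: n.session_id).value
    return curated, contradictions
```

Duplicates disappear because they share a key, stale sessions fall out of the window, and disagreements are reported rather than silently overwritten. A real consolidation has to match notes by meaning, not exact keys, which is where a model earns its keep.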

What it looks for, in Anthropic's own framing: recurring mistakes the agent keeps making, workflows that multiple agents converge on independently, preferences shared across a team of agents. Things the agent could only see if it could step outside its own context and look at the whole tape.

A useful mental model: memory is what one agent learned in one session. Dreaming is the team retro.

Memory and Dreaming are not the same thing

It is tempting to read about Dreaming and think it replaces memory. It does not. The two work together and they handle different problems.

Memory captures information as the agent works. The agent decides what is worth saving in the moment, writes it to durable storage, and reads it back next session. I covered the broader pattern before in my guide on building persistent AI assistants with memory for business automation, and the failure modes are familiar. Agents over-save. Agents save the wrong things. Memory accumulates noise faster than signal.

Dreaming refines that memory between sessions. It cannot observe the agent live. It can only see what was written down. So an agent with bad memory hygiene produces dreams that consolidate noise.

That ordering matters for how you design the system. Memory rules come first. Dreaming compensates for the inevitable mess but does not excuse it. If your agent saves everything, Dreaming has a lot to prune. If your agent saves nothing, Dreaming has nothing to work with.
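Memory rules can be as simple as a write-time policy the agent's tool layer enforces before anything reaches durable storage. This sketch is hypothetical: the length cap and the list of "transient" phrases are illustrative values, not recommendations from Anthropic.

```python
import re

# Illustrative thresholds for a hypothetical write-time policy.
MAX_NOTE_CHARS = 500
TRANSIENT = re.compile(r"\b(today|right now|this session|temporarily)\b", re.IGNORECASE)

def should_save(note: str, existing: set[str]) -> bool:
    """Decide at write time whether a note is worth persisting."""
    text = " ".join(note.split())  # normalize whitespace
    if not text or len(text) > MAX_NOTE_CHARS:
        return False  # empty, or a dumped transcript rather than a lesson
    if text.lower() in {e.lower() for e in existing}:
        return False  # exact duplicate, ignoring case
    if TRANSIENT.search(text):
        return False  # only true for this session, so noise next time
    return True
```

The point is less the specific rules than where they run: at the moment of writing, so Dreaming starts from a store that is already mostly signal.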

The two modes: automatic versus review-before-write

Dreaming runs in one of two modes. Automatic writes the consolidated memory back without human approval. Review-before-write surfaces the changes for a human to accept, edit, or discard.

The trade-off is straightforward. Automatic is faster, runs unattended, and scales to large fleets of agents. Review-before-write is slower but gives you an audit trail of every change Claude wants to make to the agent's long-term memory.

For most regulated work, review-before-write is the better default: legal, financial services, healthcare, anywhere the agent's memory could shape customer-facing decisions and an auditor might eventually want to know why the agent suddenly started recommending option B. Harvey, one of the launch customers, is a legal-AI platform, so it is probably no coincidence that the feature shipped with reviewable changes from day one.

For internal tooling, dev agents, ops automations? Automatic is usually fine. The cost of a bad memory entry is low, and you are likely to catch it on the next run.

Hybrid is also a reasonable pattern. Review-before-write for the first few weeks until you trust what Dreaming surfaces, then flip to automatic once the pattern stabilizes.
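If you gate Dreaming's proposed changes through your own code, the hybrid pattern is a small policy object. Everything here is hypothetical: `DreamPolicy`, the 21-day trial period, and the `approve` callback are illustrations, not the Managed Agents API. The callback models the three review outcomes the article lists: return the value to accept, an edited value to edit, or `None` to discard.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable, Optional

# Approver returns the value to write (possibly edited) or None to discard.
Approver = Callable[[str, str], Optional[str]]

@dataclass
class DreamPolicy:
    """Hypothetical wrapper: review-before-write for a trial period, then automatic."""
    rollout_start: date
    review_days: int = 21  # "the first few weeks"; an illustrative value
    audit_log: list[str] = field(default_factory=list)

    def mode(self, today: date) -> str:
        elapsed = (today - self.rollout_start).days
        return "review" if elapsed < self.review_days else "automatic"

    def apply(self, memory: dict[str, str], proposed: dict[str, str],
              today: date, approve: Approver) -> dict[str, str]:
        """Apply proposed key -> value changes according to the current mode."""
        mode = self.mode(today)
        updated = dict(memory)  # never mutate the caller's memory in place
        for key, value in proposed.items():
            decision = value if mode == "automatic" else approve(key, value)
            if decision is None:
                self.audit_log.append(f"{today} {mode} discard {key}")
                continue
            updated[key] = decision
            self.audit_log.append(f"{today} {mode} write {key}={decision}")
        return updated
```

A date-based switch is the bluntest trigger. A reasonable alternative is to flip to automatic once reviewers have approved most proposals unchanged for a few weeks running.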

Where Claude Dreaming pays off

Dreaming is built for a specific shape of agent, and the wins compound when a few things line up.

You need repetition. Dreaming finds patterns, and patterns need data. A one-off research agent that solves a unique problem each time gives Dreaming nothing to consolidate. A document-review agent that processes hundreds of similar files a week is exactly where the gains show up.

You also need an agent that already writes useful memory. If the memory store is empty or full of noise, the consolidated dream amplifies whatever is there. Garbage in, consolidated garbage out.

And the work has to be long-running enough that consolidation between runs matters. Short, snappy tasks rarely benefit because the agent already has its context loaded. Dreaming pays off when the agent comes back days later and needs the consolidated lessons from prior runs to skip work it already figured out. This is the same territory I walked through in my deep dive on async subagents and orchestrating long-running work.

Multi-agent setups benefit disproportionately. Anthropic shipped multiagent orchestration into public beta the same day Dreaming launched, and the pairing was intentional. When 20 specialist agents share a filesystem and contribute to a lead agent's context (the same architecture I walked through in my piece on Claude Code subagents and parallel work), each agent has its own private trial-and-error. Dreaming surfaces what one specialist learned and makes it available to the others, without you writing the cross-agent sync logic by hand.
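One of the patterns Anthropic says Dreaming looks for is workflows that several agents converge on independently. A deliberately naive version of that idea is to promote any note that at least two specialists wrote on their own into shared memory. The function name and threshold are hypothetical, and exact string matching stands in for the semantic matching a model would do.

```python
from collections import defaultdict

def promote_shared(agent_notes: dict[str, set[str]], min_agents: int = 2) -> set[str]:
    """Return notes independently written by at least `min_agents` agents."""
    writers: dict[str, set[str]] = defaultdict(set)
    for agent, notes in agent_notes.items():
        for note in notes:
            writers[note].add(agent)
    return {note for note, agents in writers.items() if len(agents) >= min_agents}
```

For example, if the logs specialist and the metrics specialist both wrote "check deploy diff first", that note is promoted, while a lesson only the tickets specialist learned stays private.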

The Harvey 6x and Wisedocs 50% numbers come from this combination. Many sessions, similar shape, fleet of agents working together. That is where the lift is.

Where it does not pay off

Worth being honest about the other side.

If your agent runs a unique task each time, Dreaming is overhead. There is nothing to consolidate. Research agents tackling novel questions, creative assistants helping with one-off content, exploratory coding agents poking at unfamiliar codebases. None of these will see meaningful gains.

If a human is already approving every action the agent takes, Dreaming is mostly redundant. The human is already filtering noise out of memory in real time. You would be paying for a slower second filter on top.

If your agent only runs for a single conversation and is then discarded, Dreaming has no future session to improve. The feature assumes durable agent identity across runs.
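The three disqualifiers above fit in a checklist you can run over each agent before requesting access. The profile fields and both thresholds are hypothetical; fill them in from your own session logs.

```python
from dataclasses import dataclass

@dataclass
class AgentProfile:
    """Hypothetical summary of an agent's workload, filled in by you."""
    sessions_per_week: int
    share_of_repeated_tasks: float  # 0.0-1.0: sessions whose task type recurs
    human_approves_every_action: bool
    durable_identity: bool          # same agent (and memory) across runs

def dreaming_worth_it(p: AgentProfile, min_sessions: int = 20,
                      min_repeat: float = 0.5) -> tuple[bool, list[str]]:
    """Return (verdict, reasons against), based on the three disqualifiers."""
    reasons = []
    if not p.durable_identity:
        reasons.append("no durable identity: no future session to improve")
    if p.human_approves_every_action:
        reasons.append("a human already filters memory in real time")
    if p.sessions_per_week < min_sessions or p.share_of_repeated_tasks < min_repeat:
        reasons.append("too little repetition to consolidate")
    return (not reasons, reasons)
```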

One practical limit: the feature is gated. You request access through Anthropic's form and they decide whether your use case fits the research preview. If you need something in production this quarter, plan for the possibility that you will be designing the architecture now but running without Dreaming until access lands.

Outcomes and multiagent orchestration: the stack matters

Dreaming did not ship alone. Anthropic moved two adjacent features into public beta the same day: outcomes and multiagent orchestration. The three work together.

Outcomes lets you define a success condition for the agent (the file compiles, the customer email is sent, the ticket is closed) and have Claude keep iterating until the condition holds or the budget runs out. In Anthropic's internal testing, this improved task success by up to 10 percentage points over a standard prompt loop.
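The control flow behind "iterate until the condition holds or the budget runs out" is a bounded retry loop with a success predicate. This is a generic sketch, not the outcomes API: `iterate_until`, `attempt`, and `succeeded` are hypothetical names, and the budget is simplified to an attempt count rather than tokens or time.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def iterate_until(attempt: Callable[[Optional[T]], T],
                  succeeded: Callable[[T], bool],
                  max_attempts: int) -> tuple[Optional[T], bool, int]:
    """Retry `attempt`, feeding it the previous result, until `succeeded` holds.

    Returns (last result, whether the condition held, attempts used)."""
    result: Optional[T] = None
    for n in range(1, max_attempts + 1):
        result = attempt(result)
        if succeeded(result):
            return result, True, n
    return result, False, max_attempts
```

Passing the previous result back into `attempt` is the important design choice: each iteration gets to see what failed last time, which is what separates this from blind retries.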

Multiagent orchestration lets one lead agent delegate to up to 20 unique specialist subagents, with up to 25 concurrent threads, on a shared filesystem. Netflix has already deployed this for its platform team. A lead agent runs investigations while subagents fan out through deploy history, error logs, metrics, and support tickets in parallel.
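The fan-out shape is easy to reason about with plain asyncio. In this sketch, specialists are ordinary async functions standing in for subagents; it is not the Managed Agents API, and only the two limits (20 specialists, 25 concurrent threads) come from the article.

```python
import asyncio
from typing import Awaitable, Callable

MAX_SPECIALISTS = 20  # limits as stated in the article
MAX_CONCURRENT = 25

Specialist = Callable[[str], Awaitable[str]]

async def fan_out(tasks: list[tuple[str, str]],
                  specialists: dict[str, Specialist]) -> list[tuple[str, str]]:
    """Run (specialist name, subtask) pairs in parallel, at most MAX_CONCURRENT at once."""
    if len(specialists) > MAX_SPECIALISTS:
        raise ValueError(f"at most {MAX_SPECIALISTS} specialists")
    gate = asyncio.Semaphore(MAX_CONCURRENT)

    async def run(name: str, subtask: str) -> tuple[str, str]:
        async with gate:  # blocks while 25 subtasks are already in flight
            return name, await specialists[name](subtask)

    # gather preserves input order, so results line up with `tasks`.
    return list(await asyncio.gather(*(run(n, s) for n, s in tasks)))
```

A lead agent investigating an incident would call `asyncio.run(fan_out([("deploys", q), ("logs", q), ("metrics", q)], specialists))` and then synthesize the returned findings.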

Dreaming consolidates what all of that produces. Without Dreaming, the team of agents repeats the same lessons every run. With Dreaming, the lessons persist.

If you are considering the Managed Agents stack, the right order to think about the pieces is: outcomes first to define what done means, multiagent orchestration to do the work in parallel, Dreaming to make the next run smarter than the last. The same separation of concerns I argued for in my guide on production-ready autonomous agents with LangChain shows up here, under different product names.

Production considerations before you wire it up

A few things to think about before Dreaming touches a live system.

Schedule it during low-traffic windows. The runs take minutes, not hours, but they read a lot of session data and write back to the agent's memory. Off-peak scheduling avoids contention with live agent reads and gives you a clean window in which memory churn does not coincide with active work.
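If your scheduler only gives you a clock, the one fiddly part of an off-peak window is that it usually wraps past midnight. A small helper, assuming you choose the window hours from your own traffic data (the 23:00 to 05:00 window below is only an example):

```python
from datetime import time

def in_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls in [start, end), handling windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end
```

With that window, `in_window(time(2, 0), time(23, 0), time(5, 0))` is True and noon is not.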

Treat the memory store as a versioned artifact. Snapshot it before each dream so you can roll back. If a bad consolidation lands and the agent starts behaving worse, you want a known-good memory state to revert to without forensic reconstruction.
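A minimal in-process version of "snapshot before each dream", assuming the memory store can be exported as a flat key-to-value dict. `MemorySnapshots` is hypothetical; in production you would write the serialized blob to object storage or a git repository rather than keep it in RAM.

```python
import copy
import hashlib
import json

class MemorySnapshots:
    """Hypothetical version store for a key -> value memory export."""

    def __init__(self) -> None:
        self._versions: dict[str, dict[str, str]] = {}
        self.history: list[str] = []  # snapshot ids, oldest first

    def snapshot(self, memory: dict[str, str]) -> str:
        # Content-addressed id: identical memory always yields the same id.
        blob = json.dumps(memory, sort_keys=True).encode()
        snap_id = hashlib.sha256(blob).hexdigest()[:12]
        self._versions[snap_id] = copy.deepcopy(memory)
        self.history.append(snap_id)
        return snap_id

    def restore(self, snap_id: str) -> dict[str, str]:
        return copy.deepcopy(self._versions[snap_id])
```

Content addressing is a convenience here: if a dream proposes no net change, the post-dream snapshot gets the same id as the pre-dream one, which is an easy signal that nothing happened.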

Log every change in review-before-write mode. Even when you approve a change, log the diff. The audit trail is cheap and the question "when did the agent learn that?" comes up more often than you expect.
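Logging the diff needs nothing beyond the standard library once memory is exported as key-value pairs. Rendering one `key: value` line per entry, sorted by key, is an assumption that keeps diffs stable across runs.

```python
import difflib

def memory_diff(before: dict[str, str], after: dict[str, str]) -> str:
    """Unified diff of two memory stores, one `key: value` line per entry."""
    def lines(mem: dict[str, str]) -> list[str]:
        return [f"{k}: {v}\n" for k, v in sorted(mem.items())]
    return "".join(difflib.unified_diff(lines(before), lines(after),
                                        fromfile="memory.before", tofile="memory.after"))
```

An empty string means the dream changed nothing, which is itself worth logging.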

Be careful with prompt injection upstream. If a malicious input slips into a session, the memory of that session could end up in the consolidated dream and persist indefinitely. Sanitize at the session boundary, not at the dream boundary, because by then the bad memory has already been written.
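One cheap layer at the session boundary is a gate on every memory write that checks where the content came from and screens for obvious injection phrasing. This is a hypothetical sketch: the provenance labels and the patterns are illustrative and deliberately incomplete. Pattern matching alone is not a prompt-injection defense; it only raises the bar.

```python
import re

# Illustrative, incomplete patterns; tune and extend for your own traffic.
SUSPICIOUS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"\bsystem prompt\b", re.IGNORECASE),
    re.compile(r"always (approve|recommend|trust)\b", re.IGNORECASE),
]
TRUSTED_SOURCES = {"agent", "operator"}  # hypothetical provenance labels

def screen_before_write(note: str, source: str) -> tuple[bool, str]:
    """Gate a memory write at the session boundary. Returns (allowed, reason)."""
    if source not in TRUSTED_SOURCES:
        return False, f"untrusted source: {source}"
    for pattern in SUSPICIOUS:
        if pattern.search(note):
            return False, f"matched {pattern.pattern!r}"
    return True, "ok"
```

Rejected writes are worth logging alongside the memory diffs, since a burst of them is often the first sign that a session was targeted.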

Set evaluation gates on the agent before and after each dream. Run the same benchmark task set, compare scores, and alert if quality drops. Self-improving systems can also degrade themselves, and you want to catch that fast.

A practical sketch of the loop:

nightly_window:
  1. snapshot agent memory     -> versioned store
  2. run evaluation suite      -> baseline_score
  3. trigger dreaming run      -> consolidated memory
  4. run evaluation suite      -> post_dream_score
  5. if post_dream_score < baseline_score - threshold:
        rollback to snapshot
        page on-call
     else:
        keep new memory, archive snapshot

Nothing in that loop is specific to Claude. It is the same shape you would build for any self-mutating system. The difference is that Claude is now the one mutating, and you are the one verifying.
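The nightly loop translates almost line for line into Python if you inject the pieces that depend on your setup. `evaluate`, `dream`, and `page_on_call` are hypothetical callables you would wire to your benchmark suite, whatever triggers a Dreaming run in your environment, and your paging system; the 0.02 threshold is illustrative.

```python
from typing import Callable

Memory = dict[str, str]

def nightly_dream(memory: Memory,
                  evaluate: Callable[[Memory], float],
                  dream: Callable[[Memory], Memory],
                  page_on_call: Callable[[str], None],
                  threshold: float = 0.02) -> tuple[Memory, bool]:
    """Snapshot, evaluate, dream, re-evaluate; roll back on a regression.

    Returns (memory to keep, whether the dream was kept)."""
    snapshot = dict(memory)          # 1. snapshot (use a real versioned store)
    baseline = evaluate(snapshot)    # 2. baseline score
    candidate = dream(dict(memory))  # 3. consolidation on a copy
    post = evaluate(candidate)       # 4. post-dream score
    if post < baseline - threshold:  # 5. regression beyond tolerance
        page_on_call(f"dream rolled back: {baseline:.3f} -> {post:.3f}")
        return snapshot, False
    # Keep the new memory; archiving the snapshot is left to the caller.
    return candidate, True
```

Running the dream on a copy means a crash halfway through step 3 leaves the live memory untouched, which is the same guarantee the snapshot gives you for a bad consolidation.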

Closing thoughts

The honest take: Dreaming is in research preview, the access is gated, and most teams reading this will not have it switched on tomorrow. The numbers are striking (Harvey's 6x, Wisedocs' 50%) but they come from launch customers who worked closely with Anthropic on the integration. Your mileage will vary.

What is worth doing now is designing your agent architecture as if Dreaming exists. Clean memory writes. Durable agent identity across sessions. Tasks with enough repetition that consolidation has something to bite into. Evaluation gates that would catch a regression. Build those well and you are ready when access lands. You also have a better agent regardless.

The bigger pattern is worth noticing too. Memory used to be something you bolted on with vector stores and custom retrieval. Now it is a managed primitive with a consolidation layer on top. Whatever you think of Anthropic's research-preview cadence, the direction of travel is clear: the surface area developers manage by hand keeps shrinking, and the value moves to deciding which agents to run, what good looks like for them, and how to evaluate the output.

That is the architecture problem worth investing in. Dreaming is one piece of it.
