Inline subagents have a problem you only feel once the work gets serious. The supervisor calls a subagent, the subagent runs for four minutes doing deep research, and the user sits there watching a spinner. The supervisor cannot answer follow-up questions, cannot launch a second subagent, cannot do anything except wait for the first one to return. For a chatbot that delegates a quick lookup, that is fine. For an agent that runs multi-step pipelines, large code analyses, or background research, it kills the interaction model.
Deep Agents v0.5, released on April 7, 2026, fixes this with async subagents. The supervisor calls start_async_task, gets a task ID back immediately, and goes back to talking to the user. Subagents now run concurrently, in the background, and the supervisor can check on them, send follow-up instructions, or cancel them mid-flight. The change looks small in code and large in what you can build.
What changed in v0.5
Deep Agents is the agent harness LangChain ships on top of LangGraph. It gives you a planning tool, a filesystem backend, and the ability to spawn subagents. Until v0.5, subagents were synchronous: a supervisor called one and the supervisor's execution loop blocked until the subagent finished.
The v0.5 release adds a second mode. The Python package is deepagents, the JavaScript package is deepagentsjs, both bumped to 0.5 on the same day. Async subagents are not a replacement for inline subagents. They are a separate construct with a different lifetime, a different protocol, and a different set of supervisor tools.
The key idea is that an async subagent is a remote run on an Agent Protocol server, not an inline call. Agent Protocol is LangChain's open spec for serving LLM agents and is already what powers LangGraph Platform. You create a thread to hold conversation context, start a run to kick off work, and check on it when you need the result. Async subagents are a thin wrapper over that contract.
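To make the thread-then-run contract concrete, here is a minimal sketch of the request shapes involved. The endpoint paths and field names follow the general shape of the LangGraph Platform API that Agent Protocol standardizes, but treat them as illustrative assumptions rather than the official spec; the helper names are hypothetical.

```python
# Hypothetical sketch of the Agent Protocol lifecycle async subagents wrap:
# create a thread, then start a run on it. Routes and payload field names
# are assumptions modeled on the LangGraph Platform API shape.

def thread_endpoint(base_url: str) -> str:
    # POST here to create a thread that holds conversation context
    return f"{base_url}/threads"

def run_endpoint(base_url: str, thread_id: str) -> str:
    # POST here to kick off a run on that thread
    return f"{base_url}/threads/{thread_id}/runs"

def build_run_request(graph_id: str, instruction: str) -> dict:
    # graph_id selects the graph registered in langgraph.json on the server;
    # the input carries the supervisor's instruction as a user message
    return {
        "assistant_id": graph_id,
        "input": {"messages": [{"role": "user", "content": instruction}]},
    }
```

The supervisor-side tools are a thin layer over exactly this sequence: `start_async_task` creates the thread and run, and `check_async_task` reads the run's status back.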
If you have not used LangGraph for orchestration before, our guide on LangGraph state machines for autonomous AI agents covers the underlying graph and state model that Deep Agents builds on.
The five tools the supervisor gets
When you register async subagents, the AsyncSubAgentMiddleware adds five tools to the supervisor's toolbelt:
- start_async_task: launches a background task and returns a task ID immediately
- check_async_task: retrieves the current status and any results so far
- update_async_task: sends new instructions to a running task without canceling it
- cancel_async_task: stops a running task
- list_async_tasks: returns every tracked task with live status
That fifth tool matters more than it looks. Because LLM context gets compacted on long sessions, the supervisor can lose track of task IDs it issued earlier. To handle that, Deep Agents stores task metadata in a dedicated async_tasks state channel separate from the message history. Compaction does not touch it. The supervisor can always call list_async_tasks and see every task it ever started in the current session, even after the conversation has been summarized down five times.
Each tracked task carries the task ID, the agent name, the thread ID, the run ID, the status, and three timestamps: created_at, last_checked_at, and last_updated_at. That is enough information for the supervisor to reason about whether to wait, ask the user, or move on.
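The record described above can be sketched as a dataclass. This mirrors the fields the text lists; the actual deepagents state shape may name or nest them differently.

```python
from dataclasses import dataclass

# Illustrative record for one entry in the async_tasks state channel.
# Field names follow the fields described in the text; treat the exact
# shape as an assumption, not the library's internal schema.
@dataclass
class TrackedTask:
    task_id: str
    agent_name: str         # which subagent is doing the work
    thread_id: str          # Agent Protocol thread holding the context
    run_id: str             # the specific run on that thread
    status: str             # e.g. "running", "success", "error", "cancelled"
    created_at: float
    last_checked_at: float
    last_updated_at: float
```

Because this lives outside the message history, a `list_async_tasks` call can rebuild the supervisor's picture of in-flight work from these records alone.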
Wiring up an async subagent
The setup is short. You declare an AsyncSubAgent, point it at a graph, and pass it to create_deep_agent:
```python
from deepagents import AsyncSubAgent, create_deep_agent

async_subagents = [
    AsyncSubAgent(
        name="researcher",
        description="Deep research agent for multi-source investigation.",
        graph_id="researcher",
    ),
    AsyncSubAgent(
        name="coder",
        description="Code generation and refactoring agent.",
        graph_id="coder",
        url="https://coder-deployment.langsmith.dev",
    ),
]

agent = create_deep_agent(
    model="anthropic:claude-sonnet-4-6",
    subagents=async_subagents,
)
```
The fields on AsyncSubAgent are deliberate. name is what the supervisor sees in its tool descriptions. description is what the supervisor reads to decide which agent to delegate to, so it should describe the capability, not the agent's name or implementation. graph_id must match a graph registered in langgraph.json on whichever server the agent lives on.
The url field controls the transport. Omit it and Deep Agents calls the subagent in-process over ASGI, which gives you zero network latency and is the right choice when supervisor and subagents share a deployment. Include it and Deep Agents talks to a remote Agent Protocol server over HTTP, authenticating with LANGSMITH_API_KEY or LANGGRAPH_API_KEY. You can mix the two in the same supervisor, which is the topology most production setups end up with: a few cheap, fast helpers running ASGI alongside the supervisor, a few expensive ones running on their own infrastructure.
What the supervisor does in practice
Once the supervisor has the five tools, the conversational pattern looks roughly like this. The user asks a complex question. The supervisor decides this is a research-heavy task and calls start_async_task with the researcher subagent and a detailed instruction. It immediately gets back a task ID and tells the user something like "I've started a research task on X, ID task_a1b2c3. I'll check back when it has results, but feel free to ask me other things in the meantime."
The user asks something unrelated. The supervisor handles it inline, possibly while the research task is still running. A few minutes later, the user comes back and asks for the research result. The supervisor calls check_async_task with the stored ID, gets the result, and presents it.
If, halfway through, the user adds a new constraint ("also focus on the European market"), the supervisor calls update_async_task and steers the running task without restarting it. That is the part inline subagents could never do, because once you call an inline subagent, the supervisor has no execution slot left to reason in.
The example supervisor in the deepagents repo wraps this pattern in a small REPL with a memory checkpointer and a system prompt that explicitly tells the supervisor to call the async tools rather than rely on cached responses, and to avoid polling loops. Both warnings are worth keeping in your own system prompt: agents will happily fabricate task statuses or busy-loop asking check_async_task every step if you do not push them away from those failure modes.
Heterogeneous deployments and why teams want them
The shift from inline to async also unlocks a deployment pattern that was awkward before. Inline subagents have to share the supervisor's process, which means they share its model choice, its package versions, its memory and CPU budget, and its scaling profile. Async subagents do not.
A lightweight orchestrator on claude-haiku-4-5 can delegate to a heavier claude-opus-4-7 reasoning subagent for one specific kind of task and to a Gemini-based vision subagent for another. Each can run on different hardware, with its own toolset, its own dependencies, and its own scaling rules. The supervisor only needs the URL and the graph ID.
This is closer to how human teams divide work. Most production agents do not need a single all-powerful brain. They need a coordinator that knows when to hand off, plus specialists that do one thing well. Async subagents make the second part deployable without forcing every specialist to live in the same Docker image. If you are coming from a single-process LangChain agent, the production patterns for LangChain autonomous systems article is a useful baseline before splitting things across servers.
What it costs you
There is no free lunch in switching from inline to async, and the tradeoffs are worth being honest about.
The first cost is operational complexity. An async subagent is a deployed Agent Protocol server, with its own logs, its own metrics, its own failure modes, and its own scaling story. Inline subagents fail when the supervisor fails, and that is one thing to monitor. Async subagents can fail independently, hang, or get rate-limited by their model provider while the supervisor is healthy. You need to think about timeouts, retries, and what the supervisor should do when a task sits in running for an hour.
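One way to handle the stuck-in-running problem is a small watchdog policy the supervisor (or a wrapper around it) consults when it checks tasks. The thresholds and action names below are illustrative, not anything the library ships:

```python
import time

# Hypothetical watchdog policy for long-running tasks. Thresholds are
# placeholders; tune them to your subagents' expected runtimes.
def stale_task_action(status, created_at, now=None,
                      warn_after=600.0, cancel_after=3600.0):
    """Return what to do with a task: 'none', 'wait', 'ask_user', or 'cancel'."""
    if status != "running":
        return "none"
    age = (now if now is not None else time.time()) - created_at
    if age > cancel_after:
        return "cancel"      # give up and report the failure to the user
    if age > warn_after:
        return "ask_user"    # surface the delay and let the user decide
    return "wait"
```

The point is less the specific cutoffs than having an explicit policy at all: without one, a hung remote subagent silently becomes the supervisor's problem.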
The second cost is reasoning surface area. Once you give the supervisor the ability to spawn many concurrent tasks, you have to write a system prompt that prevents it from spawning ten when one would do. Token budgets, cost ceilings, and rate limits do not enforce themselves. The supervisor will gladly call start_async_task in a loop if you let it.
The third cost is latency for short tasks. If a subagent takes five seconds to do its job, going async adds protocol overhead and a check-in round trip with no meaningful benefit. The break-even is somewhere around tens of seconds of subagent runtime. Below that, inline is still the right call. The Deep Agents docs do not put a hard number on it, and the right cutoff depends on your transport choice and your model latency, but the principle holds: async pays for itself only when the work is long enough to matter.
The fourth cost is debuggability. Inline subagent traces are linear: one trace, one timeline, one set of LangSmith spans. Async subagent traces are distributed: one trace per server, multiple threads, multiple runs. You can stitch them together with thread IDs, but the mental model shifts from "read the trace top to bottom" to "correlate across services." Teams that have done microservices migrations will recognize the pattern.
Failure modes to design for
Three failure modes show up often enough that they deserve explicit handling.
Stale task IDs after compaction. The async_tasks state channel survives compaction, but the supervisor's reasoning about why it started each task may not. A long session can produce a list of running tasks the supervisor no longer remembers the purpose of. A useful pattern is to store a short purpose string with each task in your supervisor's prompt and to teach the supervisor to summarize task purposes when it lists them.
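The purpose-string pattern can be as simple as a side table keyed by task ID, written at start time and read back whenever the supervisor lists tasks. A minimal sketch, with hypothetical helper names:

```python
# Side table mapping task IDs to a short human-readable purpose, recorded
# when the task is started so it survives context compaction alongside the
# async_tasks channel. Helper names are illustrative.
task_purposes: dict = {}

def remember_purpose(task_id: str, purpose: str) -> None:
    task_purposes[task_id] = purpose

def describe_tasks(task_ids: list) -> list:
    # pair each ID with its recorded purpose for the supervisor's summary
    return [f"{tid}: {task_purposes.get(tid, 'purpose unknown')}"
            for tid in task_ids]
```

Feeding `describe_tasks` output back to the supervisor instead of bare IDs keeps the list meaningful even after several rounds of summarization.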
Orphaned tasks. A user closes the browser tab, the supervisor session ends, but the remote subagent keeps running. You can either let them run to completion (cheap if the work is bounded) or you can register a session-cleanup hook that calls cancel_async_task on every active task when the supervisor exits. The right answer depends on whether your subagent work has side effects or just consumes tokens.
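A session-cleanup hook of that kind can be sketched in a few lines. Here `cancel_fn` stands in for whatever actually calls `cancel_async_task`, and `active_tasks` for the supervisor's tracked-task state; both are assumptions about your surrounding code:

```python
import atexit

# Build a cleanup callable that cancels every still-running task.
# Register it at process exit (or call it from your session-teardown path).
def make_cleanup(active_tasks, cancel_fn):
    def _cleanup():
        for task_id, status in list(active_tasks.items()):
            if status == "running":
                cancel_fn(task_id)
    return _cleanup

# e.g. atexit.register(make_cleanup(tasks, cancel_async_task))
```

Process-exit hooks only cover clean shutdowns; if the supervisor can be killed hard, a server-side TTL on runs is the more reliable backstop.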
Update races. Calling update_async_task while a subagent is mid-step does not interrupt that step. The new instruction is queued and processed at the next decision point. This is usually what you want, but it means you cannot rely on updates being immediately visible. If the user gives a course-correction that needs to apply right now, cancel and restart instead.
Migrating from inline subagents
If you already have a Deep Agents supervisor with inline subagents, the migration is mechanical but not free. The SubAgent config becomes AsyncSubAgent with a graph_id and an optional url. The supervisor's system prompt needs new instructions about the five async tools. The graphs your subagents reference need to be deployed to an Agent Protocol server, either co-deployed via ASGI or remote via HTTP.
The deepagents repo ships an examples/async-subagent-server directory with a working supervisor and researcher pair that is the cleanest reference implementation to copy from. It also makes one decision worth noting: it uses a memory checkpointer for the supervisor and a separate URL for the researcher. That is the split-deployment topology in its simplest form.
A migration shortcut: keep your fastest, most predictable subagents inline and only convert the slow, network-bound, or expensive ones to async. There is no rule that says every subagent has to run the same way. The system prompt simply lists both kinds of tools, and the supervisor decides which to call based on the description.
For teams running parallel agents in other harnesses, the parallelism story we covered in Claude Code subagents and parallel AI work maps closely onto what Deep Agents formalizes here. Same shape, different harness: a coordinator that knows when to fan out, plus specialists that work concurrently and report back.
Where this fits in a 2026 agent stack
Async subagents fit into a bigger change in how production agents get built. LangGraph 1.0 went GA late last year and is now the most-adopted Python framework for production agents, with companies like Klarna, Uber, Replit, and Elastic running it. Deep Agents sits one layer above, giving you the planning, filesystem, and subagent primitives without forcing you to wire the graph yourself. The v0.5 release plugs async into that stack, and the timing is not random. Most teams that get past the prototype phase hit the inline-subagent ceiling within a few weeks.
If you are building agents that touch CI/CD or run long pipelines, the work has obvious overlap with the patterns covered in autonomous AI agents in CI/CD pipeline automation. Background subagents are a natural fit for long-lived pipeline work that the supervisor needs to coordinate without blocking on.
The honest take is that async subagents do not solve the hard parts of agent engineering. Prompts still need careful design, evals still matter more than most people admit, and the hardest bugs are about context and state, not concurrency. What async subagents do is remove the constraint that was making conversational agent UX awkward at the upper end of task complexity. You can now build a supervisor that delegates a real chunk of work, keeps talking to the user, and steers the work as it learns more, without rewriting your harness.
Where to start
The fastest path is to stand up a single async subagent on the same server as your supervisor over ASGI, run it against a task that takes ninety seconds or so, and watch the difference in supervisor responsiveness. That gives you the API surface and the supervisor prompt patterns without the operational overhead of a remote deployment. Once that works, split the subagent off to its own service and switch the transport to HTTP. The code change is one line: adding a url field. The deployment change is the rest of the work.
If you are picking between inline and async for a specific subagent, the question to ask is how long it takes on its slowest day, and whether the supervisor has anything useful to do in that time. If the answer is "a few seconds" and "no," keep it inline. If the answer is "minutes" and "yes," v0.5 finally gives you the right tool.