LangGraph: Building Autonomous AI Agents with State Machines

Build production-ready autonomous AI agents with LangGraph state machines. Complete guide with code examples covering state management, cyclical workflows, and human-in-the-loop patterns.

14 minutes
Intermediate
2025-11-18

LangGraph is a framework built on top of LangChain for creating AI agents that need state management and non-linear control flow. If your agent needs to loop, branch, retry failed steps, or pause for human approval, LangGraph gives you the primitives to build that without writing a custom state machine from scratch.

This guide covers building autonomous agents with LangGraph, from basic state machines to cyclical workflows with human-in-the-loop patterns and persistent checkpointing.

What Is LangGraph?

LangGraph is a framework for building stateful, multi-actor applications with LLMs. It extends LangChain by adding:

  • Graph-based workflow definition where nodes represent actions and edges define transitions between states
  • Built-in state management that persists across agent execution steps and can be saved for long-running workflows
  • Cyclical execution support allowing agents to loop back to previous steps based on conditions or outcomes
  • Human-in-the-loop patterns for adding approval steps or human intervention at critical decision points
  • Checkpointing and persistence to save agent state and resume execution after interruptions

Unlike traditional LangChain agents, which run a single ReAct loop (Reason → Act → Observe, repeated until done), LangGraph agents can implement full state machines with conditional branching, parallel execution, and cyclical workflows. This makes them ideal for sophisticated automation tasks like multi-stage approvals, iterative refinement processes, and long-running business workflows.

What LangGraph Adds Over Plain LangChain

LangChain agents work fine for simple tool-calling. But try building an agent that needs to loop back to a previous step based on output quality, or pause for human approval, or run two tasks in parallel and merge the results. That's where plain LangChain agents fall apart.

Explicit state. Every node reads from and writes to a shared state object. No guessing what the previous step produced. You can inspect state at any point, which makes debugging much less painful.

Non-linear flow. Conditional branches, parallel execution, loops with exit conditions, fallback paths. You define the graph structure and LangGraph handles the execution. Traditional agent frameworks force you into a linear loop. LangGraph lets you build actual workflows.

Checkpointing. State saves automatically after each node. If the agent crashes at step 7 of a 10-step workflow, you resume from step 7. Important for anything that takes more than a few minutes to run.

Human interrupts. Add an interrupt point at any node. The agent pauses, surfaces the current state to a human, waits for approval or input, then continues. Simple to implement, and required for any workflow touching sensitive operations.

Building Your First LangGraph Agent

Let's build a research and writing agent that can gather information, create content drafts, and iterate based on quality checks. This demonstrates core LangGraph concepts including state management and cyclical workflows.

First, install the required dependencies:

pip install langgraph langgraph-checkpoint-sqlite langchain langchain-openai python-dotenv

Now create a basic LangGraph agent with multiple states:

import os
from typing import TypedDict
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END

load_dotenv()

# Define the agent state structure
class AgentState(TypedDict):
    task: str
    research_data: str
    draft_content: str
    quality_score: int
    iteration_count: int
    final_output: str

# Initialize LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0.7, api_key=os.getenv("OPENAI_API_KEY"))

# Node 1: Research node that gathers information
def research_node(state: AgentState) -> AgentState:
    """Gather information about the topic."""
    task = state["task"]
    
    # In production, this would call actual research tools
    prompt = f"Provide 3-4 key facts about: {task}"
    response = llm.invoke(prompt)
    
    return {
        **state,
        "research_data": response.content,
        "iteration_count": state.get("iteration_count", 0)
    }

# Node 2: Writing node that creates content
def writing_node(state: AgentState) -> AgentState:
    """Create content based on research data."""
    research = state["research_data"]
    task = state["task"]
    
    prompt = f"""Based on this research:
{research}

Write a concise 2-paragraph article about: {task}"""
    
    response = llm.invoke(prompt)
    
    return {
        **state,
        "draft_content": response.content,
        "iteration_count": state.get("iteration_count", 0) + 1
    }

# Node 3: Quality check node
def quality_check_node(state: AgentState) -> AgentState:
    """Evaluate the quality of the content."""
    content = state["draft_content"]
    
    prompt = f"""Rate this content from 1-10 for clarity and completeness.
Respond with ONLY a number.

Content:
{content}"""
    
    response = llm.invoke(prompt)
    
    try:
        score = int(response.content.strip())
    except ValueError:
        score = 5  # Default if parsing fails
    
    return {
        **state,
        "quality_score": score
    }

# Node 4: Finalization node
def finalize_node(state: AgentState) -> AgentState:
    """Prepare the final output."""
    return {
        **state,
        "final_output": state["draft_content"]
    }

# Conditional edge: Decide whether to iterate or finalize
def should_iterate(state: AgentState) -> str:
    """Determine if we should iterate or finalize."""
    quality_score = state.get("quality_score", 0)
    iteration_count = state.get("iteration_count", 0)
    
    # Iterate if quality is low and we haven't tried too many times
    if quality_score < 7 and iteration_count < 3:
        return "iterate"
    else:
        return "finalize"

# Build the graph
workflow = StateGraph(AgentState)

# Add nodes
workflow.add_node("research", research_node)
workflow.add_node("writing", writing_node)
workflow.add_node("quality_check", quality_check_node)
workflow.add_node("finalize", finalize_node)

# Define edges (flow between nodes)
workflow.add_edge("research", "writing")
workflow.add_edge("writing", "quality_check")

# Add conditional edge based on quality
workflow.add_conditional_edges(
    "quality_check",
    should_iterate,
    {
        "iterate": "research",  # Loop back to improve
        "finalize": "finalize"  # Move to finalization
    }
)

workflow.add_edge("finalize", END)

# Set entry point
workflow.set_entry_point("research")

# Compile the graph
app = workflow.compile()

# Execute the agent
if __name__ == "__main__":
    initial_state = {
        "task": "explain how LangGraph enables autonomous AI agents",
        "research_data": "",
        "draft_content": "",
        "quality_score": 0,
        "iteration_count": 0,
        "final_output": ""
    }
    
    print("Starting LangGraph Agent...\n")
    
    # Run the agent
    final_state = app.invoke(initial_state)
    
    print("="*70)
    print("FINAL OUTPUT:")
    print("="*70)
    print(final_state["final_output"])
    print(f"\nIterations: {final_state['iteration_count']}")
    print(f"Final Quality Score: {final_state['quality_score']}")

This agent demonstrates a cyclical workflow: it researches, writes, checks quality, and if the quality is insufficient, loops back to research again. The StateGraph manages state transitions, and conditional edges enable dynamic routing based on agent decisions.

Building Multi-Agent Systems with LangGraph

Real-world applications often require multiple specialized agents working together. LangGraph excels at orchestrating multi-agent systems:

import os
from typing import TypedDict, List
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END

load_dotenv()

# Define shared state for multi-agent system
class MultiAgentState(TypedDict):
    user_request: str
    research_findings: str
    technical_analysis: str
    business_recommendation: str
    approval_status: str
    final_report: str

# Initialize specialized LLMs (could use different models/prompts)
llm = ChatOpenAI(model="gpt-4o", temperature=0, api_key=os.getenv("OPENAI_API_KEY"))

# Agent 1: Research Specialist
def research_agent(state: MultiAgentState) -> MultiAgentState:
    """Research agent gathers information and facts."""
    request = state["user_request"]
    
    prompt = f"""You are a research specialist. Gather key information about:
{request}

Provide 4-5 factual findings with sources."""
    
    response = llm.invoke(prompt)
    
    return {
        **state,
        "research_findings": response.content
    }

# Agent 2: Technical Analyst
def technical_agent(state: MultiAgentState) -> MultiAgentState:
    """Technical agent analyzes implementation details."""
    request = state["user_request"]
    research = state["research_findings"]
    
    prompt = f"""You are a technical analyst. Based on this request and research:

Request: {request}

Research:
{research}

Provide technical analysis: feasibility, implementation approach, and technical requirements."""
    
    response = llm.invoke(prompt)
    
    return {
        **state,
        "technical_analysis": response.content
    }

# Agent 3: Business Advisor
def business_agent(state: MultiAgentState) -> MultiAgentState:
    """Business agent provides strategic recommendations."""
    request = state["user_request"]
    research = state["research_findings"]
    technical = state["technical_analysis"]
    
    prompt = f"""You are a business advisor. Based on:

Request: {request}

Research: {research}

Technical Analysis: {technical}

Provide business recommendation: ROI, risks, timeline, and go/no-go decision."""
    
    response = llm.invoke(prompt)
    
    return {
        **state,
        "business_recommendation": response.content
    }

# Agent 4: Report Compiler
def report_agent(state: MultiAgentState) -> MultiAgentState:
    """Compile all findings into a final report."""
    research = state["research_findings"]
    technical = state["technical_analysis"]
    business = state["business_recommendation"]
    
    prompt = f"""Compile this information into a concise executive summary:

RESEARCH FINDINGS:
{research}

TECHNICAL ANALYSIS:
{technical}

BUSINESS RECOMMENDATION:
{business}

Create a structured 3-paragraph summary."""
    
    response = llm.invoke(prompt)
    
    return {
        **state,
        "final_report": response.content,
        "approval_status": "pending"
    }

# Human approval node (simulated)
def human_approval_node(state: MultiAgentState) -> MultiAgentState:
    """Pause for human approval."""
    print("\n" + "="*70)
    print("FINAL REPORT FOR APPROVAL:")
    print("="*70)
    print(state["final_report"])
    print("\n" + "="*70)
    
    # In production, this would integrate with your approval system
    approval = input("\nApprove this report? (yes/no): ").lower().strip()
    
    return {
        **state,
        "approval_status": "approved" if approval == "yes" else "rejected"
    }

# Conditional edge: Check approval status
def check_approval(state: MultiAgentState) -> str:
    """Route based on approval status."""
    if state["approval_status"] == "approved":
        return "approved"
    else:
        return "rejected"

# Build multi-agent workflow
workflow = StateGraph(MultiAgentState)

# Add agent nodes
workflow.add_node("research", research_agent)
workflow.add_node("technical", technical_agent)
workflow.add_node("business", business_agent)
workflow.add_node("report", report_agent)
workflow.add_node("approval", human_approval_node)

# Define sequential flow
workflow.add_edge("research", "technical")
workflow.add_edge("technical", "business")
workflow.add_edge("business", "report")
workflow.add_edge("report", "approval")

# Add conditional routing after approval
workflow.add_conditional_edges(
    "approval",
    check_approval,
    {
        "approved": END,
        "rejected": "research"  # Loop back to start if rejected
    }
)

# Set entry point
workflow.set_entry_point("research")

# Compile
app = workflow.compile()

# Execute multi-agent system
if __name__ == "__main__":
    initial_state = {
        "user_request": "Should we implement LangGraph for our customer support automation?",
        "research_findings": "",
        "technical_analysis": "",
        "business_recommendation": "",
        "approval_status": "",
        "final_report": ""
    }
    
    print("Starting Multi-Agent Analysis System...\n")
    
    final_state = app.invoke(initial_state)
    
    if final_state["approval_status"] == "approved":
        print("\n✅ Report Approved!")
        print("\nFINAL REPORT:")
        print(final_state["final_report"])
    else:
        print("\n❌ Report Rejected - System will iterate")

This multi-agent system demonstrates how specialized agents can collaborate through shared state. Each agent adds its expertise, and human approval creates a checkpoint before finalizing decisions.

Implementing Persistent State and Checkpoints

For long-running workflows or agents that need to survive restarts, LangGraph provides checkpointing:

import os
from typing import TypedDict
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langgraph.graph import StateGraph, END
from langgraph.checkpoint.sqlite import SqliteSaver

load_dotenv()

# Define agent state
class PersistentAgentState(TypedDict):
    task: str
    progress: str
    steps_completed: list
    current_step: int
    total_steps: int
    result: str

llm = ChatOpenAI(model="gpt-4o", temperature=0, api_key=os.getenv("OPENAI_API_KEY"))

# Simulate long-running steps
def step_one(state: PersistentAgentState) -> PersistentAgentState:
    """Execute first step of long workflow."""
    print("Executing Step 1: Data Collection")
    
    prompt = f"Simulate data collection for: {state['task']}"
    response = llm.invoke(prompt)
    
    # Build a new list rather than mutating checkpointed state in place
    steps = state.get("steps_completed", []) + ["Step 1: Data Collection Complete"]
    
    return {
        **state,
        "steps_completed": steps,
        "current_step": 1,
        "progress": "Collected initial data"
    }

def step_two(state: PersistentAgentState) -> PersistentAgentState:
    """Execute second step."""
    print("Executing Step 2: Data Analysis")
    
    prompt = f"Analyze data for: {state['task']}"
    response = llm.invoke(prompt)
    
    steps = state.get("steps_completed", []) + ["Step 2: Analysis Complete"]
    
    return {
        **state,
        "steps_completed": steps,
        "current_step": 2,
        "progress": "Analysis completed"
    }

def step_three(state: PersistentAgentState) -> PersistentAgentState:
    """Execute final step."""
    print("Executing Step 3: Report Generation")
    
    prompt = f"Generate final report for: {state['task']}"
    response = llm.invoke(prompt)
    
    steps = state.get("steps_completed", []) + ["Step 3: Report Generated"]
    
    return {
        **state,
        "steps_completed": steps,
        "current_step": 3,
        "progress": "Complete",
        "result": response.content
    }

# Build workflow with checkpointing
workflow = StateGraph(PersistentAgentState)

workflow.add_node("step1", step_one)
workflow.add_node("step2", step_two)
workflow.add_node("step3", step_three)

workflow.add_edge("step1", "step2")
workflow.add_edge("step2", "step3")
workflow.add_edge("step3", END)

workflow.set_entry_point("step1")

# Initialize checkpoint saver. In recent releases, SqliteSaver.from_conn_string()
# returns a context manager, so constructing from a connection is more portable.
import sqlite3
conn = sqlite3.connect(":memory:", check_same_thread=False)
checkpointer = SqliteSaver(conn)

# Compile with checkpointing enabled
app = workflow.compile(checkpointer=checkpointer)

# Execute with thread ID for state persistence
if __name__ == "__main__":
    initial_state = {
        "task": "Quarterly sales analysis",
        "progress": "",
        "steps_completed": [],
        "current_step": 0,
        "total_steps": 3,
        "result": ""
    }
    
    # Create a thread ID for this workflow instance
    thread_id = "workflow-123"
    config = {"configurable": {"thread_id": thread_id}}
    
    print("Starting Persistent Workflow...\n")
    
    # Execute workflow
    final_state = app.invoke(initial_state, config=config)
    
    print("\n" + "="*70)
    print("WORKFLOW COMPLETE")
    print("="*70)
    print(f"Progress: {final_state['progress']}")
    print(f"Steps Completed: {len(final_state['steps_completed'])}")
    print("\nSteps:")
    for step in final_state["steps_completed"]:
        print(f"  ✓ {step}")
    
    # Demonstrate checkpoint retrieval
    print("\n" + "="*70)
    print("CHECKPOINTS SAVED:")
    print("="*70)
    
    # Get all checkpoints for this thread
    checkpoint_history = app.get_state_history(config)
    
    for i, checkpoint in enumerate(checkpoint_history):
        if i < 3:  # Show first 3 checkpoints
            print(f"\nCheckpoint {i + 1}:")
            print(f"  Current Step: {checkpoint.values.get('current_step', 0)}")
            print(f"  Progress: {checkpoint.values.get('progress', 'N/A')}")

Checkpointing enables powerful capabilities:

  • Resume from failure: If the agent crashes during step 2, restart from that checkpoint without repeating step 1
  • Long-running workflows: Execute workflows over hours or days with automatic state persistence
  • Time travel debugging: Inspect agent state at any point in execution history
  • Audit trails: Maintain complete records of agent decisions and state transitions

Advanced Tool Integration with LangGraph

LangGraph agents can use external tools while maintaining state across tool calls:

import os
import operator
from typing import TypedDict, Annotated
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain.tools import Tool
from langgraph.graph import StateGraph, END
# ToolExecutor/ToolInvocation were removed in langgraph 0.2+ (replaced by
# langgraph.prebuilt.ToolNode); this example assumes an older langgraph release.
from langgraph.prebuilt import ToolExecutor, ToolInvocation

load_dotenv()

# Define state with tool tracking
class ToolAgentState(TypedDict):
    messages: Annotated[list, operator.add]
    query: str
    tool_calls: list
    final_answer: str

llm = ChatOpenAI(model="gpt-4o", temperature=0, api_key=os.getenv("OPENAI_API_KEY"))

# Define tools
def calculate(expression: str) -> str:
    """Perform calculations. Input: math expression."""
    try:
        # eval() runs arbitrary code; swap in a safe math parser for real inputs
        result = eval(expression)
        return f"Result: {result}"
    except Exception as e:
        return f"Error: {str(e)}"

def get_weather(location: str) -> str:
    """Get weather for a location. Input: city name."""
    # Mock weather data
    return f"Weather in {location}: 72°F, Sunny"

tools = [
    Tool(name="Calculator", func=calculate, description="Perform math calculations"),
    Tool(name="WeatherTool", func=get_weather, description="Get current weather")
]

tool_executor = ToolExecutor(tools)

# Agent decision node
def agent_node(state: ToolAgentState) -> ToolAgentState:
    """Agent decides what to do next."""
    query = state["query"]
    
    prompt = f"""Given this query: {query}
    
Previous tool calls: {state.get('tool_calls', [])}

Decide: Do you need to call a tool, or can you provide the final answer?
If you need a tool, specify: TOOL: tool_name, INPUT: input_value
If you have the answer, specify: ANSWER: your response"""
    
    response = llm.invoke(prompt)
    content = response.content
    
    if "TOOL:" in content:
        # Parse tool call
        tool_name = content.split("TOOL:")[1].split(",")[0].strip()
        tool_input = content.split("INPUT:")[1].strip()
        
        tool_calls = state.get("tool_calls", [])
        tool_calls.append({"tool": tool_name, "input": tool_input})
        
        return {
            **state,
            "tool_calls": tool_calls
        }
    else:
        # Final answer
        answer = content.split("ANSWER:")[1].strip() if "ANSWER:" in content else content
        return {
            **state,
            "final_answer": answer
        }

# Tool execution node
def tool_node(state: ToolAgentState) -> ToolAgentState:
    """Execute the most recent tool call."""
    tool_calls = state["tool_calls"]
    if not tool_calls:
        return state
    
    last_call = tool_calls[-1]
    tool_name = last_call["tool"]
    tool_input = last_call["input"]
    
    # Execute tool
    tool_invocation = ToolInvocation(tool=tool_name, tool_input=tool_input)
    result = tool_executor.invoke(tool_invocation)
    
    # Update state with result
    tool_calls[-1]["result"] = result
    
    return {
        **state,
        "tool_calls": tool_calls
    }

# Routing function
def should_continue(state: ToolAgentState) -> str:
    """Decide next step based on state.

    Note: in production, add an iteration cap here (as in the bounded-loop
    pattern earlier) so the agent cannot self-loop indefinitely.
    """
    if state.get("final_answer"):
        return "end"
    elif state.get("tool_calls") and "result" not in state["tool_calls"][-1]:
        return "execute_tool"
    else:
        return "agent"

# Build graph
workflow = StateGraph(ToolAgentState)

workflow.add_node("agent", agent_node)
workflow.add_node("tool", tool_node)

workflow.add_conditional_edges(
    "agent",
    should_continue,
    {
        "execute_tool": "tool",
        "agent": "agent",
        "end": END
    }
)

workflow.add_edge("tool", "agent")
workflow.set_entry_point("agent")

app = workflow.compile()

# Test the tool-using agent
if __name__ == "__main__":
    test_query = "What's the weather in San Francisco, and if it's above 70°F, calculate 70 * 1.5"
    
    initial_state = {
        "messages": [],
        "query": test_query,
        "tool_calls": [],
        "final_answer": ""
    }
    
    print(f"Query: {test_query}\n")
    print("="*70)
    
    final_state = app.invoke(initial_state)
    
    print("\nTOOL CALLS:")
    for call in final_state["tool_calls"]:
        print(f"  • {call['tool']}: {call['input']}")
        print(f"    Result: {call.get('result', 'N/A')}")
    
    print(f"\nFINAL ANSWER:")
    print(final_state["final_answer"])

This pattern creates agents that can dynamically decide when to use tools, track all tool executions in state, and make decisions based on tool results.

Best Practices for Production LangGraph Agents

Building reliable autonomous agents with LangGraph requires careful attention to state management, error handling, and workflow design:

1. Design Clear State Schemas

Use TypedDict to define explicit state structures. Include all fields agents might need, provide default values, and document what each field represents. Clear state schemas prevent bugs and make workflows easier to understand.
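One way to enforce defaults, for example, is a small constructor helper so every entry point builds the same valid starting state (field names below are illustrative):

```python
from typing import TypedDict

class AgentState(TypedDict):
    task: str              # the user's request, set once at entry
    draft_content: str     # latest draft, overwritten each writing pass
    quality_score: int     # 1-10 rating from the quality check node
    iteration_count: int   # bounded-loop counter

def initial_state(task: str) -> AgentState:
    """Single place to construct a valid starting state."""
    return AgentState(
        task=task,
        draft_content="",
        quality_score=0,
        iteration_count=0,
    )

state = initial_state("summarize Q3 results")
print(state["iteration_count"])
```

This avoids the scattered inline dict literals that tend to drift out of sync as the schema grows.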

2. Implement Bounded Loops

Always add maximum iteration limits to cyclical workflows. Use a counter in state and check it in conditional edges. Without bounds, agents can get stuck in infinite loops that consume resources indefinitely.

3. Add Comprehensive Error Handling

Wrap node functions in try-except blocks. Include error information in state so downstream nodes can react appropriately. Consider adding dedicated error recovery nodes that handle failures gracefully.
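One possible shape for this, sketched with illustrative names: a decorator that catches node failures and records them in state, so a conditional edge can route to a recovery path instead of crashing the graph.

```python
import functools

def resilient(node_fn):
    """Wrap a node so failures land in state instead of crashing the graph."""
    @functools.wraps(node_fn)
    def wrapper(state):
        try:
            return node_fn(state)
        except Exception as exc:
            # Record which node failed and why, for downstream routing
            return {**state, "error": f"{node_fn.__name__}: {exc}"}
    return wrapper

@resilient
def flaky_node(state):
    raise ValueError("upstream API timed out")

def route_on_error(state) -> str:
    # Conditional edge: send failures to a recovery node
    return "recover" if state.get("error") else "continue"

state = flaky_node({"task": "demo"})
print(route_on_error(state))  # recover
```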

4. Use Checkpointing for Long Workflows

Enable checkpointing for any workflow that takes more than a few minutes. Save checkpoints after expensive operations like API calls or database queries. This enables fault tolerance and workflow resumption.

5. Implement Observability

Log state transitions, node executions, and decision points. Track metrics like execution time per node, number of iterations, and success rates. Use structured logging that can be queried and analyzed.

6. Test Each Node Independently

Write unit tests for individual node functions with mock state objects. Test conditional edge functions with various state configurations. Validate that state transformations work correctly before assembling the full graph.
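For instance, the routing logic from the first example can be covered exhaustively with plain dicts, no graph or LLM required:

```python
def should_iterate(state: dict) -> str:
    """Same logic as the conditional edge in the first example."""
    if state.get("quality_score", 0) < 7 and state.get("iteration_count", 0) < 3:
        return "iterate"
    return "finalize"

# Mock states cover every branch of the conditional edge
assert should_iterate({"quality_score": 5, "iteration_count": 1}) == "iterate"
assert should_iterate({"quality_score": 9, "iteration_count": 1}) == "finalize"
assert should_iterate({"quality_score": 5, "iteration_count": 3}) == "finalize"
assert should_iterate({}) == "iterate"  # missing fields default safely
```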

7. Version Control Graph Definitions

Store graph structures as code in version control. Document the purpose of each node and edge. Track changes to workflow logic with descriptive commit messages. Treat graph definitions with the same rigor as application code.

Deployment Considerations

Scalability

Deploy LangGraph agents as containerized services using Docker and Kubernetes. Implement request queuing to handle concurrent workflows without overwhelming LLM APIs. Use horizontal scaling for stateless nodes and dedicated checkpoint storage for shared state.

Cost Management

Track token usage per workflow execution and per node. Implement cost limits and alerts for runaway workflows. Cache repeated LLM calls when possible. Use cheaper models for simple nodes and reserve expensive models for critical decisions.

Security

Sanitize all user inputs before passing to agent state. Validate state transitions to prevent manipulation. Encrypt checkpoint storage containing sensitive data. Implement authentication for workflow triggers and human approval endpoints.

Monitoring

Track workflow success rates, average execution time, and cost per workflow. Monitor checkpoint size growth and cleanup old checkpoints. Set up alerts for workflow failures, timeout errors, and unexpected state transitions. Use distributed tracing to debug complex multi-agent interactions.

Real-World Applications

LangGraph enables sophisticated autonomous workflows across industries:

  • Multi-Stage Content Creation: Research → Draft → Review → Revision cycles with quality gates and human approval before publication
  • Financial Analysis Pipelines: Data collection → Analysis → Risk assessment → Recommendation generation with compliance checkpoints
  • Customer Onboarding: Application processing → Document verification → Account creation → Welcome sequence with manual review steps
  • Incident Response Automation: Detection → Diagnosis → Remediation → Verification loops with escalation paths for complex issues
  • Supply Chain Optimization: Demand forecasting → Inventory planning → Order placement → Supplier coordination with approval workflows

Conclusion

LangGraph solves a real problem: LangChain agents work fine for simple tool-calling, but fall apart when you need branching, looping, error recovery, or human approval steps. The graph-based approach gives you explicit control over flow without writing a state machine from scratch.

The tradeoff is complexity. A simple chatbot doesn't need LangGraph. But anything with multi-step workflows, conditional paths, or human-in-the-loop requirements benefits from the structure it provides.

Start with a basic linear workflow to get comfortable with state management. Add conditional edges once you understand the flow. Then tackle checkpointing and human approval patterns. Don't try to build everything at once.

Next Steps

  1. Install LangGraph and build a simple 3-4 node state machine to understand how state transitions work
  2. Add conditional edges that route based on agent decisions or tool outputs
  3. Implement checkpointing so workflows survive restarts and can be resumed
  4. Build a human-in-the-loop workflow with approval gates for anything high-stakes
  5. Add monitoring (execution times, token costs, failure rates) before going to production

Refactix Team

Practical guides on software architecture, AI engineering, and cloud infrastructure.
