AI Rebooking Agents: Patterns Airlines Ship in Production

How travel companies build rebooking agents that hold up at storm scale. Orchestrator-worker layout, PNR locks, GDS rate limits, and autonomy boundaries that work.

By Refactix Team · Published 2026-05-08 · 13 minutes · Intermediate

When a thunderstorm grounds 200 flights at a major hub, fifty thousand passengers need a new connection, somewhere to sleep, and an answer in the next ninety minutes. The model is not your hard problem. The orchestration around it is.

Over the last twelve months AI rebooking agents have moved from demos to production. Lufthansa runs Cognigy for fully automated cancellations and refunds. Hopper's HTS Assist holds entire voice conversations and rewrites itineraries without a human in the loop. Teneo 8 launched in early 2026 specifically for airline customer service, claiming 98% reasoning accuracy on disruption flows. The work behind the demos is heavier than the demos make it look.

Behind a single "rebook my flight" intent, somewhere between fifteen and twenty-five API calls fan out: schedule lookups, fare rules, seat availability, EMD reissue, payment authorization, loyalty entitlements, sometimes a hotel hold if the disruption pushes overnight. Every one of those calls can fail or rate-limit. During a real disruption the load multiplies by five to ten times. This article covers the orchestration patterns that hold up under that, drawn from public case studies and the conventions production travel teams converge on.

What a rebooking request actually looks like

The user-visible flow looks simple. "My flight to Chicago is cancelled. Get me there before 8pm."

The work behind that sentence is not. The agent has to:

  • Pull the existing PNR and figure out fare class, status, and entitlements.
  • Search the schedule for alternates that meet the time constraint, including codeshare and partner inventory.
  • Apply IRROPS rebooking rules, which differ from voluntary change rules and waive penalties only under specific operational conditions.
  • Reserve seats on the new segments before someone else takes them.
  • Reissue the ticket, which means calling the GDS or NDC connection with a fresh fare construction.
  • Issue an EMD if there is residual value, or a hotel voucher if the rules allow.
  • Charge or refund the difference, with an idempotency key so a retry does not double-charge.
  • Send a confirmation that lines up with what is actually in the airline's reservation system, not just what the agent thinks happened.

Any one of these can succeed while a downstream step fails. The reservation can be held with no ticket attached. The ticket can issue without payment clearing. The payment can clear with no notification fired. Traditional booking systems handle this with hard-coded compensating transactions. An AI agent has to be wrapped in something that gives it the same guarantees without letting it improvise its way out of a partial failure.
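One common way to get those guarantees is a saga: each step declares how to undo itself, and a runner that lives outside the model unwinds the completed steps in reverse order when a later step fails. The sketch below is a minimal illustration under that assumption. SagaStep, SagaResult, and runSaga are hypothetical names, not part of any reservation system's API.

```typescript
// Hypothetical step shape: `run` performs the side effect, `undo` reverses it.
interface SagaStep {
  name: string
  run: () => Promise<void>
  undo: () => Promise<void>
}

type SagaResult =
  | { ok: true }
  | { ok: false; failedStep: string; compensated: string[] }

// Runs steps in order. On the first failure, undoes the completed steps
// in reverse order and reports which step failed.
async function runSaga(steps: SagaStep[]): Promise<SagaResult> {
  const done: SagaStep[] = []
  for (const step of steps) {
    try {
      await step.run()
      done.push(step)
    } catch {
      const compensated: string[] = []
      for (const prev of [...done].reverse()) {
        // A real system would also need a policy for undo failures.
        await prev.undo()
        compensated.push(prev.name)
      }
      return { ok: false, failedStep: step.name, compensated }
    }
  }
  return { ok: true }
}
```

With steps for hold, reissue, payment, and notification, a payment failure would release the reissue and the hold before the orchestrator sees the result. The agent never decides how to clean up.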

The orchestrator-worker layout

The pattern that has won in production is orchestrator-worker. A central agent receives the request, classifies intent, decomposes it, and routes each piece to a specialised worker that owns one tool surface. Public case studies suggest this pattern accounts for around 70% of production multi-agent customer service deployments, and travel is no exception.

For rebooking the layout looks like this:

  • An orchestrator receives the user message, identifies the request as rebooking versus refund versus complaint versus general info, gathers the PNR context, and plans the steps.
  • An eligibility worker checks IRROPS status, fare rules, and loyalty entitlements, then decides what the agent is allowed to offer without escalating.
  • An inventory worker queries schedule and seat availability across the airline and its partners, and returns a ranked list of options against the user's constraint.
  • A reissue worker owns the GDS or NDC connection, performs the seat hold, ticket reissue, and EMD generation, and is the only thing in the system that mutates the reservation.
  • A payment worker handles fare difference, refund-to-original-tender, and voucher generation with strict idempotency.
  • A notification worker fires the confirmation only after the reissue worker has confirmed write success.

Each worker exposes a small typed tool interface. The orchestrator has no special privileges and cannot bypass them. If you want a deeper read on the pattern itself outside travel, our guide on building agentic AI systems with LangChain walks through the same orchestrator-worker layout for non-travel workloads.
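As one example of what a worker's guarantee can look like, here is a rough sketch of the payment worker's idempotency. It assumes the key is built from stable identifiers for the request, so a retry reuses the same key. idempotencyKey and IdempotentExecutor are hypothetical names, the in-memory map stands in for a durable store, and the action is synchronous only to keep the sketch short.

```typescript
// Hypothetical key scheme: stable business identifiers, never a per-attempt random value.
function idempotencyKey(pnr: string, operation: string, requestId: string): string {
  return `${pnr}:${operation}:${requestId}`
}

// In-memory stand-in for a durable idempotency store.
class IdempotentExecutor<T> {
  private readonly results = new Map<string, T>()

  // Runs `action` at most once per key; later calls replay the stored result.
  execute(key: string, action: () => T): T {
    if (this.results.has(key)) {
      return this.results.get(key) as T
    }
    const result = action()
    this.results.set(key, result)
    return result
  }
}
```

A retry after a timeout then returns the original charge record instead of creating a second charge.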

Intent classification before the agent runs

The biggest latency win in production rebooking is not faster model calls. It is keeping the model out of the path until you actually need it.

Most teams put a fast intent classifier in front of the orchestrator. Embedding-based classifiers run in 50 to 100 ms. An LLM-based classifier takes one to two seconds. On a storm day with queues backing up, that delta is the difference between a working IVR and a flatlined one. The classifier routes flight-status checks straight to a deterministic lookup and simple cancellations to a templated flow, and escalates to the agent only for ambiguous or multi-step requests.

This is the same instinct teams use in non-travel customer support, and it is structurally similar to the rules-plus-ML routing we covered for real-time fraud detection. Cheap deterministic checks first. Expensive reasoning only where it earns its keep.

type Intent =
  | { kind: 'flight_status'; pnr: string }
  | { kind: 'simple_cancel'; pnr: string }
  | { kind: 'rebook_disruption'; pnr: string; constraint?: string }
  | { kind: 'complex'; raw: string }

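// Similarity score above which the fast path is trusted. Tune against labeled traffic.
const MATCH_THRESHOLD = 0.92

async function classify(message: string, ctx: Context): Promise<Intent> {
  const embed = await embedder.embed(message)
  const [best] = await intentIndex.search(embed, { k: 1 })

  // An empty result or a low-confidence match falls through to the agent.
  if (best && best.score > MATCH_THRESHOLD) {
    return parseStructured(best.label, message, ctx)
  }
  return { kind: 'complex', raw: message }
}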

Embedding lookup handles most of the volume. The agent only sees requests that actually need reasoning.
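The routing step after classification can be sketched as an exhaustive switch. The Intent type is repeated from the classifier so the sketch stands alone. The Route names and the route function are hypothetical.

```typescript
// Repeated from the classifier above so this sketch compiles on its own.
type Intent =
  | { kind: 'flight_status'; pnr: string }
  | { kind: 'simple_cancel'; pnr: string }
  | { kind: 'rebook_disruption'; pnr: string; constraint?: string }
  | { kind: 'complex'; raw: string }

// Hypothetical destinations for a classified request.
type Route = 'status_lookup' | 'cancel_template' | 'agent'

function route(intent: Intent): Route {
  switch (intent.kind) {
    case 'flight_status':
      return 'status_lookup' // deterministic lookup, no model call
    case 'simple_cancel':
      return 'cancel_template' // templated flow, no model call
    case 'rebook_disruption':
    case 'complex':
      return 'agent' // only these reach the LLM-based agent
    default: {
      // Compile-time check: adding a new Intent kind without a case fails here.
      const unreachable: never = intent
      return unreachable
    }
  }
}
```

The never check means a new intent kind cannot ship without someone deciding whether it needs the agent.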

The tool layer is the contract

The agent is allowed to do what its tools allow and nothing else. This is the single most important design decision in a production rebooking system.

In practice that means the reissue worker exposes something like:

interface ReissueTools {
  searchAlternates(args: {
    pnr: string
    departBy: string
    arriveBy?: string
    cabin?: 'Y' | 'W' | 'J' | 'F'
    includePartners: boolean
  }): Promise<Alternate[]>

  holdSeats(args: {
    pnr: string
    segments: Segment[]
    holdSeconds: number
  }): Promise<HoldToken>

  reissueTicket(args: {
    holdToken: HoldToken
    fareDifference: Money
    idempotencyKey: string
  }): Promise<TicketRecord>

  issueEMD(args: {
    pnr: string
    amount: Money
    // No 'GOODWILL' code here: discretionary compensation goes through human handoff.
    reasonCode: 'IRROPS_DOWNGRADE' | 'VOLUNTARY_CHANGE'
    idempotencyKey: string
  }): Promise<EMDRecord>
}

Notice what is not on this interface. There is no escalate, no applyDiscount, no overrideFareRule. If the agent decides those are necessary, it has to hand off to a human, which surfaces in the orchestrator as a separate intent and a separate tool surface. This is what stops an agent from issuing a $400 goodwill voucher because a creative-sounding paragraph in the prompt history convinced it that was acceptable.

State machines fit naturally on top of this kind of tool layer. The reissue path has six or seven well-defined states with explicit transitions, and the agent is just choosing which transition to take at each step. Our LangGraph state machine guide covers the same shape for general agent flows.
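As a rough illustration, the reissue path could be written as a transition table that the tool layer checks before any call goes out. The seven state names below are assumptions made for this sketch, not a standard.

```typescript
// Hypothetical states for the reissue path.
type ReissueState =
  | 'pnr_loaded'
  | 'alternates_found'
  | 'seats_held'
  | 'ticket_reissued'
  | 'payment_settled'
  | 'confirmed'
  | 'handed_off'

// Explicit transitions. Anything not listed is rejected.
const ALLOWED: Record<ReissueState, readonly ReissueState[]> = {
  pnr_loaded: ['alternates_found', 'handed_off'],
  alternates_found: ['seats_held', 'handed_off'],
  // A hold can expire, which sends the flow back to searching.
  seats_held: ['ticket_reissued', 'alternates_found', 'handed_off'],
  ticket_reissued: ['payment_settled', 'handed_off'],
  payment_settled: ['confirmed', 'handed_off'],
  confirmed: [],
  handed_off: [],
}

function canTransition(from: ReissueState, to: ReissueState): boolean {
  return ALLOWED[from].indexOf(to) !== -1
}

// Returns the new state, or throws if the agent picked an illegal transition.
function advance(from: ReissueState, to: ReissueState): ReissueState {
  if (!canTransition(from, to)) {
    throw new Error(`illegal transition: ${from} -> ${to}`)
  }
  return to
}
```

The agent picks the next state. The table decides whether that pick is legal.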

PNR locks and GDS rate limits are where most systems break

Two failure modes show up in nearly every postmortem from production travel agents.

The first is the PNR lock. Reservation systems serialise writes per record. When a passenger calls in mid-disruption, the agent picks up the PNR and starts holding seats. If the airline ops team is also touching that PNR, or another agent session is, the second writer hits a lock and either retries forever or fails silently. The fix is to scope lock acquisition explicitly in the reissue worker, time-box the hold, and treat lock loss as a defined error that the orchestrator can react to instead of an exception that bubbles up.
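A minimal sketch of that fix, assuming an in-memory lock table and an injected clock. In a real deployment the lock lives in the reservation system or a shared store. PnrLockTable and LockResult are hypothetical names.

```typescript
// Lock loss is a value the orchestrator can branch on, not a thrown exception.
type LockResult =
  | { kind: 'acquired'; pnr: string; owner: string; expiresAt: number }
  | { kind: 'lock_held'; pnr: string; heldBy: string; retryAfterMs: number }

class PnrLockTable {
  private readonly locks = new Map<string, { owner: string; expiresAt: number }>()
  private readonly now: () => number

  constructor(now: () => number) {
    this.now = now
  }

  // Time-boxed acquire: the lock expires after ttlMs even if never released.
  acquire(pnr: string, owner: string, ttlMs: number): LockResult {
    const t = this.now()
    const current = this.locks.get(pnr)
    if (current !== undefined && current.owner !== owner && current.expiresAt > t) {
      return { kind: 'lock_held', pnr, heldBy: current.owner, retryAfterMs: current.expiresAt - t }
    }
    const expiresAt = t + ttlMs
    this.locks.set(pnr, { owner, expiresAt })
    return { kind: 'acquired', pnr, owner, expiresAt }
  }

  // True only while `owner` holds an unexpired lock. Check before every write.
  stillHeld(pnr: string, owner: string): boolean {
    const current = this.locks.get(pnr)
    return current !== undefined && current.owner === owner && current.expiresAt > this.now()
  }

  // Only the current owner can release; anyone else is a no-op.
  release(pnr: string, owner: string): void {
    const current = this.locks.get(pnr)
    if (current !== undefined && current.owner === owner) {
      this.locks.delete(pnr)
    }
  }
}
```

A lock_held result tells the orchestrator who holds the record and how long to wait, so it can back off or hand off instead of retrying forever.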

The second is the GDS rate limit. Schedule searches and availability calls are metered, often per IATA code and per seat count. An agent that retries a failed search aggressively under load can blow the rate limit for the entire reservation queue, taking down rebooking for everyone. Production systems put a token bucket in front of the GDS calls, share it across all agent instances, and let the orchestrator back off when buckets are dry. Caching helps too: schedule data is stable enough that even a five-second TTL absorbs most of the load during a disruption.
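The token bucket accounting can be sketched as follows. This version runs in one process with an injected clock. Sharing it across agent instances would mean moving the counters into a shared store, which is not shown. TokenBucket and its parameters are hypothetical names.

```typescript
class TokenBucket {
  private tokens: number
  private lastRefill: number
  private readonly capacity: number
  private readonly refillPerSec: number
  private readonly now: () => number

  constructor(capacity: number, refillPerSec: number, now: () => number) {
    this.capacity = capacity
    this.refillPerSec = refillPerSec
    this.now = now
    this.tokens = capacity // start full
    this.lastRefill = now()
  }

  // Returns true and spends `cost` tokens if available; otherwise false.
  // A false result is the signal for the orchestrator to back off.
  tryTake(cost: number = 1): boolean {
    this.refill()
    if (this.tokens < cost) {
      return false
    }
    this.tokens -= cost
    return true
  }

  // Adds tokens for the time elapsed since the last refill, capped at capacity.
  private refill(): void {
    const t = this.now()
    const elapsedSec = (t - this.lastRefill) / 1000
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSec)
    this.lastRefill = t
  }
}
```

The important property is that a failed tryTake costs nothing, so an aggressive retry loop cannot drain the budget that other sessions depend on.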

Both of these are unsexy infrastructure. They are also the difference between a system that handles a storm day and one that flatlines at 11am.

Three autonomy levels and where the boundary belongs

Production deployments converge on three autonomy levels and let users or operators pick which they get.

  • Approval-required is where the agent proposes a rebooking and the user confirms before any reservation is touched. Most consumer travel platforms still sit here, especially for high-stakes itineraries like multi-leg international trips.
  • Notification is where the agent rebooks autonomously and notifies the user after the fact, with a clearly bounded undo window. Business travel and elite tier loyalty programs increasingly default to this.
  • Fully autonomous is where the agent acts and the system surfaces only outcomes that exceed defined thresholds: fare differences over a limit, downgrades to a lower cabin, partner-segment changes.
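The thresholds in the fully autonomous level could be checked with something like the sketch below. The RebookOutcome shape, the reason names, and the fare limit parameter are assumptions for illustration. The cabin codes follow the Y, W, J, F convention used in the tool interface.

```typescript
type Cabin = 'Y' | 'W' | 'J' | 'F'

// Economy < premium economy < business < first.
const CABIN_RANK: Record<Cabin, number> = { Y: 0, W: 1, J: 2, F: 3 }

// Hypothetical summary of what a completed rebooking changed.
interface RebookOutcome {
  fareDifferenceCents: number
  originalCabin: Cabin
  newCabin: Cabin
  partnerSegmentChanged: boolean
}

type SurfaceReason = 'fare_difference_over_limit' | 'cabin_downgrade' | 'partner_segment_change'

// Returns the reasons an outcome must be shown to the user. Empty means stay silent.
function surfaceReasons(outcome: RebookOutcome, fareLimitCents: number): SurfaceReason[] {
  const reasons: SurfaceReason[] = []
  if (outcome.fareDifferenceCents > fareLimitCents) {
    reasons.push('fare_difference_over_limit')
  }
  if (CABIN_RANK[outcome.newCabin] < CABIN_RANK[outcome.originalCabin]) {
    reasons.push('cabin_downgrade')
  }
  if (outcome.partnerSegmentChanged) {
    reasons.push('partner_segment_change')
  }
  return reasons
}
```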

The boundary is rarely a customer preference question, even when product treats it as one. The real question is which decisions the agent can make where a wrong outcome is reversible, and which create irreversible state. Reissuing a ticket in a fare class equivalent to the one the passenger already paid for is reversible. Issuing a $200 EMD against the wrong reason code is not, because once that EMD is consumed it is gone.

A clean way to encode this is to map every tool call to a reversibility class and require approval whenever the agent crosses into irreversible territory. That logic does not live in the model. It lives in the tool layer, where it can be audited.
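One possible shape for that mapping is sketched below. The tool names come from the reissue interface earlier in the article. The classification of each tool is an assumption for illustration and would need to be set by the airline's own rules.

```typescript
type ToolName = 'searchAlternates' | 'holdSeats' | 'reissueTicket' | 'issueEMD'
type Reversibility = 'reversible' | 'irreversible'
type AutonomyLevel = 'approval_required' | 'notification' | 'fully_autonomous'

interface ToolPolicy {
  mutates: boolean // does the call write to the reservation?
  reversibility: Reversibility
}

// Assumed classification, kept in the tool layer where it can be audited.
const POLICY: Record<ToolName, ToolPolicy> = {
  searchAlternates: { mutates: false, reversibility: 'reversible' }, // read-only
  holdSeats: { mutates: true, reversibility: 'reversible' }, // hold expires on its own
  reissueTicket: { mutates: true, reversibility: 'reversible' }, // assumes an equivalent fare class
  issueEMD: { mutates: true, reversibility: 'irreversible' }, // gone once consumed
}

function requiresApproval(tool: ToolName, level: AutonomyLevel): boolean {
  const policy = POLICY[tool]
  if (!policy.mutates) {
    return false // reads never need approval
  }
  if (policy.reversibility === 'irreversible') {
    return true // irreversible writes need approval at every level
  }
  return level === 'approval_required'
}
```

Because the check is a table lookup, changing what the agent may do alone is a reviewed config change, not a prompt edit.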

Evaluation that actually catches regressions

You cannot evaluate a rebooking agent the way you evaluate a chatbot. Accuracy on a single message is not the metric. The metric is end-to-end correctness across a multi-step transaction with money on the line.

Teams that get this right run two evaluation tracks in parallel. The first is offline replay: take real PNR snapshots, replay the agent against them with deterministic mocks for the GDS and payment layer, and compare the agent's tool calls to a known-good script. This catches regressions in the planning step. The second is live shadow mode: run the new agent version next to the production one, log both decisions, and have a human review the diffs before promoting.
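The comparison step in offline replay can be sketched as a positional diff over tool calls. This assumes arguments are plain JSON values and that per-run values such as idempotency keys have been masked before comparison. ToolCall, ReplayDiff, canonical, and diffToolCalls are hypothetical names.

```typescript
interface ToolCall {
  tool: string
  args: Record<string, unknown>
}

interface ReplayDiff {
  index: number
  expected: ToolCall | undefined
  actual: ToolCall | undefined
}

// Serializes a JSON-like value with sorted object keys, so key order never causes a diff.
function canonical(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map(canonical).join(',')}]`
  }
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>
    const keys = Object.keys(obj).sort()
    return `{${keys.map((k) => `${JSON.stringify(k)}:${canonical(obj[k])}`).join(',')}}`
  }
  return JSON.stringify(value)
}

// Compares the agent's calls to the known-good script, position by position.
function diffToolCalls(expected: ToolCall[], actual: ToolCall[]): ReplayDiff[] {
  const diffs: ReplayDiff[] = []
  const n = Math.max(expected.length, actual.length)
  for (let i = 0; i < n; i++) {
    const e: ToolCall | undefined = expected[i]
    const a: ToolCall | undefined = actual[i]
    if (e === undefined || a === undefined || canonical(e) !== canonical(a)) {
      diffs.push({ index: i, expected: e, actual: a })
    }
  }
  return diffs
}
```

An empty diff means the planning step reproduced the script. Anything else is a candidate regression for a human to read.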

Both tracks need a strong tool-call schema, because that is what you are actually grading. The user-facing message is downstream output. If the tool calls are right, the message is almost always fine. If the tool calls are wrong, no amount of polite phrasing fixes the broken reservation. Our production-ready agents guide goes deeper on the evaluation side for agentic systems generally.

Where this approach stops working

Worth being honest about where these patterns break down.

Group bookings, where one PNR holds twenty passengers for a sports team or wedding party, do not fit a per-passenger flow. The eligibility logic branches on group fare contracts negotiated outside the airline's standard rules and rarely ingested into a form the agent can reason about.

Interline disruptions across more than two carriers also break most agent flows. Each carrier's IRROPS rules apply to its own segments only. The reissuing carrier may not have authority to act on a partner segment without a manual interline conversation. Agents that try get stuck in retry loops or, worse, issue partial reissues that strand the passenger halfway through their itinerary.

Last, regulatory edge cases like EU261 compensation in Europe or the DOT consumer rules under the 2026 US framework involve formal claims that are usually better handled by a documented workflow with a human in the approval seat. The agent can prepare the claim. It should not file it.

Where to start

If you are building this from scratch, the order that works in practice:

  1. Stand up the tool layer first. Get the GDS or NDC connection working with strict idempotency keys before any agent code exists.
  2. Add the orchestrator-worker layout with deterministic intent routing for the easy cases. Most rebooking volume on a normal day is simple cancellations and status checks.
  3. Layer the LLM-based agent on top, scoped tightly to the tool surface, and run it in approval-required mode for the first eight weeks. Use that window to build the evaluation harness from real traffic.
  4. Move to notification mode for low-risk reversible actions. Keep approval-required for anything that touches money or partner inventory.
  5. Add storm-day load testing. Five to ten times normal traffic on a synthetic disruption, sustained for forty minutes. If your token buckets and PNR locks hold under that, you are most of the way there.

The travel teams shipping this well are not the ones with the cleverest prompt. They are the ones who treated rebooking as an orchestration problem first and a model problem second.
