n8n Pagination and Rate Limits: Syncs That Don't Lose Records

Cursor and page pagination in n8n, retries with backoff, batching, and the patterns that keep large data syncs from quietly dropping rows.

By Tharindu Perera · Published 2025-07-29 · Updated 2026-04-19 · 14 minutes
Intermediate

Large API syncs fail for two reasons: you pull too much too fast (rate limits) or you don't paginate correctly (missed or duplicate records). The failures are silent and cumulative. You don't notice 200 missing rows until someone asks why the dashboard doesn't match the source.

This guide walks through cursor and page pagination, provider limits, and backoff in n8n, so syncs finish without quietly losing data.

If you want the inbound counterpart, my n8n Webhook hardening guide covers the patterns on the receiving side.

What gets built here

A resilient n8n workflow that:

  • Fetches all pages using page or cursor strategies without gaps
  • Handles 429 and 5xx with retries and exponential backoff
  • Batches processing with Split In Batches for memory safety
  • Resumes safely from checkpoints after restarts
  • Respects provider quotas by reading Retry-After

Pagination models, briefly

Before writing any nodes, identify which pagination model the target API uses. Most APIs fall into one of two camps, and picking the wrong pattern is how rows go missing or duplicate.

1. Page or offset pagination

The API accepts page + limit or offset + limit. You increment the page number (or offset) until the response returns fewer items than the limit.

Pros: easy to implement, and easy to calculate total pages when the API reports a total count.

Cons: unstable when data changes during iteration. If a record is inserted or deleted between page requests, you get duplicates or gaps.

// Example: GET /items?page={{$json.page}}&limit=100
const page = $json.page ?? 1
return [{ page, limit: 100 }]

n8n loop pattern:

  1. Initialize page = 1 in a Set node
  2. HTTP Request node fetches the page
  3. Code node extracts items and checks the count
  4. IF node: items.length < limit means the last page
  5. Otherwise increment page and loop back to the HTTP Request
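
The loop above can be sketched as a small pure function that the Code node in step 3 might run. This is a minimal sketch, not n8n's API: nextPageState, fetchPage, and the in-memory source array are hypothetical names, and the response is assumed to expose its rows as a plain array.

```javascript
// Hypothetical helper: decide whether another page should be requested.
// `items` is the array of rows returned for the current page.
function nextPageState({ page, limit }, items) {
  const isLastPage = items.length < limit
  return { page: isLastPage ? page : page + 1, limit, isLastPage }
}

// Simulated API: 250 rows served 100 at a time (stand-in for the HTTP Request node).
const source = Array.from({ length: 250 }, (_, i) => ({ id: i + 1 }))
const fetchPage = (page, limit) => source.slice((page - 1) * limit, page * limit)

let state = { page: 1, limit: 100, isLastPage: false }
const collected = []
while (!state.isLastPage) {
  const items = fetchPage(state.page, state.limit)
  collected.push(...items)
  state = nextPageState(state, items)
}
console.log(collected.length, state.page)
```

Inside a workflow, the while loop is replaced by the IF node and the connection back to the HTTP Request; only the function body would live in the Code node.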

2. Cursor or token pagination (preferred)

The API returns a cursor, next_token, or next URL in each response. You pass it back in the next request. When the cursor is null or absent, the sync is done.

Pros: stable across data changes, no gaps or duplicates.

Cons: you can't jump to arbitrary pages, only iterate sequentially.

// Extract next cursor from API response
const next = $json.response?.meta?.next_cursor || null
return [{ cursor: next, hasMore: next !== null }]

n8n loop pattern:

  1. Start with an empty cursor in a Set node
  2. HTTP Request includes cursor (omitted on first call)
  3. Code node extracts items and next_cursor
  4. IF node: continue while hasMore is true
  5. Feed next_cursor back as cursor for the next iteration
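
The same loop for cursors, as a rough sketch. It assumes a response shaped like { data, meta: { next_cursor } }; nextCursorState and the pages object are hypothetical stand-ins for the Code node and the API. The repeated-cursor guard is my own addition to stop a buggy API from looping forever.

```javascript
// Hypothetical helper: pull the rows and the next cursor out of one response.
// Assumes a response shaped like { data: [...], meta: { next_cursor } }.
function nextCursorState(response, seenCursors) {
  const next = response?.meta?.next_cursor || null
  // Guard: a cursor that repeats would loop forever, so treat it as an error.
  if (next !== null && seenCursors.has(next)) {
    throw new Error(`Cursor repeated: ${next}`)
  }
  if (next !== null) seenCursors.add(next)
  return { items: response?.data ?? [], cursor: next, hasMore: next !== null }
}

// Simulated API: three pages linked by cursors (stand-in for the HTTP Request node).
const pages = {
  start: { data: [1, 2], meta: { next_cursor: "c1" } },
  c1: { data: [3, 4], meta: { next_cursor: "c2" } },
  c2: { data: [5], meta: { next_cursor: null } },
}

const seen = new Set()
const rows = []
let cursor = null
let hasMore = true
while (hasMore) {
  const response = pages[cursor ?? "start"] // the first call sends no cursor
  const state = nextCursorState(response, seen)
  rows.push(...state.items)
  cursor = state.cursor
  hasMore = state.hasMore
}
console.log(rows)
```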

If the API supports both, pick cursor pagination. It handles concurrent writes without data integrity issues. For where this sits in broader API design, see my 2025 API design patterns guide.

Rate limits and backoff

Every production API enforces rate limits. When you blow past them, you get 429 Too Many Requests. Transient 5xx errors need the same treatment: wait and retry.

Reading rate limit headers

Most APIs publish their limits in response headers. Check them before you hit the ceiling:

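// Parse rate limit headers from the HTTP response.
// Assumes the HTTP Request node is set to include response headers and status,
// so they arrive under $json.headers alongside $json.statusCode.
const headers = $json.headers ?? {}
const remaining = Number.parseInt(headers["x-ratelimit-remaining"] ?? "100", 10)
// Treated as epoch seconds here; some providers send seconds-until-reset instead
const resetAt = Number.parseInt(headers["x-ratelimit-reset"] ?? "0", 10)

if (remaining < 5) {
  const waitMs = Math.max(0, (resetAt * 1000) - Date.now()) + 500
  return [{ shouldThrottle: true, waitMs }]
}
return [{ shouldThrottle: false }]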

Exponential backoff with jitter

When you do hit a 429 or 5xx, back off exponentially with random jitter so retries don't pile up into a thundering herd:

function backoff(attempt) {
  const base = 500  // ms
  const max = 16000
  const jitter = Math.floor(Math.random() * 250)
  return Math.min(max, base * 2 ** attempt) + jitter
}

// Assumes the HTTP Request node includes response headers and status in its output
const attempt = $json.attempt ?? 0
const status = $json.statusCode
const headers = $json.headers ?? {}

if (status === 429 || (status >= 500 && status < 600)) {
  // Respect Retry-After when it is a number of seconds.
  // The header can also be an HTTP date; fall back to backoff in that case.
  const retryAfterSec = Number.parseInt(headers["retry-after"], 10)
  const waitMs = Number.isFinite(retryAfterSec)
    ? retryAfterSec * 1000
    : backoff(attempt)

  return [{ retry: true, waitMs, attempt: attempt + 1 }]
}
return [{ retry: false }]

Prefer Retry-After over your own backoff math when the API gives it. Shopify and GitHub both use this header to tell you exactly how long to wait.

Use a Wait node with the calculated waitMs, then loop back to the HTTP node. Cap attempts at 5 or 6 and route failures to a dead-letter queue (DLQ) for manual review.
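
A minimal sketch of that routing decision, assuming a cap of 5 attempts. decideRetry and its action names are hypothetical; in a workflow they would map to the branches of an IF or Switch node. The random parameter exists only so the jitter can be pinned when testing.

```javascript
// Hypothetical router for the step after the HTTP Request node.
// Returns which branch to take: continue, retry (with a wait), or dead_letter.
function decideRetry({ status, attempt, retryAfter }, maxAttempts = 5, random = Math.random) {
  const retryable = status === 429 || (status >= 500 && status < 600)
  if (!retryable) return { action: "continue" }
  // Out of attempts: hand the request to the dead-letter path instead of looping.
  if (attempt >= maxAttempts) return { action: "dead_letter", attempt }

  const retryAfterSec = Number.parseInt(retryAfter, 10)
  const backoffMs = Math.min(16000, 500 * 2 ** attempt) + Math.floor(random() * 250)
  // Prefer the server's Retry-After (seconds) when it parses as a number.
  const waitMs = Number.isFinite(retryAfterSec) ? retryAfterSec * 1000 : backoffMs
  return { action: "retry", waitMs, attempt: attempt + 1 }
}

console.log(decideRetry({ status: 200, attempt: 0 }))
console.log(decideRetry({ status: 429, attempt: 1, retryAfter: "3" }))
console.log(decideRetry({ status: 503, attempt: 2 }, 5, () => 0))
console.log(decideRetry({ status: 503, attempt: 5 }))
```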

Batching and memory safety

Pulling 50,000 records into memory at once will crash n8n or slow it to a crawl. Use Split In Batches to keep chunks manageable.

Reasonable batch sizes:

  • Light transformations (renaming fields, filtering): 500
  • One API call per record (enrichment, lookups): 50 to 100
  • Database writes: 200 to 500

// Normalize each item before batch processing
return $json.items.map((item) => ({
  id: item.id,
  email: item.email,
  updatedAt: item.updated_at,
  source: "api_sync"
}))
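
To picture what Split In Batches does to the item list, here is a plain chunking helper. It is only an illustration: chunk is a hypothetical function, not something n8n exposes, and the batch size of 500 is taken from the light-transformation guideline above.

```javascript
// Hypothetical helper: split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer")
  }
  const batches = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

const records = Array.from({ length: 1200 }, (_, i) => ({ id: i + 1 }))
const batches = chunk(records, 500)
console.log(batches.map((b) => b.length)) // [ 500, 500, 200 ]
```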

Concurrency control

When each batch fires off downstream API calls, cap concurrency so the target system doesn't fall over:

// Process batch items sequentially when target has strict rate limits
const results = []
for (const item of $json.batch) {
  results.push({
    ...item,
    processedAt: new Date().toISOString()
  })
}
return results

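For APIs with generous rate limits, parallel processing is fine. For strict ones, sequential processing inside each batch is safer. Check the provider's documented quotas before choosing: Salesforce, for example, meters API usage mainly through a rolling 24-hour request allocation plus a cap on concurrent long-running requests, not a short per-second window.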
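
When you do go sequential, the spacing between calls can be derived from the quota. A rough sketch, with made-up numbers: minDelayMs is a hypothetical helper, the 60-requests-per-60-seconds quota is an example and not any provider's real limit, and the 0.8 safety factor is an assumption you would tune.

```javascript
// Hypothetical helper: minimum delay between sequential calls for a quota of
// `maxRequests` per `windowSeconds`, keeping a safety margin below the ceiling.
function minDelayMs(maxRequests, windowSeconds, safetyFactor = 0.8) {
  const usable = Math.max(1, Math.floor(maxRequests * safetyFactor))
  return Math.ceil((windowSeconds * 1000) / usable)
}

// Example quota (made up for illustration): 60 requests per 60 seconds.
const delay = minDelayMs(60, 60)
const batchSize = 50
// Delay per call in ms, and the seconds one batch would take at that pace.
console.log(delay, (delay * batchSize) / 1000)
```

The resulting delay can feed a Wait node between items, which also tells you up front how long a batch will take at that pace.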

Checkpointing: resume where you left off

Production syncs get interrupted. Server restarts, deployment rollouts, and network blips all cause workflow failures. Without checkpoints, you re-fetch everything from the beginning, which on a 2M-row CRM sync is a bad day.

Persist progress so restarts don't repeat work:

  • Page model: store the last successful page number
  • Cursor model: store the last cursor value
  • Timestamp model: store the highest updatedAt seen

// Save checkpoint after each successful batch
const checkpoint = {
  lastCursor: $json.cursor,
  lastUpdatedAt: $json.maxUpdatedAt,
  recordsProcessed: $json.totalProcessed,
  savedAt: new Date().toISOString()
}
// Write to your KV store, Supabase, or a simple JSON file
return [checkpoint]

Rules for safe checkpointing:

  1. Update the checkpoint only after the batch is fully processed and committed
  2. On restart, read the checkpoint first and resume from the stored position
  3. Use updatedAt >= checkpoint.lastUpdatedAt (inclusive) so records mid-write during the last save aren't lost. The inclusive bound re-fetches rows at the boundary timestamp, so pair it with idempotent upserts
  4. Keep the last 5 checkpoints around so you can roll back if data corruption is detected
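
These rules can be sketched as two small helpers. Persistence is deliberately left out, and advanceCheckpoint, shouldFetch, and MAX_HISTORY are hypothetical names. The sketch also assumes updatedAt values are ISO 8601 strings in UTC with a consistent format, so that plain string comparison orders them correctly.

```javascript
// Hypothetical checkpoint helpers; writing to a database or KV store is left out.
const MAX_HISTORY = 5

// Build the next checkpoint from a batch that has already been committed.
// Assumes updatedAt values are same-format ISO 8601 UTC strings.
function advanceCheckpoint(history, batch, cursor) {
  const previous = history[history.length - 1]
  const maxUpdatedAt = batch.reduce(
    (max, row) => (row.updatedAt > max ? row.updatedAt : max),
    previous?.lastUpdatedAt ?? ""
  )
  const next = {
    lastCursor: cursor,
    lastUpdatedAt: maxUpdatedAt,
    recordsProcessed: (previous?.recordsProcessed ?? 0) + batch.length,
  }
  // Keep only the most recent checkpoints for rollback.
  return [...history, next].slice(-MAX_HISTORY)
}

// Inclusive resume filter: rows at the boundary timestamp are fetched again.
function shouldFetch(row, checkpoint) {
  return !checkpoint || row.updatedAt >= checkpoint.lastUpdatedAt
}

let history = []
history = advanceCheckpoint(history, [
  { id: 1, updatedAt: "2025-07-01T10:00:00Z" },
  { id: 2, updatedAt: "2025-07-01T11:00:00Z" },
], "c1")
console.log(history[history.length - 1])
```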

Reference architecture

A complete n8n pagination workflow looks like this:

  1. Set node: initialize page, cursor, or since from the checkpoint (or defaults)
  2. HTTP Request: fetch one page of data
  3. Code node: extract items, next cursor, and rate limit headers
  4. IF node: rate limited? Route to a Wait node, then back to the HTTP Request
  5. IF node: retry needed (429 or 5xx)? Route to a backoff Wait, increment attempt
  6. Split In Batches: process items in chunks
  7. Code or HTTP node: upsert to database, CRM, or destination API
  8. Code node: update the checkpoint with the current position
  9. IF node: more pages? Loop back to step 2

Patterns worth keeping

  1. Prefer cursor pagination when the API supports it. Page or offset breaks under concurrent writes.
  2. Request minimal fields with fields or select. Smaller payloads mean faster responses and lower memory pressure.
  3. Respect Retry-After over your own backoff math. The API knows its capacity better.
  4. Use idempotent upserts with a unique key so retries and overlapping windows don't duplicate rows.
  5. Log rate-limit metrics per run: number of 429s, average backoff duration, total records processed. This is how you tune batch sizes and schedules.
  6. Run long syncs in time windows (last 24 hours, since last checkpoint) rather than full table scans. Backfill historical data as a separate one-time job.
  7. Set execution timeouts in n8n settings. A sync that runs forever blocks the worker queue. Time it out, checkpoint progress, and pick up on the next scheduled run.
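
Pattern 4 is the one that makes retries safe, so here is a small illustration of why. It uses an in-memory Map as a stand-in for the destination; upsertAll is a hypothetical helper, and a real destination would rely on a unique constraint plus its own upsert statement.

```javascript
// Hypothetical in-memory destination keyed by the source record's id.
function upsertAll(table, rows) {
  let inserted = 0
  let updated = 0
  for (const row of rows) {
    if (table.has(row.id)) updated += 1
    else inserted += 1
    // Merge onto any existing row so a retry overwrites instead of duplicating.
    table.set(row.id, { ...table.get(row.id), ...row })
  }
  return { inserted, updated }
}

const table = new Map()
const batch = [
  { id: "a1", email: "a@example.com" },
  { id: "b2", email: "b@example.com" },
]
console.log(upsertAll(table, batch)) // first attempt
console.log(upsertAll(table, batch)) // retry of the same batch
console.log(table.size)
```

The retry reports two updates and no inserts, and the table still holds two rows.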

Troubleshooting

Problem | Cause | Fix
Duplicate records | Retries without idempotency keys | Use upserts with unique id from the source
Missing records (gaps) | Page/offset with concurrent writes | Switch to cursor pagination or lock time windows
429 storms | Too aggressive concurrency or batch size | Reduce concurrency, increase delay, check provider quotas
Memory pressure | Building giant arrays before processing | Lower batch size, stream through Split In Batches
Workflow hangs | Infinite retry loop without max attempts | Cap retries at 5-6 and route to DLQ
Stale data | Checkpoints not updating after failures | Only update checkpoint after successful commit

Deployment considerations

  • Scheduling: run large syncs during off-peak hours for both your infrastructure and the target API. Quotas rarely change by time of day, but an off-peak run competes with less of your own traffic for the same allowance and puts less load on the provider.
  • Timeouts: raise n8n execution timeouts carefully. Chunked runs with checkpoints beat single long-running executions.
  • Monitoring: store execution summaries (record count, last cursor, error rate, duration) and alert when sync completion drops below 95%.
  • Cost: each HTTP request and n8n execution has a cost. Batch aggressively, request minimal fields, and cache responses when the API supports ETags or If-Modified-Since.
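
For the monitoring bullet, a per-run summary could look roughly like this. The field names are illustrative, not an n8n schema, and summarizeRun is a hypothetical helper; the 95% threshold is the one mentioned above.

```javascript
// Hypothetical run summary; field names are illustrative, not an n8n schema.
function summarizeRun({ expected, processed, retries, totalBackoffMs, durationMs }) {
  const completion = expected > 0 ? processed / expected : 1
  return {
    completionPct: Math.round(completion * 1000) / 10,
    recordsPerMinute: durationMs > 0 ? Math.round(processed / (durationMs / 60000)) : 0,
    avgBackoffMs: retries > 0 ? Math.round(totalBackoffMs / retries) : 0,
    // Alert when fewer than 95% of the expected records made it through.
    alert: completion < 0.95,
  }
}

console.log(summarizeRun({
  expected: 10000,
  processed: 9400,
  retries: 4,
  totalBackoffMs: 6000,
  durationMs: 120000,
}))
```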

What this looks like in production

  • CRM backfill: 2M contacts on cursor pagination, batch size 300, checkpointed hourly, running on a nightly schedule. Total sync time around 4 hours, zero duplicates.
  • Shopify orders: respect Retry-After, pause on 429, resume with cursor. Handles Black Friday spikes (10x normal order volume) without missing orders.
  • GitHub issues: ETag caching skips unchanged pages entirely. Reduces API calls by 80% on low-activity repos, well within GitHub's 5,000 requests per hour limit.
  • Stripe events: paginate webhook event logs with the starting_after cursor, process each event idempotently by event.id, and persist the last processed event for crash recovery.

A reasonable next move

  1. Identify which pagination model your target API uses (look for cursor, next_token, offset, or page parameters in the docs)
  2. Add backoff logic and Retry-After handling to HTTP nodes before production traffic arrives
  3. Add checkpointing through a database or KV store so interrupted syncs resume cleanly
  4. Track records per minute, 429 rate, and backoff duration, then tune batch size and schedule from real data

About the author

Tharindu Perera

Tharindu Perera is a software engineer and solutions architect. He writes Refactix to share patterns from production work across AWS, distributed systems, and AI-driven development.
