Cutting AWS Lambda cold starts to under 100ms

How to get AWS Lambda cold starts under 100ms with SnapStart, smarter runtime choices, smaller packages, and provisioned concurrency without blowing the budget.

By Tharindu Perera·Published 2025-09-19·Updated 2026-04-19·14 minutes
Intermediate

Lambda cold starts are the tax you pay for serverless. Every time AWS spins up a new execution environment, the first request it serves can take hundreds of milliseconds, or even several seconds, longer than a warm invocation. For background jobs, nobody cares. For API endpoints behind a user-facing app, a 3-second P99 is unacceptable.

The good news: sub-100ms cold starts are achievable for most workloads. SnapStart, runtime selection, dependency management, and provisioned concurrency each cut cold start time, and the effects stack. This guide walks through each technique with benchmarks and cost tradeoffs so you can decide what's worth it for your workload. Provisioned concurrency in particular can quietly inflate the bill, which is why it's worth pairing with AWS Cost Anomaly Detection.

How a cold start actually works

A cold start happens when Lambda has to initialize a fresh execution environment to handle a request. There are several phases, each adding to total latency:

The phases

  1. Download Code (50-200ms): AWS retrieves your deployment package from S3
  2. Start Execution Environment (100-300ms): Create the isolated sandbox (a lightweight microVM) that will host the runtime
  3. Initialize Runtime (50-500ms): Start the language runtime (Node.js, Python, etc.), load any extensions, and prepare the execution context
  4. Initialize Function Code (10-5000ms): Execute your initialization code outside the handler
  5. Invoke Handler (1-100ms): Execute your actual function logic

Total Cold Start Time: 211ms to 6100ms depending on runtime, package size, and code complexity

Cold vs warm

# Example Lambda function showing initialization vs execution phases
import boto3
import json
from datetime import datetime

# INITIALIZATION PHASE (runs only on cold starts)
# This code executes once per execution environment
print(f"Cold start initialization at {datetime.now()}")

# Initialize AWS SDK clients outside handler
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('users')

# Load configuration or models here
CONFIG = {
    'timeout': 30,
    'retry_count': 3
}

def lambda_handler(event, context):
    """
    EXECUTION PHASE (runs on every invocation)
    This code runs on both cold and warm starts
    """
    print(f"Handler invoked at {datetime.now()}")
    
    # Get user from DynamoDB
    user_id = event.get('userId')
    response = table.get_item(Key={'id': user_id})
    
    return {
        'statusCode': 200,
        'body': json.dumps(response.get('Item', {}))
    }

# Cold start: Initialization + Execution = 800ms
# Warm start: Execution only = 50ms
# Cold start penalty: 750ms

When cold starts hit

You'll see cold starts in a few situations:

  • First invocation after a deploy
  • Scaling up when concurrent requests exceed your warm container count
  • After an idle period (commonly 5-15 minutes without traffic; AWS doesn't publish a fixed timeout)
  • Configuration changes (memory, environment variables) that force new execution environments
  • AWS-side infrastructure changes or maintenance

A useful rule of thumb: a function getting 100 requests per hour will see a 10-15% cold start rate. A function getting 10,000 requests per hour, with scaling configured properly, sees 1-3%.
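To check that rule of thumb against your own function, the REPORT line Lambda writes at the end of every invocation is enough: it includes an `Init Duration` field only when that invocation paid for a cold start. The sketch below computes the cold start rate from raw log lines. Fetching the lines (for example with CloudWatch Logs Insights) is left out, the sample lines are made up for illustration, and matching `Restore Duration` for SnapStart restores is an assumption worth verifying against your own logs.

```python
import re
from typing import Iterable

# "Init Duration" appears only on cold-start REPORT lines; SnapStart restores
# are assumed to report "Restore Duration" instead.
INIT_RE = re.compile(r"(Init|Restore) Duration: ([\d.]+) ms")


def cold_start_stats(log_lines: Iterable[str]) -> dict:
    """Summarize cold starts from raw Lambda log lines (only REPORT lines count)."""
    invocations = 0
    init_times = []
    for line in log_lines:
        if not line.startswith("REPORT"):
            continue
        invocations += 1
        match = INIT_RE.search(line)
        if match:
            init_times.append(float(match.group(2)))
    return {
        "invocations": invocations,
        "cold_starts": len(init_times),
        "cold_start_rate": len(init_times) / invocations if invocations else 0.0,
        "avg_init_ms": sum(init_times) / len(init_times) if init_times else 0.0,
    }


# Hypothetical sample: one cold invocation, one warm invocation
sample = [
    "START RequestId: a1 Version: $LATEST",
    "REPORT RequestId: a1 Duration: 12.3 ms Billed Duration: 13 ms "
    "Memory Size: 512 MB Max Memory Used: 80 MB Init Duration: 240.5 ms",
    "REPORT RequestId: b2 Duration: 8.1 ms Billed Duration: 9 ms "
    "Memory Size: 512 MB Max Memory Used: 80 MB",
]
print(cold_start_stats(sample))
```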

Runtime choice matters more than you'd think

Runtime has a big impact on cold start performance. Benchmarks measured on AWS Lambda in October 2025 (your numbers will vary with package size and initialization code):

Cold start by runtime

Runtime                   Avg cold start   P99 cold start   Memory    Package size
Node.js 20                150 ms           250 ms           512 MB    5 MB
Python 3.12               180 ms           300 ms           512 MB    10 MB
Python 3.12 + SnapStart   85 ms            120 ms           512 MB    10 MB
Java 17                   2500 ms          4000 ms          1024 MB   50 MB
Java 17 + SnapStart       250 ms           450 ms           1024 MB   50 MB
.NET 8                    900 ms           1500 ms          1024 MB   30 MB
.NET 8 + SnapStart        180 ms           280 ms           1024 MB   30 MB
Go 1.21                   120 ms           200 ms           512 MB    15 MB
Rust                      100 ms           180 ms           512 MB    8 MB

A few things to pull from this:

  • Node.js and Python have the fastest cold starts among the managed, non-compiled runtimes
  • SnapStart cuts average cold starts by about 90% for Java, 80% for .NET, and 50% for Python
  • Go and Rust are consistently fast with small memory footprints
  • Java and .NET basically require SnapStart to be acceptable in latency-sensitive paths
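The SnapStart percentages in the bullets come straight from the average cold start column. A quick check, using only the table's numbers:

```python
# Average cold starts (ms) from the table above: (baseline, with SnapStart)
AVG_COLD_START_MS = {
    "Python 3.12": (180, 85),
    "Java 17": (2500, 250),
    ".NET 8": (900, 180),
}


def percent_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


for runtime, (before, after) in AVG_COLD_START_MS.items():
    print(f"{runtime}: {percent_reduction(before, after):.0f}% faster cold starts")
```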

Picking a runtime

# Python 3.12 - Optimized for fast cold starts
# Best for: APIs, data processing, general purpose
# Cold start: ~180ms baseline

import json
import boto3
from aws_lambda_powertools import Logger, Tracer
from aws_lambda_powertools.utilities.typing import LambdaContext

logger = Logger()
tracer = Tracer()

# Create the S3 client lazily (boto3 itself is still imported at module load)
_s3_client = None

def get_s3_client():
    """Lazy loading pattern for SDK clients"""
    global _s3_client
    if _s3_client is None:
        _s3_client = boto3.client('s3')
    return _s3_client

@tracer.capture_lambda_handler
@logger.inject_lambda_context
def handler(event: dict, context: LambdaContext) -> dict:
    """Fast handler with minimal initialization"""
    bucket = event['bucket']
    key = event['key']
    
    # Only create the S3 client when a request actually needs it
    s3 = get_s3_client()
    # HEAD returns the object's metadata without opening a body stream
    obj = s3.head_object(Bucket=bucket, Key=key)
    
    return {
        'statusCode': 200,
        'body': json.dumps({'size': obj['ContentLength']})
    }
// Node.js 20 - Fastest cold starts for JavaScript
// Best for: APIs, webhooks, real-time processing
// Cold start: ~150ms baseline

const { S3Client, HeadObjectCommand } = require('@aws-sdk/client-s3');

// Initialize clients outside handler
const s3Client = new S3Client({ region: process.env.AWS_REGION });

// CommonJS shown here; ES modules (index.mjs with import) let bundlers tree-shake more effectively
exports.handler = async (event) => {
    const { bucket, key } = event;
    
    // HEAD returns metadata (including ContentLength) without streaming the body
    const command = new HeadObjectCommand({
        Bucket: bucket,
        Key: key
    });
    
    try {
        const response = await s3Client.send(command);
        
        return {
            statusCode: 200,
            body: JSON.stringify({ 
                size: response.ContentLength 
            })
        };
    } catch (error) {
        console.error('Error:', error);
        throw error;
    }
};

Code-level optimizations

Optimizing the function code itself can cut cold starts by 50-70% without changing runtime or paying for provisioned concurrency.

1. Shrink the package

# Before optimization: 45MB package
# Cold start: 800ms

# After optimization: 8MB package  
# Cold start: 320ms (60% improvement)

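# Python: install only runtime dependencies (keep dev tools out of requirements.txt)
# as prebuilt Linux wheels that match the Lambda runtime
pip install --target ./package --platform manylinux2014_x86_64 --python-version 3.12 --only-binary=:all: -r requirements.txt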

# Node.js: Use esbuild for tree-shaking
npm install -g esbuild
esbuild index.js --bundle --platform=node --target=node20 --outfile=dist/index.js

# Remove unnecessary files
zip -r function.zip . -x "*.git*" "*.pyc" "*__pycache__*" "tests/*" "*.md"
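Before deciding what to cut, it helps to see which dependencies dominate the build directory. A minimal sketch, assuming your build output lives in a directory such as `./package` (the name used in the pip command above); `largest_entries` is a hypothetical helper, not part of any tool.

```python
import os
from pathlib import Path


def largest_entries(root: str, top: int = 10) -> list[tuple[str, int]]:
    """Return the `top` largest top-level entries under `root` as (name, bytes)."""
    sizes = {}
    for entry in Path(root).iterdir():
        if entry.is_file():
            sizes[entry.name] = entry.stat().st_size
        elif entry.is_dir():
            total = 0
            for dirpath, _dirs, files in os.walk(entry):
                for name in files:
                    # lstat so broken symlinks don't raise
                    total += (Path(dirpath) / name).lstat().st_size
            sizes[entry.name] = total
    return sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:top]


if __name__ == "__main__" and Path("package").is_dir():
    for name, size in largest_entries("package"):
        print(f"{size / 1_000_000:8.2f} MB  {name}")
```

Packages that show up at the top but are never imported by the handler are the first candidates for removal.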

2. Import only what you actually need

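# ❌ BAD: Import heavy libraries the handler barely uses (adds ~200ms to cold start)
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def handler(event, context):
    # Only using one simple function
    data = pd.DataFrame(event['data'])
    return data.to_json()

# ✅ GOOD: Import only the libraries the handler actually uses
# Note: `from pandas import DataFrame` still executes all of pandas (and numpy);
# the saving comes from no longer loading scikit-learn
from pandas import DataFrame

def handler(event, context):
    data = DataFrame(event['data'])
    return data.to_json()

# ✅ BETTER: Lazy import heavy modules inside the code path that needs them
# (Python caches modules in sys.modules, so each environment pays the cost once)
def handler(event, context):
    if event.get('needsML'):
        from sklearn.ensemble import RandomForestClassifier
        # Event keys here are illustrative
        model = RandomForestClassifier(n_estimators=10)
        model.fit(event['features'], event['labels'])
        return {'predictions': model.predict(event['data']).tolist()}
    # Fast path without ML imports
    from pandas import DataFrame
    data = DataFrame(event['data'])
    return data.to_json()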
// ❌ BAD: Import entire AWS SDK v2 (adds 300ms)
const AWS = require('aws-sdk');
const dynamodb = new AWS.DynamoDB.DocumentClient();

// ✅ GOOD: Import only needed clients from SDK v3 (adds 80ms)
const { DynamoDBClient } = require('@aws-sdk/client-dynamodb');
const { DynamoDBDocumentClient, GetCommand } = require('@aws-sdk/lib-dynamodb');

const client = new DynamoDBClient({});
const docClient = DynamoDBDocumentClient.from(client);

// ✅ BETTER: Use esbuild to tree-shake unused code
// Only bundles the specific SDK components you use
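To measure what an import actually costs before and after trimming, time it in a fresh interpreter so nothing is already cached in `sys.modules`. A minimal sketch (Python shown; `import_time_ms` is a hypothetical helper); for a per-module breakdown, `python -X importtime -c "import boto3"` prints one line per imported module. Local timings won't match Lambda's, so compare relative numbers rather than absolute ones.

```python
import subprocess
import sys


def import_time_ms(module: str) -> float:
    """Time `import <module>` in a fresh interpreter and return milliseconds."""
    code = (
        "import time\n"
        "start = time.perf_counter()\n"
        f"import {module}\n"
        "print((time.perf_counter() - start) * 1000)\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    return float(result.stdout.strip())


if __name__ == "__main__":
    for mod in ["json", "boto3"]:
        try:
            print(f"{mod}: {import_time_ms(mod):.1f} ms")
        except subprocess.CalledProcessError:
            print(f"{mod}: not installed")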

3. Optimize initialization code

# ❌ BAD: Complex initialization on every cold start
import requests
import json

def load_config():
    """Fetching config on every cold start adds 500ms"""
    response = requests.get('https://api.example.com/config')
    return response.json()

# Load config during initialization
APP_CONFIG = load_config()  # 500ms added to cold start

def handler(event, context):
    # Use config
    timeout = APP_CONFIG.get('timeout', 30)
    # ... rest of handler

# ✅ GOOD: Keep config in SSM Parameter Store and cache it in global scope after the first fetch
import boto3
import json

ssm = boto3.client('ssm')
_config_cache = None

def get_config():
    """Load config once and cache in global scope"""
    global _config_cache
    if _config_cache is None:
        # Fast SSM parameter fetch (50ms)
        response = ssm.get_parameter(
            Name='/myapp/config',
            WithDecryption=True
        )
        _config_cache = json.loads(response['Parameter']['Value'])
    return _config_cache

def handler(event, context):
    config = get_config()
    timeout = config.get('timeout', 30)
    # ... rest of handler
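One limitation of caching forever in global scope is that a long-lived warm environment never sees config changes until it is recycled. A time-to-live fixes that. The sketch below is a generic TTL cache, assuming you pass it any zero-argument fetch function (such as the SSM call above); `TTLCache` is a hypothetical helper. Powertools for AWS Lambda's Parameters utility also offers built-in caching for SSM parameters if you'd rather not maintain your own.

```python
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Cache one value in memory and refetch it after `ttl_seconds`.

    Create it at module scope so warm invocations reuse the cached value.
    """

    def __init__(self, fetch: Callable[[], Any], ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Any = None
        self._fetched_at: Optional[float] = None

    def get(self) -> Any:
        now = self._clock()
        if self._fetched_at is None or now - self._fetched_at >= self._ttl:
            self._value = self._fetch()
            self._fetched_at = now
        return self._value


# Hypothetical wiring to the SSM example above:
# config_cache = TTLCache(lambda: json.loads(
#     ssm.get_parameter(Name='/myapp/config', WithDecryption=True)['Parameter']['Value']))
# config = config_cache.get()
```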

4. Use Lambda layers for shared dependencies

# SAM Template with Lambda Layers
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31

Resources:
  # Shared dependencies layer (published once, reused by every function that references it)
  DependenciesLayer:
    Type: AWS::Serverless::LayerVersion
    Properties:
      LayerName: common-dependencies
      Description: Shared Python dependencies
      ContentUri: layers/dependencies/
      CompatibleRuntimes:
        - python3.12
      RetentionPolicy: Retain
    Metadata:
      BuildMethod: python3.12

  # Fast function with small package (only app code)
  ApiFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: src/
      Handler: app.handler
      Runtime: python3.12
      MemorySize: 512
      Timeout: 30
      Layers:
        - !Ref DependenciesLayer
      Environment:
        Variables:
          STAGE: production

# Result for this workload (layer contents still load at init, so measure your own):
# Without layer: 35MB package, 600ms cold start
# With layer: 2MB package, 280ms cold start (53% improvement)
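Layers change what you upload, not what Lambda has to unpack: the function package and all of its layers together must fit within Lambda's 250 MB unzipped quota, and all of it is available at init. A minimal sketch for checking the combined unzipped size of your archives before deploying; the file names are placeholders and `check_deployment` is a hypothetical helper.

```python
import zipfile

# Lambda quota: function code plus all layers, unzipped (250 MiB = 262,144,000 bytes)
UNZIPPED_LIMIT_BYTES = 250 * 1024 * 1024


def unzipped_size(zip_path: str) -> int:
    """Sum of uncompressed sizes of every member in a .zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return sum(info.file_size for info in zf.infolist())


def check_deployment(function_zip: str, layer_zips: list[str]) -> dict:
    """Report per-archive and total unzipped sizes against the quota."""
    sizes = {path: unzipped_size(path) for path in [function_zip, *layer_zips]}
    total = sum(sizes.values())
    return {
        "per_archive": sizes,
        "total_bytes": total,
        "within_limit": total <= UNZIPPED_LIMIT_BYTES,
    }


# Example (placeholder paths):
# print(check_deployment("function.zip", ["dependencies-layer.zip"]))
```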

SnapStart

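SnapStart cuts cold starts significantly by snapshotting your initialized execution environment and restoring from that snapshot instead of re-initializing. Lambda runs your initialization code and takes the snapshot when you publish a function version, so SnapStart applies to published versions and aliases, not $LATEST. It's available for Java 11 and later, Python 3.12 and later, and .NET 8 and later; for Python and .NET, AWS charges separately for caching and restoring snapshots.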

Enabling SnapStart for Python

# Python 3.12 function optimized for SnapStart
import json
import boto3
from datetime import datetime, timezone

# Expensive initialization happens once during snapshot
print("Initializing resources for snapshot...")

# Load ML model, establish connections, etc.
dynamodb = boto3.resource('dynamodb')
users_table = dynamodb.Table('users')

# Pre-compute expensive operations
CACHE = {
    'initialized_at': datetime.now(timezone.utc).isoformat(),
    'constants': {
        'max_retries': 3,
        'timeout': 30
    }
}

print(f"Snapshot initialization complete at {CACHE['initialized_at']}")

def handler(event, context):
    """
    Handler runs with pre-initialized state from snapshot
    Cold start: 85ms instead of 350ms
    """
    user_id = event.get('userId')
    
    # Use pre-initialized resources
    response = users_table.get_item(Key={'id': user_id})
    
    return {
        'statusCode': 200,
        'body': json.dumps({
            'user': response.get('Item'),
            'initialized_at': CACHE['initialized_at']
        })
    }
# Deploy with SnapStart enabled
AWSTemplateFormatVersion: '2010-09-09'
Resources:
  FastFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: fast-api-function
      Role: !GetAtt FunctionRole.Arn  # Execution role, assumed to be defined elsewhere in the template
      Runtime: python3.12
      Handler: index.handler
      Code:
        S3Bucket: my-deployment-bucket
        S3Key: function.zip
      MemorySize: 512
      Timeout: 30
      SnapStart:
        ApplyOn: PublishedVersions  # Enable SnapStart
      
  FunctionVersion:
    Type: AWS::Lambda::Version
    Properties:
      FunctionName: !Ref FastFunction
      Description: Version with SnapStart enabled

  FunctionAlias:
    Type: AWS::Lambda::Alias
    Properties:
      FunctionName: !Ref FastFunction
      FunctionVersion: !GetAtt FunctionVersion.Version
      Name: prod

# Benchmark results:
# Without SnapStart: 350ms cold start
# With SnapStart: 85ms cold start (76% improvement)

SnapStart gotchas

# Handle uniqueness for random seeds, timestamps, UUIDs
import os
import json
import random
import uuid
from datetime import datetime, timezone

# snapshot_restore_py ships with the Python managed runtimes and provides SnapStart runtime hooks
from snapshot_restore_py import register_after_restore

# ❌ BAD: Values from snapshot are reused across invocations
SNAPSHOT_TIMESTAMP = datetime.now(timezone.utc).isoformat()  # Always same value!
SNAPSHOT_UUID = str(uuid.uuid4())  # Same UUID in all invocations!

def handler(event, context):
    # These values are identical across all invocations
    return {
        'timestamp': SNAPSHOT_TIMESTAMP,  # Problem!
        'request_id': SNAPSHOT_UUID  # Problem!
    }

# ✅ GOOD: Generate fresh values inside handler
def handler(event, context):
    # Fresh values for each invocation
    return {
        'timestamp': datetime.now(timezone.utc).isoformat(),
        'request_id': str(uuid.uuid4())
    }

# ✅ GOOD: Use runtime hooks for snapshot restore
# (module-level code runs *before* the snapshot, so it can't act as a restore hook)
@register_after_restore
def restore_hook():
    """Called by the runtime each time Lambda restores an environment from the snapshot"""
    print("Restoring from SnapStart snapshot")
    # Re-seed the random number generator so restored environments don't share a sequence
    random.seed()
    # Refresh time-sensitive data
    os.environ['RESTORED_AT'] = datetime.now(timezone.utc).isoformat()

def handler(event, context):
    restored_at = os.getenv('RESTORED_AT')
    return {
        'restored_at': restored_at,
        'invocation_time': datetime.now(timezone.utc).isoformat()
    }

Provisioned concurrency

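Provisioned concurrency keeps a set number of function instances initialized and ready, so requests served by that capacity never see a cold start. Traffic beyond the provisioned amount spills over to on-demand instances, which can still cold start. It's not free, so the question is whether the math works out.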

When provisioned concurrency makes sense

# Cost-benefit analysis for Provisioned Concurrency
import math

def calculate_provisioned_concurrency_roi(
    requests_per_hour: int,
    avg_request_duration_ms: int,
    cold_start_duration_ms: int,
    cold_start_percentage: float,
    memory_mb: int
):
    """
    Determine if Provisioned Concurrency is cost-effective
    
    Args:
        requests_per_hour: Average requests per hour
        avg_request_duration_ms: Average function duration
        cold_start_duration_ms: Cold start duration
        cold_start_percentage: % of requests hitting cold starts (0.05 = 5%)
        memory_mb: Function memory allocation
    """
    # x86 list prices in us-east-1; check the Lambda pricing page for your Region
    request_cost = 0.20 / 1_000_000             # $0.20 per 1M requests
    gb_second_cost = 0.0000166667               # On-demand duration, per GB-second
    pc_gb_second_cost = 0.0000041667            # Provisioned concurrency, per GB-second configured
    pc_duration_gb_second_cost = 0.0000097222   # Duration on provisioned instances, per GB-second
    memory_gb = memory_mb / 1024
    
    # Calculate on-demand costs
    requests_per_month = requests_per_hour * 730
    cold_starts_per_month = requests_per_month * cold_start_percentage
    
    # Compute time (includes cold starts)
    avg_duration_with_cold_starts = (
        (avg_request_duration_ms * (1 - cold_start_percentage)) +
        ((avg_request_duration_ms + cold_start_duration_ms) * cold_start_percentage)
    ) / 1000  # Convert to seconds
    
    gb_seconds_on_demand = memory_gb * avg_duration_with_cold_starts * requests_per_month
    
    on_demand_cost = (
        (requests_per_month * request_cost) +
        (gb_seconds_on_demand * gb_second_cost)
    )
    
    # Required concurrency: requests/sec x duration, 2x for safety, at least one instance
    requests_per_second = requests_per_hour / 3600
    avg_duration_seconds = avg_request_duration_ms / 1000
    required_concurrency = max(1, math.ceil(requests_per_second * avg_duration_seconds * 2))
    
    # Provisioned Concurrency is billed per GB-second for every hour it is configured
    pc_monthly_cost = required_concurrency * memory_gb * pc_gb_second_cost * 3600 * 730
    
    # Execution on provisioned instances (no cold starts, lower duration rate)
    gb_seconds_provisioned = memory_gb * avg_duration_seconds * requests_per_month
    execution_cost = (
        (requests_per_month * request_cost) +
        (gb_seconds_provisioned * pc_duration_gb_second_cost)
    )
    
    total_provisioned_cost = pc_monthly_cost + execution_cost
    
    # User experience improvement
    cold_start_user_impact_hours = (cold_starts_per_month * cold_start_duration_ms / 1000) / 3600
    
    return {
        'on_demand_cost': round(on_demand_cost, 2),
        'provisioned_cost': round(total_provisioned_cost, 2),
        'monthly_savings': round(on_demand_cost - total_provisioned_cost, 2),
        'roi_percentage': round(((on_demand_cost - total_provisioned_cost) / total_provisioned_cost) * 100, 1),
        'required_concurrency': required_concurrency,
        'cold_starts_eliminated_per_month': int(cold_starts_per_month),
        'user_wait_time_saved_hours': round(cold_start_user_impact_hours, 1),
        'recommendation': 'Use Provisioned Concurrency' if total_provisioned_cost < on_demand_cost else 'Stay with On-Demand'
    }

# Example: moderate-traffic API (5,000 requests/hour)
result = calculate_provisioned_concurrency_roi(
    requests_per_hour=5000,
    avg_request_duration_ms=100,
    cold_start_duration_ms=500,
    cold_start_percentage=0.05,  # 5% cold starts
    memory_mb=512
)

print(f"On-demand cost: ${result['on_demand_cost']}/month")
print(f"Provisioned cost: ${result['provisioned_cost']}/month")
print(f"Savings: ${result['monthly_savings']}/month")
print(f"Required concurrency: {result['required_concurrency']} instances")
print(f"Cold starts eliminated: {result['cold_starts_eliminated_per_month']}/month")
print(f"Recommendation: {result['recommendation']}")

# Output:
# On-demand cost: $4.53/month
# Provisioned cost: $7.98/month
# Savings: $-3.45/month
# Required concurrency: 1 instances
# Cold starts eliminated: 182500/month
# Recommendation: Stay with On-Demand
#
# For this workload, cold starts aren't frequent enough to justify
# the fixed cost of Provisioned Concurrency
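The same prices explain why lightly used provisioned capacity rarely pays for itself on cost alone. Per provisioned GB-second you pay the allocation rate all the time plus a reduced duration rate while busy, versus the full on-demand rate only while busy. Setting those equal gives a break-even utilization. A worked sketch, assuming us-east-1 x86 list prices and ignoring request fees (identical in both models):

```python
# us-east-1 x86 list prices per GB-second (check the Lambda pricing page for your Region)
ON_DEMAND_DURATION = 0.0000166667
PC_ALLOCATION = 0.0000041667  # charged for every second capacity is provisioned
PC_DURATION = 0.0000097222    # charged only while provisioned instances run code


def break_even_utilization() -> float:
    """Utilization above which provisioned capacity costs less than on-demand.

    Provisioned, per GB-second: PC_ALLOCATION + PC_DURATION * u
    On-demand, same work:       ON_DEMAND_DURATION * u
    Equal when u = PC_ALLOCATION / (ON_DEMAND_DURATION - PC_DURATION)
    """
    return PC_ALLOCATION / (ON_DEMAND_DURATION - PC_DURATION)


print(f"Break-even utilization: {break_even_utilization():.0%}")
```

In other words, provisioned instances need to be busy roughly 60% of the time before the bill drops; below that, you are paying for latency, not savings.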

Configuring provisioned concurrency

# CloudFormation template with auto-scaling Provisioned Concurrency
Resources:
  ApiFunction:
    Type: AWS::Lambda::Function
    Properties:
      FunctionName: api-handler
      Role: !GetAtt FunctionRole.Arn  # Execution role, assumed to be defined elsewhere in the template
      Runtime: python3.12
      Handler: app.handler
      MemorySize: 512
      Code:
        S3Bucket: deployment-bucket
        S3Key: function.zip

  ProductionVersion:
    Type: AWS::Lambda::Version
    Properties:
      FunctionName: !Ref ApiFunction
      Description: Production version with Provisioned Concurrency

  ProductionAlias:
    Type: AWS::Lambda::Alias
    Properties:
      FunctionName: !Ref ApiFunction
      FunctionVersion: !GetAtt ProductionVersion.Version
      Name: production
      ProvisionedConcurrencyConfig:
        ProvisionedConcurrentExecutions: 5  # Start with 5 warm instances

  # Auto-scaling for Provisioned Concurrency
  ScalableTarget:
    Type: AWS::ApplicationAutoScaling::ScalableTarget
    DependsOn: ProductionAlias
    Properties:
      MaxCapacity: 20
      MinCapacity: 5
      # !Ref on an alias returns its ARN, so use the alias name in the resource ID
      ResourceId: !Sub 'function:${ApiFunction}:production'
      # RoleARN omitted: Application Auto Scaling falls back to its service-linked role
      ScalableDimension: lambda:function:ProvisionedConcurrentExecutions
      ServiceNamespace: lambda

  ScalingPolicy:
    Type: AWS::ApplicationAutoScaling::ScalingPolicy
    Properties:
      PolicyName: pc-scaling-policy
      PolicyType: TargetTrackingScaling
      ScalingTargetId: !Ref ScalableTarget
      TargetTrackingScalingPolicyConfiguration:
        TargetValue: 0.70  # Target 70% utilization
        PredefinedMetricSpecification:
          PredefinedMetricType: LambdaProvisionedConcurrencyUtilization

# Result: 
# - Baseline 5 instances always warm (zero cold starts for normal traffic)
# - Auto-scales to 20 instances during traffic spikes
# - Scales back down during low traffic to control costs
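To choose MinCapacity and MaxCapacity instead of guessing, Little's law gives a first estimate: the number of busy instances equals requests per second times average duration in seconds. Dividing by the target utilization leaves the same headroom the 70% tracking target aims for. A minimal sketch; `provisioned_instances` is a hypothetical helper and the inputs are yours to measure.

```python
import math


def provisioned_instances(peak_rps: float, avg_duration_ms: float,
                          target_utilization: float = 0.7) -> int:
    """Instances needed so the given load runs at about `target_utilization`."""
    if not 0 < target_utilization <= 1:
        raise ValueError("target_utilization must be in (0, 1]")
    busy = peak_rps * (avg_duration_ms / 1000)  # Little's law: concurrent executions
    return max(1, math.ceil(busy / target_utilization))


# e.g. typical load for MinCapacity, peak load for MaxCapacity
print(provisioned_instances(100, 100))  # 100 req/s at 100 ms each
```

Feeding typical traffic in gives a MinCapacity; feeding peak traffic gives a MaxCapacity.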

Scheduled provisioned concurrency

# Use EventBridge to provision concurrency only during business hours
import boto3

lambda_client = boto3.client('lambda')

# 512MB (0.5 GB) x $0.0000041667 per GB-second x 3600 seconds (us-east-1, x86)
PC_COST_PER_INSTANCE_HOUR = 0.5 * 0.0000041667 * 3600

def estimate_monthly_pc_cost(business_instances, off_hours_instances,
                             business_hours=260, total_hours=730):
    """Monthly provisioned concurrency charge (excludes duration and request fees)"""
    off_hours = total_hours - business_hours
    return PC_COST_PER_INSTANCE_HOUR * (
        business_instances * business_hours + off_hours_instances * off_hours
    )

def scale_provisioned_concurrency(event, context):
    """
    Scale Provisioned Concurrency based on schedule
    Triggered by two EventBridge schedule rules; each rule's target passes a
    constant input such as {"action": "scale_up"} ("action" is our own convention)
    """
    function_name = 'api-handler'
    alias_name = 'production'
    
    # Business hours: 8 AM - 8 PM UTC, Monday to Friday
    if event.get('action') == 'scale_up':
        target_concurrency = 10
        print(f"Scaling up to {target_concurrency} for business hours")
    else:
        target_concurrency = 2
        print(f"Scaling down to {target_concurrency} for off-hours")
    
    # Update Provisioned Concurrency
    try:
        lambda_client.put_provisioned_concurrency_config(
            FunctionName=function_name,
            Qualifier=alias_name,
            ProvisionedConcurrentExecutions=target_concurrency
        )
        print(f"Successfully updated to {target_concurrency} instances")
        return {'statusCode': 200, 'target_concurrency': target_concurrency}
    except Exception as e:
        print(f"Error updating Provisioned Concurrency: {e}")
        raise

# EventBridge Rules (in CloudFormation):
# BusinessHoursStartRule:
#   ScheduleExpression: "cron(0 8 ? * MON-FRI *)"   # 8 AM UTC weekdays, Input: {"action": "scale_up"}
# BusinessHoursEndRule:
#   ScheduleExpression: "cron(0 20 ? * MON-FRI *)"  # 8 PM UTC weekdays, Input: {"action": "scale_down"}

# Cost comparison (provisioned concurrency charge only, ~260 weekday business hours/month):
# Always-on 10 instances, estimate_monthly_pc_cost(10, 10): $54.75/month
# Scheduled (10 in business hours, 2 otherwise), estimate_monthly_pc_cost(10, 2): $26.55/month
# Savings: $28.20/month (about 52% reduction)
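If the alias is also registered with Application Auto Scaling, as in the previous section, calling `put_provisioned_concurrency_config` directly can fight the scaling policy. Scheduled actions on the scalable target adjust its minimum and maximum instead, so target tracking keeps working inside the new bounds. A minimal sketch, assuming the `api-handler` function and `production` alias from above; the action names and the `scheduled_action_params` / `main` helpers are hypothetical.

```python
def scheduled_action_params(function_name: str, alias: str, name: str,
                            cron: str, min_capacity: int, max_capacity: int) -> dict:
    """Build keyword arguments for application-autoscaling put_scheduled_action."""
    return {
        "ServiceNamespace": "lambda",
        "ScheduledActionName": name,
        "ResourceId": f"function:{function_name}:{alias}",
        "ScalableDimension": "lambda:function:ProvisionedConcurrentExecutions",
        "Schedule": cron,
        "ScalableTargetAction": {"MinCapacity": min_capacity, "MaxCapacity": max_capacity},
    }


def main() -> None:
    """Register both schedules once, e.g. from a deployment script."""
    import boto3

    autoscaling = boto3.client("application-autoscaling")
    for params in (
        scheduled_action_params("api-handler", "production", "business-hours-start",
                                "cron(0 8 ? * MON-FRI *)", 10, 20),
        scheduled_action_params("api-handler", "production", "business-hours-end",
                                "cron(0 20 ? * MON-FRI *)", 2, 20),
    ):
        autoscaling.put_scheduled_action(**params)
```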

VPC configuration

Lambda functions in VPCs used to have severe cold start penalties (10-15 seconds of ENI creation). AWS fixed most of this with Hyperplane ENIs, but the configuration still matters.

What VPC cold starts look like now

# Modern VPC Lambda with Hyperplane ENIs (2025)
import boto3
import json

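# Lambda accessing Aurora through the RDS Data API
rds_client = boto3.client('rds-data')

def handler(event, context):
    """
    VPC Lambda with optimized cold starts
    
    Cold start in VPC (2025): +20-50ms
    Cold start in VPC (2019, before Hyperplane ENIs): +10-15 seconds
    
    Improvement: Hyperplane ENIs are created when the function's VPC
    configuration is set, not on each cold start
    
    Note: the Data API is an HTTPS endpoint, so this particular call doesn't
    need VPC access; direct database drivers are what require the VPC.
    """
    user_id = event.get('userId')
    
    # Parameterized query; never pass raw SQL from the event
    response = rds_client.execute_statement(
        resourceArn='arn:aws:rds:region:account:cluster:my-cluster',
        secretArn='arn:aws:secretsmanager:region:account:secret:db-secret',
        database='mydb',
        sql='SELECT * FROM users WHERE id = :id',
        parameters=[{'name': 'id', 'value': {'stringValue': user_id}}]
    )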
    
    return {
        'statusCode': 200,
        'body': json.dumps(response['records'])
    }
# Optimal VPC configuration for Lambda
# PrivateRouteTable, EndpointSecurityGroup, LambdaSecurityGroup and LambdaRole
# are assumed to be defined elsewhere in the template
Resources:
  VPC:
    Type: AWS::EC2::VPC
    Properties:
      CidrBlock: 10.0.0.0/16
      EnableDnsHostnames: true
      EnableDnsSupport: true

  # Private subnets for Lambda; AWS service calls go through the VPC endpoints below
  PrivateSubnet1:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.1.0/24
      AvailabilityZone: !Select [0, !GetAZs '']

  PrivateSubnet2:
    Type: AWS::EC2::Subnet
    Properties:
      VpcId: !Ref VPC
      CidrBlock: 10.0.2.0/24
      AvailabilityZone: !Select [1, !GetAZs '']

  # Gateway endpoints (S3, DynamoDB) have no hourly charge and replace the NAT path for those services
  S3Endpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      VpcId: !Ref VPC
      ServiceName: !Sub 'com.amazonaws.${AWS::Region}.s3'
      RouteTableIds:
        - !Ref PrivateRouteTable

  DynamoDBEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      VpcId: !Ref VPC
      ServiceName: !Sub 'com.amazonaws.${AWS::Region}.dynamodb'
      RouteTableIds:
        - !Ref PrivateRouteTable

  # Interface endpoints for other AWS services (billed per hour, per AZ)
  SecretsManagerEndpoint:
    Type: AWS::EC2::VPCEndpoint
    Properties:
      VpcId: !Ref VPC
      ServiceName: !Sub 'com.amazonaws.${AWS::Region}.secretsmanager'
      VpcEndpointType: Interface
      # Resolve the default Secrets Manager hostname to the endpoint
      PrivateDnsEnabled: true
      SubnetIds:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
      SecurityGroupIds:
        - !Ref EndpointSecurityGroup

  LambdaFunction:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: python3.12
      Handler: index.handler
      Role: !GetAtt LambdaRole.Arn
      Code:
        ZipFile: |
          def handler(event, context):
              return {'statusCode': 200}
      VpcConfig:
        SubnetIds:
          - !Ref PrivateSubnet1
          - !Ref PrivateSubnet2
        SecurityGroupIds:
          - !Ref LambdaSecurityGroup

# Result:
# - No NAT gateway needed for AWS service traffic (roughly $32/month per NAT gateway,
#   minus interface endpoint hourly charges); note the function has no internet access
# - VPC cold start penalty: <50ms
# - Calls to S3, DynamoDB and Secrets Manager stay on the AWS network
# - Better security (private subnets)

Memory and CPU

Memory allocation directly affects CPU power and cold start duration. Lambda allocates CPU in proportion to memory (at 1,769 MB a function gets the equivalent of one full vCPU), so more memory means initialization code runs faster.

# Benchmark different memory configurations
import time
import json

def benchmark_handler(event, context):
    """
    Test cold start with different memory settings
    
    Benchmark results (cost = execution compute at $0.0000166667/GB-s, excluding the request fee):
    128MB: Cold start 1200ms, Execution 450ms, Cost $0.00000094
    512MB: Cold start 450ms, Execution 120ms, Cost $0.0000010
    1024MB: Cold start 280ms, Execution 60ms, Cost $0.0000010
    2048MB: Cold start 220ms, Execution 35ms, Cost $0.0000012
    
    Sweet spot for this workload: 1024MB
    - Best balance of cold start and execution time
    - Same per-invocation compute cost as 512MB (half the duration at twice the memory)
    - 38% faster cold start than 512MB
    """
    start_time = time.perf_counter()
    
    # Simulate CPU-intensive work
    result = sum(i**2 for i in range(100000))
    
    duration = (time.perf_counter() - start_time) * 1000
    
    return {
        'statusCode': 200,
        'body': json.dumps({
            'memory': context.memory_limit_in_mb,
            'duration_ms': round(duration, 2),
            'result': result
        })
    }
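To redo the per-invocation cost comparison for your own measurements, multiply memory in GB by billed duration and the per-GB-second rate. A minimal sketch, assuming the same us-east-1 x86 on-demand rate used in the ROI calculator below and 1 ms billing granularity; `invocation_cost` is a hypothetical helper.

```python
import math

GB_SECOND_PRICE = 0.0000166667  # x86 on-demand duration, us-east-1
REQUEST_PRICE = 0.20 / 1_000_000


def invocation_cost(memory_mb: int, duration_ms: float, include_request: bool = False) -> float:
    """On-demand cost of one invocation; duration is billed rounded up to 1 ms."""
    billed_seconds = math.ceil(duration_ms) / 1000
    cost = (memory_mb / 1024) * billed_seconds * GB_SECOND_PRICE
    return cost + (REQUEST_PRICE if include_request else 0.0)


for memory, duration in [(128, 450), (512, 120), (1024, 60), (2048, 35)]:
    print(f"{memory:>5} MB: ${invocation_cost(memory, duration):.8f}")
```

When doubling memory roughly halves duration, the cost stays flat and you get the faster cold start for free.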
# Use AWS Lambda Power Tuning tool
# https://github.com/alexcasalboni/aws-lambda-power-tuning

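# Install SAR application (creates a change set; the app creates IAM roles)
aws serverlessrepo create-cloud-formation-change-set \
  --application-id arn:aws:serverlessrepo:us-east-1:451282441545:applications/aws-lambda-power-tuning \
  --stack-name lambda-power-tuning \
  --capabilities CAPABILITY_IAM

# Then execute the change set returned by the previous command
aws cloudformation execute-change-set --change-set-name <ChangeSetId from the output>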

# Run power tuning for your function
aws stepfunctions start-execution \
  --state-machine-arn arn:aws:states:region:account:stateMachine:powerTuningStateMachine \
  --input '{
    "lambdaARN": "arn:aws:lambda:region:account:function:my-function",
    "powerValues": [128, 256, 512, 1024, 1536, 2048, 3008],
    "num": 100,
    "payload": {},
    "parallelInvocation": true,
    "strategy": "cost"
  }'

# Tool outputs:
# - Recommended memory setting for the chosen strategy ("cost", "speed" or "balanced")
# - Average duration and per-invocation cost at each memory setting
# - A link to a cost vs performance chart

Measuring it

Track cold starts before and after optimization so you actually know what worked:

# Enhanced Lambda monitoring with cold start detection
import time
import json
from datetime import datetime, timezone

import boto3

# Create the client once per execution environment, not on every invocation
cloudwatch = boto3.client('cloudwatch')

# Module scope runs once per environment, so this is True only on the first invocation
IS_COLD_START = True

def handler(event, context):
    """
    Track and log cold start metrics
    """
    global IS_COLD_START
    
    start_time = time.perf_counter()
    is_cold = IS_COLD_START
    IS_COLD_START = False
    
    # Your function logic here
    result = process_request(event)
    
    # Handler duration only; Init Duration is reported separately in the REPORT log line
    duration_ms = (time.perf_counter() - start_time) * 1000
    now = datetime.now(timezone.utc)
    
    # Custom CloudWatch metrics (a synchronous API call on every invocation)
    cloudwatch.put_metric_data(
        Namespace='CustomLambda',
        MetricData=[
            {
                'MetricName': 'ColdStart',
                'Value': 1 if is_cold else 0,
                'Unit': 'Count',
                'Timestamp': now,
                'Dimensions': [
                    {'Name': 'FunctionName', 'Value': context.function_name}
                ]
            },
            {
                'MetricName': 'InvocationDuration',
                'Value': duration_ms,
                'Unit': 'Milliseconds',
                'Timestamp': now,
                'Dimensions': [
                    {'Name': 'FunctionName', 'Value': context.function_name},
                    {'Name': 'ColdStart', 'Value': 'Yes' if is_cold else 'No'}
                ]
            }
        ]
    )
    
    # Structured logging
    log_entry = {
        'timestamp': now.isoformat(),
        'request_id': context.aws_request_id,
        'function_name': context.function_name,
        'function_version': context.function_version,
        'is_cold_start': is_cold,
        'duration_ms': round(duration_ms, 2),
        'memory_mb': context.memory_limit_in_mb,
        'remaining_time_ms': context.get_remaining_time_in_millis()
    }
    
    print(json.dumps(log_entry))
    
    return result

def process_request(event):
    """Your actual function logic"""
    return {'statusCode': 200, 'body': 'Success'}
# CloudWatch Dashboard for Cold Start Monitoring (replace api-handler with your function name)
Resources:
  ColdStartDashboard:
    Type: AWS::CloudWatch::Dashboard
    Properties:
      DashboardName: lambda-cold-starts
      DashboardBody: !Sub |
        {
          "widgets": [
            {
              "type": "metric",
              "properties": {
                "title": "Cold Start Rate",
                "metrics": [
                  ["CustomLambda", "ColdStart", "FunctionName", "api-handler", {"stat": "Sum", "id": "cold", "label": "Cold Starts"}],
                  ["AWS/Lambda", "Invocations", "FunctionName", "api-handler", {"stat": "Sum", "id": "inv", "label": "Total Invocations"}],
                  [{"expression": "100 * cold / inv", "id": "rate", "label": "Cold start rate (%)", "yAxis": "right"}]
                ],
                "period": 300,
                "stat": "Sum",
                "region": "${AWS::Region}",
                "yAxis": {
                  "left": {"label": "Count"},
                  "right": {"label": "Percent"}
                }
              }
            },
            {
              "type": "metric",
              "properties": {
                "title": "Cold vs Warm Duration",
                "metrics": [
                  ["CustomLambda", "InvocationDuration", "FunctionName", "api-handler", "ColdStart", "Yes", {"stat": "Average", "label": "Cold Start Duration"}],
                  [".", ".", ".", ".", ".", "No", {"stat": "Average", "label": "Warm Duration"}]
                ],
                "period": 300,
                "stat": "Average",
                "region": "${AWS::Region}",
                "yAxis": {
                  "left": {"label": "Milliseconds"}
                }
              }
            },
            {
              "type": "metric",
              "properties": {
                "title": "P99 Latency",
                "metrics": [
                  ["AWS/Lambda", "Duration", "FunctionName", "api-handler", {"stat": "p99", "label": "P99"}],
                  ["...", {"stat": "p50", "label": "P50"}]
                ],
                "period": 300,
                "region": "${AWS::Region}"
              }
            }
          ]
        }
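The `put_metric_data` call in the handler above adds a CloudWatch API round trip to every invocation. A lighter alternative is CloudWatch Embedded Metric Format (EMF): the function prints one structured JSON log line and CloudWatch extracts the metric from the logs asynchronously (the Powertools for AWS Lambda Metrics utility produces the same format). A minimal sketch for the duration metric, reusing the `CustomLambda` namespace and dimensions from above; `emf_record` is a hypothetical helper, and the same pattern extends to the cold start count.

```python
import json
import time


def emf_record(namespace: str, function_name: str, cold_start: bool,
               duration_ms: float) -> str:
    """Build one CloudWatch Embedded Metric Format log line."""
    return json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),  # epoch milliseconds
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [["FunctionName", "ColdStart"]],
                "Metrics": [{"Name": "InvocationDuration", "Unit": "Milliseconds"}],
            }],
        },
        # Top-level members supply dimension values and the metric value
        "FunctionName": function_name,
        "ColdStart": "Yes" if cold_start else "No",
        "InvocationDuration": round(duration_ms, 2),
    })


# Inside the handler, in place of the InvocationDuration entry in put_metric_data:
# print(emf_record("CustomLambda", context.function_name, is_cold, duration_ms))
```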

A real before-and-after

Before

Function: API Gateway backend
Runtime: Python 3.11
Memory: 512MB
Package Size: 45MB
Dependencies: boto3, pandas, scikit-learn, requests

Metrics:
- Cold Start: 2800ms
- Warm Duration: 120ms
- Cold Start Rate: 12%
- P99 Latency: 3200ms
- Monthly Invocations: 2.5M
- Monthly Cost: $127

After

# Step 1: Upgrade to Python 3.12 (SnapStart for Python requires 3.12 or later)
# Step 2: Remove unused dependencies (pandas, scikit-learn moved to separate ML function)
# Step 3: Increase memory to 1024MB for faster CPU
# Step 4: Use Lambda layers for boto3
# Step 5: Enable SnapStart

# Optimized function
import json
from aws_lambda_powertools import Logger

logger = Logger()

def handler(event, context):
    """Optimized handler - only essential code"""
    logger.info("Processing request", extra={"request_id": context.aws_request_id})
    
    # Fast, focused logic
    result = process_api_request(event)
    
    return {
        'statusCode': 200,
        'headers': {'Content-Type': 'application/json'},
        'body': json.dumps(result)
    }

def process_api_request(event):
    # Lightweight processing only
    return {'status': 'success'}

# New metrics:
# Cold Start: 180ms (94% improvement)
# Warm Duration: 65ms (46% improvement)  
# Cold Start Rate: 12% (same, but much faster)
# P99 Latency: 250ms (92% improvement)
# Monthly Cost: $89 (30% reduction from better performance)
#
# ROI: ~218 user-hours of wait time saved per month (2.5M x 12% = 300,000 cold starts x 2.62s each)
# User abandonment reduced by 35%

Wrap-up

Sub-100ms Lambda cold starts are achievable through a combination of runtime selection, code structure, SnapStart, and selective use of provisioned concurrency. Among the managed runtimes, Node.js 20 and Python 3.12 give you the fastest baseline, with Go and Rust faster still. SnapStart cuts initialization time by 50-90% for supported runtimes.

Focus on the high-traffic functions where cold starts actually affect users. For most workloads, runtime optimization plus package size reduction plus SnapStart gets you a 70-90% cold start reduction without paying for provisioned concurrency. Save provisioned concurrency for business-critical APIs where consistent sub-100ms latency is worth the fixed cost. Ship these changes through a zero-downtime deployment pipeline so rolling back is easy if a new SnapStart build regresses.

Measure before and after using CloudWatch metrics and X-Ray. Track cold start percentage, P99 latency, and user-perceived performance to confirm improvements. The teams that get this right treat cold start optimization as ongoing maintenance, not a one-off project.

Next steps

  1. Benchmark current cold starts with CloudWatch Insights and X-Ray to get a baseline
  2. Move to current runtimes (Python 3.12+, Node.js 20+) for an easy 20-30% win
  3. Turn on SnapStart for Java, Python, and .NET functions with heavy initialization
  4. Trim package size by removing unused dependencies and moving the rest into Lambda layers
  5. Use AWS Lambda Power Tuning to find the right memory setting
  6. Add monitoring for cold start rate, duration, and user impact
  7. Run the provisioned concurrency math on your highest-traffic functions before paying for it

About the author


Tharindu Perera

Tharindu Perera is a software engineer and solutions architect. He writes Refactix to share patterns from production work across AWS, distributed systems, and AI-driven development.
