Serverless & Compute

AWS Lambda: The Complete Engineer's Guide

Invocation modes, concurrency internals, cold start mechanics, layers, error handling, and configuration best practices — everything a practising cloud engineer needs to actually understand Lambda, not just use it.


What's in this article

  1. What Lambda actually is — and what it isn't
  2. Anatomy of a function: config, limits, and packaging
  3. Layers and environment variables
  4. Invocation modes: synchronous, asynchronous, polling
  5. How Lambda scales under each mode
  6. Error handling: retries, DLQs, and destinations
  7. How Lambda integrates with AWS services
  8. The concurrency model in depth
  9. Cold starts: what actually happens, real numbers, and fixes
  10. Reserved, provisioned concurrency, and burst quota
  11. Graviton: why you should probably switch
  12. Use cases: Lambda@Edge, CloudFront Functions, containers
  13. Configuration best practices
01

What Lambda actually is — and what it isn't

Lambda is a Function-as-a-Service (FaaS) compute platform. You give AWS a piece of code, define what triggers it, and AWS handles everything else: provisioning, scaling, patching, high availability. You do not manage servers, container hosts, or autoscaling groups. You pay for the time your code runs, measured in milliseconds.

That description sounds simple but hides important nuance. Lambda does not give you a persistent process. Each invocation is independent. State does not survive between invocations unless you explicitly put it somewhere — DynamoDB, S3, ElastiCache, Parameter Store. If your mental model is "a process that handles requests", switch it: Lambda is closer to a function call that AWS executes on your behalf, possibly on infrastructure that has never run your code before.

Lambda is not a general-purpose compute replacement. Long-running processes, stateful workloads, heavy in-memory computation, WebSocket servers that hold persistent connections — these are a poor fit. Lambda excels at short-duration, event-triggered, stateless work.

The maximum execution timeout is 15 minutes. Anything that might run longer belongs on ECS, EKS, or EC2. That 15-minute ceiling is a hard constraint, not a soft guideline — design around it from the start.

02

Anatomy of a function: config, limits, and packaging

Runtime and handler

Lambda supports managed runtimes for Node.js, Python, Java, .NET, and Ruby, plus OS-only runtimes (provided.al2023) for compiled languages such as Go. Each runtime is a versioned environment — python3.12, nodejs20.x, and so on. AWS deprecates old runtimes on a schedule; running deprecated runtimes is a security and operational risk. Pin to a current version and plan runtime upgrades like any other dependency upgrade.

The handler is the entry point: a fully qualified function reference that Lambda invokes on each execution. In Python it might be app.handler; in Java a class implementing RequestHandler. Lambda calls the handler, passes the event and context objects, and expects a return value (for synchronous invocations) or just completion (for async).
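
As a minimal sketch in Python (the event field and the response shape here are hypothetical, chosen to mirror a simple HTTP-style invocation):

    # app.py — handler configured in Lambda as "app.handler"
    import json

    def handler(event, context):
        # event: the trigger payload (shape depends on the event source)
        # context: runtime metadata (request ID, remaining time, memory limit)
        name = event.get("name", "world")  # hypothetical field
        print(f"request_id={context.aws_request_id} "
              f"remaining_ms={context.get_remaining_time_in_millis()}")
        # For synchronous invocations, the return value goes back to the caller
        return {"statusCode": 200, "body": json.dumps({"message": f"hello {name}"})}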

Memory and CPU

Memory is configurable from 128 MB to 10,240 MB in 1 MB increments. Here is the non-obvious part: CPU is not configured independently. Lambda allocates CPU proportionally to the memory setting. At 1,769 MB you get exactly one full vCPU; at 128 MB you get only a small fraction of one; at 3,538 MB roughly two vCPUs, scaling up to six at the maximum memory setting. If your function is CPU-bound and you are not giving it enough memory, it will run slowly not because it is memory-constrained but because it has insufficient CPU.

Practical implication: For compute-heavy functions, increase memory past what the function actually needs in RAM. The extra CPU allocation often reduces execution time enough to offset any additional compute charge — profile it before assuming the default is the right setting.

Ephemeral storage

Each execution environment gets a /tmp directory. The default size is 512 MB; you can configure it up to 10 GB. This is ephemeral — it persists for the lifetime of the execution environment (which may span multiple invocations of the same instance), but you cannot rely on it being there for the next cold invocation. Use it for intermediate file processing, not as a cache you depend on being warm.

Hard limits to know

Lambda hard limits reference
    Limit                                    Value                               Note
    Max execution timeout                    15 min                              Hard ceiling, not configurable
    Memory                                   128 MB – 10,240 MB                  CPU scales with memory
    Ephemeral storage (/tmp)                 512 MB – 10 GB                      Configurable, not persistent
    Deployment package (ZIP)                 50 MB zipped / 250 MB unzipped      Use S3 or a container image for larger
    Container image size                     10 GB                               ECR-hosted
    Synchronous payload (request/response)   6 MB                                Each direction
    Async payload                            256 KB                              Event queued internally
    Default concurrency per region           1,000                               Soft limit, can be increased

Packaging options

Lambda accepts code in two forms. The traditional path is a ZIP deployment package — your code plus dependencies, uploaded directly or via S3. The limit is 50 MB compressed and 250 MB unzipped. For larger runtimes, larger dependency trees, or workloads that need a specific system library version, you have the second option: container images.

Container images let you package a Lambda function as a standard OCI-compliant Docker image up to 10 GB, hosted in Amazon ECR. The Lambda runtime interface client (RIC) runs inside the container and handles the Lambda invocation lifecycle. This means you can use a base image of your choice, include any system libraries, and test locally with the Lambda Runtime Interface Emulator (RIE). For teams already using containers everywhere, this is often the cleanest path for large Lambda workloads — it unifies your build and test pipeline.

03

Layers and environment variables

Lambda Layers

A Layer is a ZIP archive that Lambda extracts into the execution environment's filesystem at a well-known path (/opt) before your handler runs. Layers solve two problems: keeping your deployment package small, and sharing common dependencies across multiple functions without duplicating them.

Common candidates for layers include: shared utility libraries, large ML model weights, database drivers, data validation schemas, or monitoring agents. A function can attach up to 5 layers. The total unzipped size of function code plus all layers cannot exceed 250 MB (this limit does not apply to container image deployments).

Layer versioning matters. Layers are immutable and versioned. Attaching my-utils:3 to a function pins that function to that layer version permanently unless you explicitly update the function config. When you publish a new layer version, existing functions do not automatically pick it up — this is a feature, not a bug. It gives you stable deployments and controlled rollouts.
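
A short boto3 sketch of that workflow, with hypothetical layer, bucket, and function names; publishing creates a new immutable version, and pinning a function to it is a separate, explicit step:

    import boto3

    lam = boto3.client("lambda")

    # Publishing creates an immutable new version and returns its ARN
    layer = lam.publish_layer_version(
        LayerName="my-utils",
        Content={"S3Bucket": "my-artifacts", "S3Key": "my-utils-layer.zip"},
        CompatibleRuntimes=["python3.12"],
    )

    # Existing functions keep their pinned version until you update them explicitly
    lam.update_function_configuration(
        FunctionName="my-fn",
        Layers=[layer["LayerVersionArn"]],  # replaces the function's full layer list
    )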

Layers can also be published to other AWS accounts, which makes them useful for distributing internal tooling across an organisation or via the AWS Serverless Application Repository. AWS and third-party vendors also publish public layers — the AWS Lambda Insights extension, Datadog, Dynatrace, and others are commonly distributed this way.

Environment variables

Environment variables are key-value pairs injected into the execution environment at runtime. They are the right place for configuration that changes between environments — database endpoints, feature flags, log levels, API URLs. They are the wrong place for secrets passed in plaintext.

For sensitive values, store the secret in AWS Secrets Manager or SSM Parameter Store (SecureString), and reference the ARN or parameter name in the environment variable. Retrieve the actual value at cold start in your initialisation code, outside the handler. AWS also supports encrypting environment variables at rest with a customer-managed KMS key — always use this for anything sensitive that must be in an env var directly.
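
A minimal sketch of that pattern, assuming a hypothetical SECRET_ARN environment variable that holds only the reference:

    import os
    import boto3

    _secrets = boto3.client("secretsmanager")
    # Runs during the init phase; warm invocations reuse the cached value
    DB_PASSWORD = _secrets.get_secret_value(
        SecretId=os.environ["SECRET_ARN"]
    )["SecretString"]

    def handler(event, context):
        # use DB_PASSWORD here; no per-invocation Secrets Manager call
        ...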

Do not put secrets as plaintext in environment variables. They are visible in the Lambda console, in CloudFormation templates, in CDK outputs, and in any CI/CD logs that print function configuration. A reference to a Secrets Manager ARN is safe. The actual secret value is not.

The combined size of all environment variables for a function (keys and values together) cannot exceed 4 KB. This ceiling is rarely hit in practice, but it matters if you are tempted to store large JSON blobs in env vars — don't.

04

Invocation modes: synchronous, asynchronous, polling

Lambda has three fundamentally different invocation models. Getting this wrong is the most common source of unexpected behaviour in Lambda-based architectures.

Lambda invocation modes overview
[Diagram: the three invocation modes. Synchronous: caller → Lambda → response; the caller waits, errors are returned directly, retry is the caller's problem. Asynchronous: caller → internal queue → Lambda; the caller gets a 202 immediately, Lambda retries twice on error, use a DLQ for failures. Polling (ESM): Lambda polls the source (SQS/Kinesis); retry, batch size, and bisect-on-error are all configurable on the ESM.]

Synchronous invocation

The caller invokes Lambda and waits for the response. Lambda executes the function and returns the result synchronously. If the function errors, the error is returned directly to the caller — there is no automatic retry from Lambda's side. Retry logic is entirely the caller's responsibility. API Gateway, ALB, and direct SDK calls (InvocationType: RequestResponse) all use synchronous invocation.

Asynchronous invocation

The caller sends the event and receives an immediate 202 Accepted — Lambda has acknowledged receipt but has not necessarily executed the function yet. Lambda places the event in an internal managed queue and the function executes from there. The caller gets no result back. If there is a function error, Lambda automatically retries up to two times with delays between attempts. Events that exhaust retries can be sent to a Dead Letter Queue (SQS or SNS) or to an async event destination. S3 event notifications, SNS, EventBridge, and CodePipeline approvals all use asynchronous invocation.
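
To make the contrast concrete, here is a minimal boto3 sketch of both invocation types against a hypothetical function named my-fn; only the InvocationType changes:

    import json
    import boto3

    lam = boto3.client("lambda")

    # Synchronous: blocks until the function returns; errors surface in the response
    resp = lam.invoke(
        FunctionName="my-fn",
        InvocationType="RequestResponse",
        Payload=json.dumps({"orderId": 123}),
    )
    print(resp["StatusCode"], resp["Payload"].read())  # 200 plus the return value

    # Asynchronous: returns as soon as Lambda has queued the event
    resp = lam.invoke(
        FunctionName="my-fn",
        InvocationType="Event",
        Payload=json.dumps({"orderId": 123}),
    )
    print(resp["StatusCode"])  # 202 — execution and retries happen later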

Event Source Mapping (polling)

For stream and queue sources, Lambda manages the polling itself through the Event Source Mapping (ESM) — a Lambda-side poller that reads from the source and delivers batches to your function. This covers SQS, Kinesis Data Streams, DynamoDB Streams, MSK, self-managed Kafka, and Amazon MQ. The source itself does not invoke Lambda — Lambda's infrastructure reads from it on your behalf. This distinction matters for how you think about scaling, concurrency consumption, and error handling, all of which are controlled by ESM configuration rather than caller behaviour.
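
ESM configuration lives on the mapping, not on the function. A sketch of creating an SQS mapping with boto3; the queue ARN, function name, and specific values are hypothetical:

    import boto3

    lam = boto3.client("lambda")
    lam.create_event_source_mapping(
        EventSourceArn="arn:aws:sqs:us-east-1:123456789012:orders",
        FunctionName="my-fn",
        BatchSize=25,                                 # records per invocation
        MaximumBatchingWindowInSeconds=5,             # wait up to 5 s to fill a batch
        FunctionResponseTypes=["ReportBatchItemFailures"],  # per-record failure reporting
        ScalingConfig={"MaximumConcurrency": 50},     # cap this mapping's concurrency
    )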


05

How Lambda scales under each mode

Lambda scaling is not the same across all invocation modes. The model differs meaningfully, and misunderstanding it leads to overrunning concurrency limits or overwhelming downstream services.

Synchronous and asynchronous: burst then linear

For synchronous invocations (API Gateway, ALB) and async invocations, Lambda starts with an initial burst quota and then scales linearly after that. The burst quota is a fixed per-region allowance — Lambda will spin up that many new execution environments immediately in the first minute of a traffic spike. After the burst is exhausted, Lambda adds 500 new concurrent executions per minute until it reaches the account's concurrency limit.

Lambda scaling behaviour — synchronous / async
[Chart: concurrent executions versus time since a traffic spike. The burst quota is consumed immediately, then concurrency grows by 500 per minute until the regional limit.]

This means a sudden jump from zero to very high traffic takes several minutes before Lambda reaches steady-state concurrency. If your downstream — a database, a third-party API — cannot handle a burst at the burst quota rate, Lambda's scaling can be the thing that takes it down. Use reserved concurrency (discussed below) to cap the maximum, or use SQS as a buffer to smooth the arrival rate.

SQS: scaling is driven by queue depth

For standard SQS queues with ESM, Lambda scales by adding more concurrent function executions as the queue depth grows. It starts by polling with a few concurrent executions and ramps up quickly — it can reach up to 1,000 concurrent invocations per queue. Importantly, Lambda scales up more aggressively than it scales down: it may maintain concurrency even as the queue drains, so you can see function concurrency remain elevated for a short period after the queue empties. The scale-down lag is usually a few minutes.

Kinesis and DynamoDB Streams: shard-bounded

For stream sources, the maximum concurrency is bounded by the number of shards. By default, Lambda runs one concurrent invocation per shard. With the parallelisation factor setting (available for both Kinesis and DynamoDB Streams), you can run up to 10 concurrent invocations per shard, allowing faster throughput without resharding. Shards are the scaling unit — to scale Lambda throughput on streams, you scale the stream's shard count.

06

Error handling: retries, DLQs, and destinations

Synchronous invocations

Lambda returns the error response directly to the caller. No automatic retry. The caller decides what to do. If the caller is API Gateway, it can be configured to return specific HTTP status codes for Lambda errors. For client-initiated SDK calls, the SDK will retry on throttling errors (TooManyRequestsException) with exponential backoff, but not on function-level errors thrown by your code.

Asynchronous invocations

Lambda retries failed async invocations up to two additional times (three total attempts), with delays between them — first after about 1 minute, second after about 2 minutes. If all three attempts fail, the event is either discarded or sent to the configured failure destination. The retries mean async functions need to be idempotent — the same event may arrive multiple times.
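
One common idempotency pattern is a conditional write against a tracking table. A sketch assuming a hypothetical DynamoDB table keyed on the event ID:

    import boto3
    from botocore.exceptions import ClientError

    ddb = boto3.client("dynamodb")

    def process(event):
        # hypothetical business logic
        ...

    def handler(event, context):
        try:
            # First delivery wins; a retried event hits the condition and is skipped
            ddb.put_item(
                TableName="processed-events",
                Item={"pk": {"S": event["id"]}},
                ConditionExpression="attribute_not_exists(pk)",
            )
        except ClientError as e:
            if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
                return  # duplicate delivery — already processed
            raise
        process(event)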

You have two mechanisms for capturing failed events: a Dead Letter Queue (an SQS queue or SNS topic that receives the raw event after retries are exhausted), and asynchronous event destinations (OnFailure targets of SQS, SNS, another Lambda function, or EventBridge, which also carry invocation metadata alongside the event). Destinations are the newer, more flexible option; DLQs remain supported.

Always configure a failure destination or DLQ for async-invoked functions. Without one, failed events after retries are silently discarded. You will have no record they ever arrived, and debugging why records went missing becomes very difficult after the fact.

Event Source Mapping error handling

ESM error handling is richer and deserves its own attention. When a batch fails, the entire batch is retried by default. This creates the poison-pill problem: one bad record in a batch of 100 blocks all 100 records from making progress indefinitely.

The solution is to configure the ESM with:

  1. BisectBatchOnFunctionError (Kinesis and DynamoDB Streams): on failure, Lambda splits the batch in half and retries each half, recursively isolating the bad record.
  2. ReportBatchItemFailures: your handler reports exactly which records failed, so only those are retried.
  3. Maximum retry attempts and maximum record age (streams), so a poison pill cannot block a shard forever.
  4. An on-failure destination for records that exhaust their retries.

Use ReportBatchItemFailures for SQS wherever possible. It gives you the granularity of per-message retry without the overhead of batch-size-1, and it prevents the whole batch from being blocked by a single bad record.
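
The handler side of ReportBatchItemFailures is a documented response shape: return the IDs of only the records that failed. A minimal Python sketch (process is a hypothetical stand-in for your business logic):

    def process(body):
        # hypothetical business logic; raise to signal a failed record
        ...

    def handler(event, context):
        failures = []
        for record in event["Records"]:
            try:
                process(record["body"])
            except Exception:
                failures.append({"itemIdentifier": record["messageId"]})
        # Only the listed message IDs are retried; the rest are deleted from the queue
        return {"batchItemFailures": failures}
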
07

How Lambda integrates with AWS services

Lambda is the glue layer of the AWS ecosystem. Almost every AWS service can trigger it, but the invocation model varies.

🌐
API Gateway / ALB

Synchronous. Caller blocks for response. 29-second timeout enforced by API Gateway regardless of Lambda's own timeout setting. Best for REST/HTTP APIs where a response is required.

🪣
S3

Asynchronous. S3 fires event notifications and Lambda retries on failure. For event-driven object processing — thumbnails, ETL, virus scan on upload. Be mindful of recursive patterns (function writing back to same bucket).

📬
SQS

ESM / polling. Lambda polls the queue, processes in batches. Standard queues scale aggressively; FIFO queues are limited to one concurrent function per message group. Configure batch size, window, and concurrency limit on the ESM.

📡
SNS

Asynchronous. SNS delivers to Lambda as an async invocation — built-in retries, DLQ support. Often paired with SQS in a fan-out pattern to decouple retries from processing.

🗄️
DynamoDB Streams

ESM / polling. Ordered, shard-based. One concurrent Lambda invocation per shard by default. Enables change-data-capture patterns, cross-region replication, and audit trails.

EventBridge

Asynchronous. Rules match events on the default or custom bus and invoke Lambda. The standard decoupling mechanism for microservices. Supports both scheduled rules (cron) and event pattern matching.

🔁
Kinesis

ESM / polling. Ordered per shard. Enhanced fan-out available. Bisect-on-error helps with poison-pill records. Parallelisation factor (up to 10) allows multiple concurrent invocations per shard.

🔐
Cognito / SES / CloudWatch Logs

Synchronous (Cognito triggers, SES receipt rules) or async (CloudWatch Logs subscription filter). Used for custom auth flows, email filtering, and log processing pipelines.

08

The concurrency model in depth

Lambda's concurrency model is the most important thing to understand to operate it well. Every other topic — cold starts, throttling, provisioned concurrency — flows from it.

Concurrency is the number of function instances handling requests at any moment. Each simultaneous in-flight invocation requires its own execution environment. Two concurrent requests cannot share an execution environment; they each get their own. Lambda creates new execution environments as demand grows and retains them briefly after invocations complete — this is the execution environment reuse that people refer to as a "warm" instance.
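
A rule of thumb falls out of this definition: steady-state concurrency equals arrival rate multiplied by average duration, so halving a function's duration halves the concurrency it consumes. A quick back-of-the-envelope check (numbers hypothetical):

    # Little's law applied to Lambda: concurrency = arrival rate x average duration
    requests_per_second = 200
    avg_duration_seconds = 0.5
    concurrency = requests_per_second * avg_duration_seconds
    print(concurrency)  # 100.0 execution environments busy at steady state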

Lambda execution environment lifecycle
[Diagram: three execution environments over time. Each begins with a cold start (init), runs a handler invocation, then sits idle and warm awaiting reuse; environments that stay idle are eventually frozen and reaped, and new demand triggers fresh cold starts.]

The execution environment goes through three phases every time Lambda needs to create a new one:

  1. Init phase: Lambda provisions the environment, downloads your code or layer, initialises the runtime, and runs your initialisation code — everything outside the handler function. This is the "cold start" overhead.
  2. Invoke phase: Lambda calls your handler. This happens on every invocation, warm or cold.
  3. Shutdown phase: after the invocation completes, Lambda freezes the environment. It may thaw it for a future invocation (a warm start) or, after a period of inactivity, shut it down and reclaim it permanently.

The account-level regional concurrency limit is a pool shared across all functions in a region. The default is 1,000. This is a soft limit — you can request increases through Service Quotas. When a function is throttled (because the limit is hit), synchronous invocations receive a 429 TooManyRequestsException. Async invocations are queued and retried. ESM invocations are held at the source.

09

Cold starts: what actually happens, real numbers, and fixes

A cold start occurs every time Lambda needs to create a new execution environment. It is not a bug or a failure — it is the price of the on-demand scaling model. But it has real latency consequences, and understanding what contributes to it is the first step to managing it.

What happens during a cold start

Cold start phase breakdown
[Diagram: cold start phases in order — provision microVM (Firecracker) → download code and layers → runtime init (JVM etc.) → your init code outside the handler → handler execution. The early phases are AWS-controlled and not yours to change; your init code is where you can optimise. Handler execution happens on every invocation; cold start duration includes every phase before it.]

Real numbers

Lambda cold start durations vary significantly by runtime, package size, and what your init code does. As rough reference points based on commonly observed production data: Python and Node.js with small packages typically cold-start in the low hundreds of milliseconds; Go is in the same range or faster; .NET usually lands between several hundred milliseconds and a second or more; and Java ranges from under a second for a trimmed package up to 1–4 seconds for large ones.

These numbers grow fast if your init code is slow. Connecting to a database, loading a large ML model, reading a large config file from S3 — all of these happen in the init phase for every new execution environment. A 2-second database connection attempt during init means a 2-second overhead on every cold start, on top of the runtime startup cost.

Solutions

Keep your deployment package small. Smaller package = faster download = shorter cold start. Remove unused dependencies, use tree-shaking in Node.js, use Docker multi-stage builds to strip dev dependencies. Every megabyte matters at scale.

Minimise init code work. Do only what is necessary outside the handler — create SDK clients, initialise connection pools, load environment config. Do not make network calls to non-essential services, do not load data that could be fetched lazily on first use.

Use lazy initialisation for rarely-used paths. If your function has a code path that touches a resource needed only occasionally, defer that initialisation to within the handler, behind a check, rather than doing it unconditionally at cold start.
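
A sketch of the pattern in Python; the S3 client stands in for any dependency that only some invocations need, and the event field is hypothetical:

    import boto3

    _s3 = None  # not created during init — no cost on cold start

    def _get_s3():
        global _s3
        if _s3 is None:  # first use on this execution environment
            _s3 = boto3.client("s3")
        return _s3

    def handler(event, context):
        if event.get("needs_export"):  # hypothetical rare code path
            _get_s3().put_object(Bucket="exports", Key=event["id"], Body=b"...")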

Lambda SnapStart (Java). SnapStart for Java 11 and later takes a snapshot of the initialised execution environment after the init phase and restores from that snapshot on subsequent cold starts. This can reduce Java cold start times from 1–4 seconds down to under 200 ms. It requires enabling SnapStart on a published function version, and using runtime hooks (the CRaC beforeCheckpoint/afterRestore interfaces) to re-establish any state that must not be carried over from a snapshot — randomness sources and network connections that should not be reused.

Choose a leaner runtime. If cold start latency is critical and you have runtime flexibility, Python and Node.js consistently outperform Java and .NET on cold start, and compiled Go binaries are typically fastest of all. Graviton also reduces cold start time marginally — discussed later in the Graviton section.

10

Reserved, provisioned concurrency, and burst quota

Reserved concurrency

Reserved concurrency is a cap and a guarantee set at the function level. Setting reserved concurrency of 100 on a function means:

  1. The function can never exceed 100 concurrent executions (the cap).
  2. Those 100 executions are carved out of the regional pool exclusively for this function, so other functions can never starve it (the guarantee).
  3. The unreserved pool available to every other function shrinks by 100.

Use reserved concurrency to protect downstream resources. If your Lambda talks to a database that can handle 80 connections, set reserved concurrency to 80 (or lower, accounting for connection pool size per instance). Without this cap, Lambda can scale to hundreds of concurrent executions and exhaust the database connection pool.

Setting reserved concurrency to zero effectively disables the function — useful for emergency cutoffs without deleting the function.
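
Setting and clearing the cap is a one-call operation. A boto3 sketch with a hypothetical function name:

    import boto3

    lam = boto3.client("lambda")

    # Cap the function at 80 concurrent executions to protect a downstream database
    lam.put_function_concurrency(FunctionName="my-fn", ReservedConcurrentExecutions=80)

    # Emergency cutoff: zero disables all invocations without deleting the function
    # lam.put_function_concurrency(FunctionName="my-fn", ReservedConcurrentExecutions=0)

    # Remove the reservation and return the function to the shared pool
    # lam.delete_function_concurrency(FunctionName="my-fn")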

Reserved concurrency reduces the pool available to all other functions in the region. If you reserve 800 of 1,000 for a single function, every other function in the region is competing for the remaining 200. Reserve thoughtfully.

Provisioned concurrency

Provisioned concurrency solves cold starts by pre-initialising a specified number of execution environments and keeping them warm and ready to handle requests. Unlike reserved concurrency, it is not just a cap — it actively costs you, because Lambda is maintaining idle environments.

When a request arrives, provisioned environments respond immediately with no cold start. If demand exceeds the provisioned amount, Lambda spins up additional on-demand environments normally (with potential cold starts). Provisioned concurrency is configured on a function version or alias, not on the $LATEST version. This integrates naturally with deployment strategies — you provision on the stable alias, not on the in-development latest version.
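
A boto3 sketch, assuming a hypothetical function with a prod alias pointing at a published version:

    import boto3

    lam = boto3.client("lambda")
    # Provisioned concurrency attaches to a version or alias, never $LATEST
    lam.put_provisioned_concurrency_config(
        FunctionName="my-fn",
        Qualifier="prod",                     # alias pointing at a published version
        ProvisionedConcurrentExecutions=25,   # environments kept initialised and warm
    )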

Reserved vs Provisioned concurrency — what each controls
[Diagram: the regional concurrency pool (e.g. 1,000) split into an unreserved pool shared by all functions without reserved concurrency, and a reserved slice that both caps the function and prevents it from being starved; provisioned concurrency sits within that as pre-warmed, always-running environments with no cold start.]

Application Auto Scaling can manage provisioned concurrency automatically — scaling it up before a scheduled traffic spike (a product launch, a nightly batch job) and scaling it down after. You define scaling policies and target tracking rules, the same way you would for other AWS resources.
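
A sketch of that wiring with boto3, using a hypothetical function and alias and the documented provisioned-concurrency utilisation metric:

    import boto3

    aas = boto3.client("application-autoscaling")

    # Register the alias's provisioned concurrency as a scalable target
    aas.register_scalable_target(
        ServiceNamespace="lambda",
        ResourceId="function:my-fn:prod",
        ScalableDimension="lambda:function:ProvisionedConcurrency",
        MinCapacity=10,
        MaxCapacity=200,
    )

    # Target tracking: scale to keep utilisation near 70%
    aas.put_scaling_policy(
        PolicyName="pc-utilisation",
        ServiceNamespace="lambda",
        ResourceId="function:my-fn:prod",
        ScalableDimension="lambda:function:ProvisionedConcurrency",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 0.7,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "LambdaProvisionedConcurrencyUtilization"
            },
        },
    )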

Burst quota

The burst quota is the maximum number of execution environments Lambda will spin up simultaneously in response to a rapid increase in traffic, within the first minute. It varies by region — us-east-1, us-west-2, and eu-west-1 have higher burst quotas (3,000) than most other regions (500 or 1,000). After the burst quota is consumed, Lambda adds 500 concurrent executions per minute until the regional limit is reached.

The burst quota is a regional constraint, not a per-function one. A single function can consume the entire burst quota if it is the only one seeing traffic. This is why gradual traffic shaping — canary deployments, weighted routing, traffic ramping on aliases — is important for functions that need to scale from near-zero to high traffic quickly.

11

Graviton: why you should probably switch

Lambda supports two processor architectures: x86_64 and arm64 (AWS Graviton2). When you create a new function, the default is x86_64. You should evaluate switching to arm64 for most workloads.

Why Graviton?

Graviton2 processors deliver better price-performance for most workload types. In Lambda specifically, this translates to:

  1. Duration pricing roughly 20% lower per GB-second than x86_64.
  2. Equivalent or better execution performance for many workloads, so you often pay the lower rate for the same or shorter duration.
  3. Marginally faster cold starts.

For interpreted runtimes (Python, Node.js, Ruby), switching to Graviton is almost always a pure win — lower cost, equivalent or better performance, no code changes needed. For compiled languages (Java, .NET, Go), you need to rebuild your binaries for arm64, but the effort is minimal with modern build tooling.

What to check before switching

The main consideration is native binary dependencies. If your Lambda code or layers include pre-compiled C extensions (common in Python scientific computing — NumPy, Pandas, etc.), or native Node.js addons, or any binary compiled for x86_64, those will not run on arm64 without recompilation. Check your dependency tree. Most major libraries publish arm64-compatible wheels and binaries, but verify before switching production workloads.

Container images also need to be built for linux/arm64. Use multi-platform builds with Docker Buildx or your CI pipeline's ARM runner support. Lambda's architecture setting must match the container image architecture.

12

Use cases: Lambda@Edge, CloudFront Functions, and containers

Lambda@Edge

Lambda@Edge runs Lambda functions at CloudFront edge locations — the same globally distributed PoPs that serve your cached content. Instead of a round-trip to your origin for logic, the function executes at the edge closest to the user. It integrates at four points in the CloudFront request/response lifecycle:

  1. Viewer request: after CloudFront receives a request from the viewer, before the cache is checked.
  2. Origin request: before CloudFront forwards a cache miss to the origin.
  3. Origin response: after the origin responds, before the response enters the cache.
  4. Viewer response: before CloudFront returns the response to the viewer.

Lambda@Edge is the right tool when you need substantial compute at the edge: A/B testing with personalised responses, authentication and authorisation checks, dynamic origin selection, request/response rewriting, geolocation-based redirects, or server-side rendering partial content at the edge. Functions run in the us-east-1 region for configuration but replicate to all CloudFront edges automatically.

Lambda@Edge has tighter constraints than standard Lambda. Maximum execution timeout is 5 seconds for viewer-facing events and 30 seconds for origin-facing events. Memory is capped at 128 MB (viewer) or 10,240 MB (origin). No environment variables — use SSM or code-level config. No VPC support. No ARM — x86_64 only. These are not soft limits.

CloudFront Functions

CloudFront Functions is a separate, lighter execution environment designed for ultra-low-latency manipulation of CloudFront viewer requests and responses. It runs JavaScript (a restricted subset — no Node.js APIs, no network access, no file I/O) at sub-millisecond execution time. It is significantly cheaper than Lambda@Edge per invocation and has no cold start in the traditional sense.

CloudFront Functions vs Lambda@Edge is a capability tradeoff:

CloudFront Functions — use when

You need simple, fast transformations: URL rewrites and redirects, header normalisation, cache key manipulation, simple auth token validation, A/B cookie assignment. Sub-millisecond execution, no cold start, very low cost.

🧠
Lambda@Edge — use when

You need real compute: database lookups, third-party API calls, complex auth (JWT with JWKS validation), origin selection based on business logic, SSR, image transformation. Full Lambda capability with edge proximity.

Container image packaging

Packaging Lambda functions as container images (OCI format, hosted in ECR) is not just about getting around the 250 MB ZIP limit. It enables a genuinely different development and deployment workflow:

  1. The same build, scan, and deploy pipeline you already run for ECS or EKS services.
  2. A base image of your choice, including any system libraries your code needs.
  3. Local testing with the Lambda Runtime Interface Emulator (RIE).
  4. Images up to 10 GB, far beyond the 250 MB ZIP ceiling.

The tradeoff is cold start. Container image cold starts include an image pull step from ECR, which adds latency on the first invocation of a new execution environment. AWS caches images at the Lambda fleet level after the first pull, so subsequent cold starts on the same underlying infrastructure are faster — but you cannot control or guarantee this caching. For latency-sensitive functions, provisioned concurrency eliminates this concern entirely.

13

Configuration best practices

Right-size memory — and measure it

The default of 128 MB is almost always wrong for production functions. Start by profiling: run your function at several memory settings and measure both execution time and the memory actually consumed. AWS Lambda Power Tuning (an open-source Step Functions state machine) automates this — it runs your function at every memory setting from 128 MB to 10,240 MB and plots the cost-performance curve. The optimal setting is rarely the maximum, but it is almost never the minimum either.

Set timeouts correctly

The default Lambda timeout is 3 seconds. That is too short for many workloads and silently causes failures when functions hit it. Set the timeout to a value that reflects the maximum expected execution time of your function with a reasonable margin — not the maximum 15 minutes. A function that usually runs in 2 seconds should have a timeout of perhaps 10–15 seconds, not 900. This ensures fast failure detection when something is wrong, rather than hanging for 15 minutes before timing out.

Separate init code from handler code

Anything that can be computed once and reused across invocations should live outside the handler. SDK clients, database connection pools, configuration objects, compiled regex patterns — initialise once, reuse many times. This reduces per-invocation latency and amortises initialisation cost across all warm invocations on that execution environment.

Execution environment reuse is a feature, not an implementation detail. Write code that assumes reuse (initialise once, handle the case where a resource needs reconnecting) rather than code that assumes a fresh environment on every invocation.

Use function URLs or API Gateway — not both

Lambda Function URLs provide a simple HTTPS endpoint directly on the function, with no API Gateway configuration. They support IAM auth or no auth (public). For simple use cases — a webhook receiver, an internal tool endpoint — they are easier to configure and operationally simpler. Use API Gateway when you need its features: request validation, usage plans, WAF integration, custom domains with path routing, caching, or response transformations across many functions.

IAM execution roles: least privilege

Every Lambda function needs an execution role. The common mistake is to attach AdministratorAccess or overly broad managed policies "to get it working quickly" and never revisit. Scope execution roles tightly: the function should have permission only to the specific resources it needs — the exact DynamoDB table, the exact S3 bucket prefix, the exact SSM parameters. Use IAM condition keys to restrict access by resource tag, ARN prefix, or request context where available.

VPC configuration — use it only when you need it

Placing Lambda in a VPC is required to access resources in the VPC — RDS in a private subnet, ElastiCache, internal services. But it adds operational complexity, and network setup can still add some cold start latency (much less than it used to, since Lambda now provisions shared Hyperplane ENIs at configuration time rather than one interface per execution environment). If your function talks only to AWS services with public endpoints — DynamoDB, S3, SQS, API Gateway — it does not need VPC access. When you do use VPC, use at least two private subnets in different AZs and ensure the subnets have enough free IP addresses for Lambda's scaling needs.

Observability: structured logs, metrics, and tracing

Lambda automatically sends logs to CloudWatch Logs, but raw print statements or unstructured log lines are hard to query and alert on. Log in structured JSON format — include correlation IDs, function version, request context, and structured error objects. Enable AWS X-Ray tracing (or OpenTelemetry) to get distributed traces across Lambda invocations and downstream service calls. Use Lambda Insights (a CloudWatch agent layer) for enhanced metrics: memory utilisation, init duration, and cold start tracking that the default metrics do not expose.

Use aliases and weighted traffic for deployments

Never invoke $LATEST in production. Publish versions, and route traffic through an alias. Aliases support weighted routing — you can direct 5% of traffic to the new version and 95% to the stable version, monitor error rates and latency, and shift weight incrementally or roll back with a single API call. Combined with CodeDeploy hooks, this gives you canary and linear deployments with automatic rollback on alarm triggers.
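
A boto3 sketch of a weighted shift, with hypothetical function, alias, and version numbers:

    import boto3

    lam = boto3.client("lambda")
    lam.update_alias(
        FunctionName="my-fn",
        Name="prod",
        FunctionVersion="41",  # stable version receives the remaining 95%
        RoutingConfig={"AdditionalVersionWeights": {"42": 0.05}},  # canary gets 5%
    )
    # Shift weight incrementally as metrics stay healthy, or roll back
    # instantly by removing the RoutingConfig entry.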


Quick reference: the configuration checklist

🏗️
At function creation

Right-size memory (profile it). Set a real timeout. Choose arm64 unless there's a reason not to. Attach a tight IAM execution role. Configure structured logging.

🔐
Secrets and config

No plaintext secrets in env vars. Reference Secrets Manager or SSM. Encrypt env vars with a CMK. Keep config small — 4 KB limit.

⚠️
Error handling

Always configure DLQ or async event destinations. Enable BisectBatchOnFunctionError on ESM. Use ReportBatchItemFailures for SQS. Test your failure paths.

📈
Concurrency

Set reserved concurrency to protect downstream systems. Use provisioned concurrency for latency-sensitive workloads. Monitor throttles. Plan for burst quota limits.

🚀
Deployments

Publish versions. Use aliases in production. Use weighted routing for canary releases. Automate rollback with CloudWatch alarms and CodeDeploy hooks.

🔭
Observability

Structured JSON logs. X-Ray or OTel tracing. Lambda Insights for memory and init metrics. Alarm on error rate, throttle count, and p99 duration.

I hope you found this useful. Please share it!


About the Author

Mayank Pandey

AWS Community Hero and Cloud Architect with 15+ years of experience. AWS Solutions Architect Professional, FinOps Practitioner, and AWS Authorized Instructor. Creator of the KnowledgeIndia YouTube channel (80,000+ subscribers). Based in Melbourne, Australia.