A confession before the procedure: I cannot re-run any of this on the box that builds this site.
Most of what we publish here gets executed in a sandbox before it ships — that’s
the rule. This post breaks it on purpose, because the subject is AWS, and AWS is
not a thing you stand up in a mktemp directory between two CI steps. Every
command below that starts with zappa or touches rds, secretsmanager, or a
vpc_config block was run against a real AWS account at the time — and was not
re-run here. I have flagged the unverifiable steps inline. Treat the config
blocks as a map, not as captured output. Where I show a shell command, assume it
is the shape of the call, not a transcript I produced today.
With that out of the way: here is the actual lesson, which is older and meaner than any single AWS service.
The mismatch nobody warns you about
Django is built around a database connection it expects to keep. It opens one, holds it across requests if you let it, and assumes the process it lives in will be around for a while.
Lambda disagrees with every word of that sentence. A Lambda function is
stateless and ephemeral. It can spin up a hundred copies of itself under load,
each a fresh process that wants its own database connection, and then throw them
all away minutes later. Point that at a normal Postgres instance and you don’t
get a scaling story — you get a FATAL: too many connections and a database
that fell over because your traffic spike turned into a connection spike.
That is the whole problem in one sentence: Django wants one durable connection; Lambda gives it many disposable ones. Everything below is paying down that mismatch.
The specific ways it bites:
- Connection limits. Relational databases cap concurrent connections. Lambda’s horizontal scaling blows straight through that cap.
- Statelessness. A function starts fresh each invocation. There’s no process-lifetime connection pool to reuse, because there’s no stable process.
- Cold starts. Opening a database connection on a cold invocation adds latency to the very requests that already felt slow.
- Networking. Put the database in a VPC and your Lambda needs VPC configuration to reach it — which, in 2019, made cold starts noticeably worse.
Picking a database (the trade you’re actually making)
Amazon RDS is the obvious choice: managed Postgres/MySQL, speaks Django ORM, nothing exotic. The catch is the fixed connection cap and the fact that you pay for the instance whether or not anything is invoking it. RDS is the right answer if you accept that you’ll need a proxy in front of it.
Aurora Serverless scales capacity with load and bills for what you use, and its Data API talks over HTTPS — which dodges the persistent-connection problem entirely by not holding a connection. The price is latency during scaling events. (Aurora Serverless v2 is the version to look at; this was written when v1 was the reality, and “scaling event latency” was not a footnote, it was a feature you scheduled around.)
DynamoDB has no connection-limit problem because it has no connections in the relational sense. It also has no Django ORM, which means rewriting your data layer. That’s not a database choice; that’s an architecture choice wearing a database choice’s clothes. Worth it for some workloads, a trap if you reach for it just to escape connection math.
The fix for the connection storm: RDS Proxy
This is the piece that makes the whole thing viable. RDS Proxy sits between your fleet of short-lived Lambdas and your one long-suffering database, holds a pool of real connections, and hands them out and reclaims them as invocations come and go. Your hundred Lambdas talk to the proxy; the proxy talks to the database with a sane, bounded number of connections.
The setup, in order — and none of these console steps were re-run for this post, so verify against current AWS docs:
- Create an RDS Proxy in the RDS console and associate it with your database.
- Give the Lambda’s execution role permission to use the proxy.
- Point Django at the proxy endpoint instead of the database endpoint:
DATABASES = {
"default": {
"ENGINE": "django.db.backends.postgresql",
"NAME": "your_db_name",
"USER": "your_db_user",
"PASSWORD": "your_db_password",
"HOST": "your_rds_proxy_endpoint", # the proxy, not the DB
"PORT": "5432",
}
}
That HOST line is the entire trick: Django thinks it’s talking to a database;
it’s talking to a connection pool that lies convincingly.
Tune Django’s side too
CONN_MAX_AGE decides how long Django holds a connection between requests. The
intuition from a long-lived server (“keep it open, reuse it”) is exactly wrong
in a fleet of disposable processes that each open their own:
DATABASES["default"]["CONN_MAX_AGE"] = 60 # seconds; lower = fewer idle conns
Set it to 0 and Django closes the connection after every request — fewest idle
connections, most per-request overhead. With RDS Proxy doing the pooling, you can
afford a low number here because the expensive part isn’t yours anymore.
If you can’t or won’t run RDS Proxy, you’re into manual pooling territory —
psycopg2.pool behind a custom backend, or packages like django-db-geventpool.
I’m flagging these as the path I’d avoid: it’s more code you own, doing a job AWS
will rent you. The proxy exists precisely so you don’t write this.
VPC: where cold starts go to die
If the database lives in a VPC — and it should — the Lambda needs to join that network:
- Set the Lambda’s VPC, subnets, and security groups in its config.
- Open the paths: the Lambda’s security group needs outbound to the DB; the DB’s security group needs inbound from the Lambda’s group.
Two warnings I’d underline in red:
- Subnets. Use private subnets with a NAT Gateway if the function needs the internet. Putting the database reachable from public subnets is how you accidentally expose it. This is a data-exposure footgun, not a performance tip.
- Cold starts. In this era, attaching a Lambda to a VPC added real cold-start latency while it provisioned an elastic network interface. VPC endpoints helped. This was the single most surprising tax on the whole setup, and it’s the thing I cannot re-measure for you here.
Credentials: stop hardcoding them
Don’t ship database passwords in your settings file. Put them in AWS Secrets
Manager and fetch them at runtime. Here’s the shape of it — not run here,
and note it needs an import json the original snippet quietly omitted:
import json
import boto3
from botocore.exceptions import ClientError
def get_secret(secret_name, region_name):
client = boto3.session.Session().client(
service_name="secretsmanager",
region_name=region_name,
)
try:
response = client.get_secret_value(SecretId=secret_name)
except ClientError:
raise # let it fail loudly; a silent secrets failure is worse
return json.loads(response["SecretString"])
secrets = get_secret("your_secret_name", "your_region")
DATABASES = {
"default": {
"ENGINE": "django.db.backends.postgresql",
"NAME": secrets["DB_NAME"],
"USER": secrets["DB_USER"],
"PASSWORD": secrets["DB_PASSWORD"],
"HOST": secrets["DB_HOST"],
"PORT": secrets["DB_PORT"],
}
}
Secrets Manager can also rotate credentials for you. The more disciplined option is RDS IAM authentication — no static password at all, the Lambda role generates a short-lived token — but that’s a bigger setup and I’m not going to pretend I verified it from here.
Static files, settings, and the rest of the serverless tax
A Lambda’s filesystem is read-only and disposable, so Django’s static and media
files go to S3 via django-storages:
INSTALLED_APPS += ["storages"]
AWS_STORAGE_BUCKET_NAME = "your_bucket_name"
AWS_S3_REGION_NAME = "your_region"
# credentials come from the Lambda's IAM role, not hardcoded keys
STATICFILES_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
DEFAULT_FILE_STORAGE = "storages.backends.s3boto3.S3Boto3Storage"
(The source I rewrote this from hardcoded AWS_ACCESS_KEY_ID and
AWS_SECRET_ACCESS_KEY here. Don’t. The function already has a role; let it use
it.)
Read non-secret config from environment variables, and keep the deployment package small — every megabyte is cold-start latency:
import os
DEBUG = os.getenv("DEBUG", "False") == "True"
Migrations: the step that ruins your day if you forget it
Migrations are the part where serverless stops being convenient. There is no
“the server is up, run migrate on it” — there is no server. Two honest options:
- Run migrations from CI/CD (or your laptop) before the new code goes live.
- Or invoke a dedicated migration Lambda.
Either way the ordering rule is absolute: migrate before the application code goes live, or you’ll serve requests against a schema that doesn’t match the code expecting it. That’s not a crash; it’s worse — it’s intermittent, data-shaped wrongness.
Deploying with Zappa
Zappa is the tool that packages a Django/WSGI app into a Lambda + API Gateway and
deploys it. The workflow, with the standing reminder that these zappa
commands hit real AWS and were not re-run for this post:
pip install zappa django psycopg2-binary "django-storages[boto3]"
zappa init # answers go into zappa_settings.json: region, S3 bucket, etc.
The VPC config has to make it into zappa_settings.json, or your Lambda deploys
fine and then can’t reach the database — a failure that looks like a bug and is
actually a config gap:
{
"production": {
"django_settings": "your_project.settings",
"vpc_config": {
"SubnetIds": ["subnet-xxxxxxxx", "subnet-yyyyyyyy"],
"SecurityGroupIds": ["sg-zzzzzzzz"]
},
"environment_variables": {
"DJANGO_SETTINGS_MODULE": "your_project.settings"
}
}
}
Then deploy, and run migrations through the deployed function:
zappa deploy production
zappa manage production migrate
zappa deploy packages the app, uploads it to S3, and wires up the AWS
resources. zappa manage production migrate runs Django’s migrate inside the
Lambda environment — which is the closest this architecture gets to ssh-ing into
a server, and it’s a remote invocation, not a shell.
What I’d actually tell you
Serverless Django works. The connection problem is real and RDS Proxy solves it; the VPC cold-start tax is real and you should measure it for your own traffic before committing. But the load-bearing question isn’t “can I deploy Django to Lambda” — it’s “does my workload actually want this.” Spiky, bursty, mostly-idle traffic: serverless earns its keep. Steady, connection-heavy, long-request traffic: a container on Fargate or App Runner keeps its database connections alive without you fighting the platform, and you’ll spend less of your life thinking about connection pools.
And the honest caveat that frames this whole post: I wrote down the procedure and
the reasoning, but I could not verify any of the AWS-touching steps from the
machine that built this page. AWS APIs, service limits, and Aurora Serverless in
particular have moved a lot since this was first run in 2019. Use this for the
shape of the solution — the Django-versus-Lambda mismatch and how RDS Proxy
absorbs it — and check every endpoint, IAM action, and zappa flag against
current docs before you trust it in production.