Deploy an AI Agent to Production: Docker, AWS, and the Boring Bits
A working agent on your laptop is not a deployed agent. The gap is mostly boring: a Dockerfile, secrets, a database, autoscaling, observability. Done well, it is one afternoon. Done sloppily, it is a series of 3am pages.
Here is the path from compiled graph to production on AWS. Docker image, ECS Fargate, RDS, secrets, scaling.
The shape of a production deployment
For a Python agent service on AWS, the minimum viable architecture is:
ALB → ECS Fargate (your agent) → RDS Postgres (checkpointer)
                               → ElastiCache Redis (hot path)
                               → Secrets Manager (API keys)
                               → CloudWatch (logs)
Each box is one or two days of work. The agent itself is the small part.
Step 1: Dockerfile
A small, reproducible image:
FROM python:3.12-slim
# Reduce image size, avoid pyc files baked in by package install
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PIP_NO_CACHE_DIR=1 \
PIP_DISABLE_PIP_VERSION_CHECK=1
WORKDIR /app
# Install system deps: libpq/gcc for asyncpg builds, curl for the
# container health check
RUN apt-get update && apt-get install -y --no-install-recommends \
    libpq-dev gcc curl \
    && rm -rf /var/lib/apt/lists/*
# Install Python deps first for layer caching
COPY pyproject.toml uv.lock ./
RUN pip install uv && uv pip install --system --no-cache-dir -r pyproject.toml
# Copy app code last, then install the project itself (an editable
# install before the copy would fail: the source is not there yet)
COPY . .
RUN uv pip install --system --no-cache-dir --no-deps -e .
# AgentFlow CLI
RUN pip install --no-cache-dir 10xscale-agentflow-cli
EXPOSE 8000
# Use exec form so signals reach the process
CMD ["agentflow", "api", "--host", "0.0.0.0", "--port", "8000"]
If you prefer to skip the Dockerfile, AgentFlow's CLI can scaffold one:
agentflow init
agentflow generate-docker
See Generate Docker files for the generated content.
Step 2: Local docker-compose for development
version: "3.9"
services:
  agent:
    build: .
    ports: ["8000:8000"]
    environment:
      DATABASE_URL: postgresql+asyncpg://postgres:postgres@db:5432/agentflow
      REDIS_URL: redis://redis:6379/0
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      GOOGLE_API_KEY: ${GOOGLE_API_KEY}
    depends_on: [db, redis]
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: agentflow
    ports: ["5432:5432"]
    volumes:
      - pgdata:/var/lib/postgresql/data
  redis:
    image: redis:7-alpine
    ports: ["6379:6379"]
volumes:
  pgdata:
Run docker compose up, send a request to localhost:8000. If this works, ECS will work. Only the configuration changes.
Step 3: Build and push to ECR
aws ecr create-repository --repository-name agentflow-app
aws ecr get-login-password --region us-east-1 \
| docker login --username AWS --password-stdin <account>.dkr.ecr.us-east-1.amazonaws.com
docker build -t agentflow-app:latest .
docker tag agentflow-app:latest <account>.dkr.ecr.us-east-1.amazonaws.com/agentflow-app:latest
docker push <account>.dkr.ecr.us-east-1.amazonaws.com/agentflow-app:latest
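ECR image URIs follow a fixed pattern. If you script tagging in CI, a tiny helper (hypothetical, names are illustrative) avoids typos:

```python
def ecr_image_uri(account: str, region: str, repo: str, tag: str = "latest") -> str:
    """Build the fully qualified ECR image URI used by docker tag,
    docker push, and the ECS task definition."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"

uri = ecr_image_uri("123456789012", "us-east-1", "agentflow-app", "abc123")
```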
For CI, use the official aws-actions/amazon-ecr-login GitHub Action.
Step 4: RDS Postgres for the checkpointer
Create the smallest instance that fits your throughput. For a starting point: db.t4g.medium, single-AZ, 50 GB storage.
aws rds create-db-instance \
--db-instance-identifier agentflow-prod \
--db-instance-class db.t4g.medium \
--engine postgres \
--master-username agentflow \
--master-user-password 'use-secrets-manager-instead' \
--allocated-storage 50 \
--vpc-security-group-ids sg-xxx \
--db-subnet-group-name your-private-subnet-group
Connection string format: postgresql+asyncpg://agentflow:<password>@<host>:5432/agentflow. Pass via Secrets Manager (next step), not env vars in the task definition.
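If you assemble the DSN at runtime from parts (user from config, password from a secret), URL-encode the password so special characters don't break parsing. A minimal sketch, with a hypothetical helper name:

```python
from urllib.parse import quote_plus

def build_database_url(user: str, password: str, host: str,
                       port: int = 5432, db: str = "agentflow") -> str:
    """Assemble an asyncpg DSN. quote_plus escapes characters like
    '@' and '/' that would otherwise corrupt the URL."""
    return (
        f"postgresql+asyncpg://{user}:{quote_plus(password)}"
        f"@{host}:{port}/{db}"
    )

url = build_database_url("agentflow", "p@ss/word",
                         "agentflow-prod.xxx.us-east-1.rds.amazonaws.com")
```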
Step 5: ElastiCache Redis for hot-path state
cache.t4g.small works for most production loads. Single-shard, no clustering needed initially.
Connection string: redis://<host>:6379/0.
Step 6: Secrets Manager
Never put OPENAI_API_KEY in plaintext in your task definition. Create a secret per credential:
aws secretsmanager create-secret \
--name agentflow/prod/openai \
--secret-string '{"OPENAI_API_KEY":"sk-..."}'
Reference it in the ECS task definition's secrets block, not environment.
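ECS injects the JSON key directly when you use the `secrets` block, but if a job or migration script fetches a secret itself, the parsing step looks like this. A sketch with a hypothetical helper; with boto3 you would obtain `secret_string` via `client.get_secret_value(SecretId=...)["SecretString"]`:

```python
import json

def read_secret_key(secret_string: str, key: str) -> str:
    """Parse a Secrets Manager JSON SecretString and pull one key,
    failing loudly if the key is absent rather than returning None."""
    payload = json.loads(secret_string)
    if key not in payload:
        raise KeyError(f"secret is missing key {key!r}")
    return payload[key]

api_key = read_secret_key('{"OPENAI_API_KEY": "sk-..."}', "OPENAI_API_KEY")
```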
Step 7: ECS Fargate task definition
{
  "family": "agentflow-app",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "containerDefinitions": [
    {
      "name": "agent",
      "image": "<account>.dkr.ecr.us-east-1.amazonaws.com/agentflow-app:latest",
      "portMappings": [{"containerPort": 8000, "protocol": "tcp"}],
      "environment": [
        {"name": "REDIS_URL", "value": "redis://agentflow.xxx.cache.amazonaws.com:6379/0"}
      ],
      "secrets": [
        {"name": "DATABASE_URL", "valueFrom": "arn:aws:secretsmanager:...:secret:agentflow/prod/database-url"},
        {"name": "OPENAI_API_KEY", "valueFrom": "arn:aws:secretsmanager:...:secret:agentflow/prod/openai:OPENAI_API_KEY::"}
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8000/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 30
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/agentflow-app",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "agent"
        }
      }
    }
  ]
}
Notes:
- 1 vCPU + 2 GB is plenty for an I/O-bound agent service. Add CPU only if you have CPU-bound tools.
- Health checks hit `/health` (AgentFlow exposes this by default).
- Logs flow to CloudWatch; route from there to your log platform.
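If you roll your own health endpoint instead of the built-in one, it should verify dependencies rather than unconditionally return 200, so the ALB pulls unhealthy tasks out of rotation. A framework-agnostic sketch (the helper name and probe shape are assumptions):

```python
from typing import Callable, Dict, Tuple

def health_status(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Run dependency probes (e.g. DB ping, Redis ping) and map the
    result to an HTTP status: 200 only if every probe passes, else 503.
    A probe that raises counts as failed."""
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = bool(probe())
        except Exception:
            results[name] = False
    ok = all(results.values())
    return (200 if ok else 503), {"status": "ok" if ok else "degraded",
                                  "checks": results}

# A task with a dead Redis should report 503, not 200:
code, body = health_status({"db": lambda: True, "redis": lambda: False})
```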
Step 8: ALB and SSL
Put the service behind an Application Load Balancer with HTTPS:
- Target group: HTTP on port 8000, health check on `/health`
- Listener: HTTPS 443 with an ACM certificate
- Idle timeout: 300 seconds, important for SSE streams
If your idle timeout is too low (default is 60s), long agent streams get cut. Tune it.
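One common mitigation besides raising the timeout is emitting SSE comment lines (`: ping`) whenever a stream has been quiet for a while. A minimal decision helper, assuming you track the last write time per stream:

```python
def needs_keepalive(last_sent_at: float, now: float,
                    idle_timeout_s: float = 300.0,
                    safety_factor: float = 0.5) -> bool:
    """Return True when the stream has been quiet long enough that an
    SSE comment (': ping\\n\\n') should be written before the ALB's idle
    timeout closes the connection. safety_factor=0.5 pings at half the
    timeout to leave headroom for scheduling jitter."""
    return (now - last_sent_at) >= idle_timeout_s * safety_factor
```

Quiet for 160 s against a 300 s timeout triggers a ping; 100 s does not.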
Step 9: Autoscaling
For an agent service, scale on CPU (less likely the bottleneck) or active connections per task (more likely). Define a target tracking policy:
- Target: 30 active connections per task
- Min: 2 tasks (HA across AZs)
- Max: 20 tasks
LLM-bound services are usually I/O-bound. CPU autoscaling underestimates load. Connection-based scaling is closer to truth.
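The steady state a target-tracking policy converges to is easy to reason about: enough tasks that each carries roughly the target number of connections, clamped to the service's min/max. A sketch of that arithmetic (helper name is an assumption):

```python
import math

def desired_tasks(total_active_connections: int, target_per_task: int = 30,
                  min_tasks: int = 2, max_tasks: int = 20) -> int:
    """Approximate the task count a target-tracking policy settles on:
    ceil(total / target), clamped to the configured min and max."""
    needed = math.ceil(total_active_connections / target_per_task)
    return max(min_tasks, min(max_tasks, needed))

# 250 concurrent connections at a target of 30 per task → 9 tasks.
count = desired_tasks(250)
```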
Step 10: Observability
CloudWatch is the default; most teams ship logs onward to Datadog, Honeycomb, or self-hosted Loki. Either way:
- Subscribe the log group to your platform's ingestion
- Add the OTEL collector as a sidecar if you want traces
- Define alarms on:
  - Service `UnhealthyHostCount` > 0 for 5 min
  - p95 latency > 5 s for 5 min
  - Stream-error rate > 5% for 5 min
The production observability post covers what to log and trace inside the agent.
CI/CD with GitHub Actions
A reasonable pipeline:
name: deploy
on:
  push:
    branches: [main]
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::<account>:role/github-actions-deploy
          aws-region: us-east-1
      - uses: aws-actions/amazon-ecr-login@v2
        id: login-ecr
      - run: |
          docker build -t ${{ steps.login-ecr.outputs.registry }}/agentflow-app:${{ github.sha }} .
          docker push ${{ steps.login-ecr.outputs.registry }}/agentflow-app:${{ github.sha }}
      - uses: aws-actions/amazon-ecs-deploy-task-definition@v2
        with:
          service: agentflow-app
          cluster: agentflow-prod
          task-definition: ./.aws/task-def.json
Use OIDC for credentials (the id-token: write block); never store long-lived AWS keys in GitHub.
What about Cloud Run / Kubernetes?
The same architecture works on:
- Google Cloud Run. Replace ECS with Cloud Run, RDS with Cloud SQL, ElastiCache with Memorystore. Slightly simpler ops; same code.
- GKE / EKS. Replace ECS task definition with a Deployment + Service. More flexible, more ops.
- Fly.io / Render. Containers are containers. Mostly the same.
Pick whichever your team already operates. The Python agent does not care.
Cost expectations (rough)
A small production agent service on AWS, moderate traffic:
| Resource | Monthly |
|---|---|
| ECS Fargate (2× 1 vCPU / 2 GB) | $40 |
| RDS Postgres t4g.medium | $60 |
| ElastiCache Redis t4g.small | $30 |
| ALB | $20 |
| CloudWatch logs / data transfer | $20 |
| Infra subtotal | ~$170 |
| LLM provider tokens | varies (usually dominates) |
Most teams' bills are 10× the infra in token costs. Plan accordingly.
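A back-of-envelope check makes the point concrete. With purely illustrative numbers (volumes and per-million-token prices are assumptions, not provider quotes):

```python
def monthly_token_cost(requests_per_day: int, tokens_in: int, tokens_out: int,
                       price_in_per_m: float, price_out_per_m: float) -> float:
    """Rough monthly LLM spend: per-request token volumes times
    per-1M-token prices, scaled to 30 days. Plug in real rates."""
    daily = requests_per_day * (
        tokens_in / 1e6 * price_in_per_m + tokens_out / 1e6 * price_out_per_m
    )
    return round(daily * 30, 2)

# 5,000 requests/day, 3k input + 1k output tokens, $2.50/$10 per 1M tokens:
cost = monthly_token_cost(5000, 3000, 1000, 2.50, 10.00)
```

At those assumed rates the token bill is roughly $2,600/month against ~$170 of infrastructure, which is why the 10× figure is conservative for many teams.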
Common gotchas
- ALB idle timeout < SSE stream length. Streams get cut mid-response. Bump to 300s.
- Database connection pool too small. A `PgCheckpointer` writes per node; under load, size it for `n_tasks × n_concurrent_streams × 2`.
- Secrets in plaintext env vars. Use Secrets Manager and `secrets` blocks.
- No health check. ECS happily routes traffic to dead tasks. `/health` is one line of code.
- Single-AZ RDS. Cheap until your AZ has an outage. For real production, multi-AZ.
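The pool-sizing formula above is worth computing before you set `pool_size` on the engine, because the total has to fit inside the RDS instance's max_connections. A sketch (helper name is an assumption):

```python
def required_pg_connections(n_tasks: int, n_concurrent_streams: int,
                            per_stream_factor: int = 2) -> int:
    """Peak Postgres connections across the whole service:
    n_tasks × n_concurrent_streams × 2, where the factor of 2 covers a
    write plus a read connection per in-flight stream. Compare against
    the RDS instance's max_connections before raising pool sizes."""
    return n_tasks * n_concurrent_streams * per_stream_factor

# 4 tasks × 30 streams × 2 = 240 connections, a large share of a small
# instance's default max_connections — size pools deliberately.
total = required_pg_connections(4, 30)
```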
Further reading
- Production runtime concept
- Deployment guide. Full deploy walkthrough
- Generate Docker files
- Production observability + retries
- Auth and authorization
For a working stack to start from, see Get started. The same `agentflow api` server you run locally is what runs in your ECS task.