Deploy an AI Agent to Production: Docker, AWS, and the Boring Bits
A working agent on your laptop is not a deployed agent. The gap is mostly boring: a Dockerfile, secrets, a database, autoscaling, observability. Done well, it is one afternoon. Done sloppily, it is a series of 3am pages.
Here is the path from compiled graph to production on AWS. Docker image, ECS Fargate, RDS, secrets, scaling.
The shape of a production deployment
For a Python agent service on AWS, the minimum viable architecture is:
ALB → ECS Fargate (your agent) → RDS Postgres (checkpointer)
                               → ElastiCache Redis (hot path)
                               → Secrets Manager (API keys)
                               → CloudWatch (logs)
Each box is one or two days of work. The agent itself is the small part.
Step 1: Dockerfile
A small, reproducible image:
FROM python:3.12-slim
# Reduce image size, avoid pyc files baked in by package install
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PIP_NO_CACHE_DIR=1 \
PIP_DISABLE_PIP_VERSION_CHECK=1
WORKDIR /app
# Install system deps: libpq/gcc for asyncpg builds, curl for the
# container health check
RUN apt-get update && apt-get install -y --no-install-recommends \
    libpq-dev gcc curl \
    && rm -rf /var/lib/apt/lists/*
# Install Python deps first for layer caching
COPY pyproject.toml uv.lock ./
RUN pip install uv && uv pip install --system --no-cache-dir -r pyproject.toml
# Copy app code last, then install the project itself (an editable
# install before the copy would fail: the source is not there yet)
COPY . .
RUN uv pip install --system --no-cache-dir --no-deps -e .
# AgentFlow CLI
RUN pip install --no-cache-dir 10xscale-agentflow-cli
EXPOSE 8000
# Use exec form so signals reach the process
CMD ["agentflow", "api", "--host", "0.0.0.0", "--port", "8000"]
If you prefer to skip the Dockerfile, AgentFlow's CLI can scaffold one:
agentflow init
agentflow generate-docker
See Generate Docker files for the generated content.
Step 2: Local docker-compose for development
version: "3.9"
services:
  agent:
    build: .
    ports: ["8000:8000"]
    environment:
      DATABASE_URL: postgresql+asyncpg://postgres:postgres@db:5432/agentflow
      REDIS_URL: redis://redis:6379/0
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      GOOGLE_API_KEY: ${GOOGLE_API_KEY}
    depends_on: [db, redis]
  db:
    image: postgres:16-alpine
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
      POSTGRES_DB: agentflow
    ports: ["5432:5432"]
    volumes:
      - pgdata:/var/lib/postgresql/data
  redis:
    image: redis:7-alpine
    ports: ["6379:6379"]
volumes:
  pgdata:
Run docker compose up, send a request to localhost:8000. If this works, ECS will work. Only the configuration changes.
Step 3: Build and push to ECR
aws ecr create-repository --repository-name agentflow-app
aws ecr get-login-password --region us-east-1 \
| docker login --username AWS --password-stdin <account>.dkr.ecr.us-east-1.amazonaws.com
docker build -t agentflow-app:latest .
docker tag agentflow-app:latest <account>.dkr.ecr.us-east-1.amazonaws.com/agentflow-app:latest
docker push <account>.dkr.ecr.us-east-1.amazonaws.com/agentflow-app:latest
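ECR image URIs follow a fixed pattern. If you script tagging in CI, a tiny helper (hypothetical, names are illustrative) avoids typos:

```python
def ecr_image_uri(account: str, region: str, repo: str, tag: str = "latest") -> str:
    """Build the fully qualified ECR image URI used by docker tag,
    docker push, and the ECS task definition."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repo}:{tag}"

uri = ecr_image_uri("123456789012", "us-east-1", "agentflow-app", "abc123")
```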
For CI, use the official aws-actions/amazon-ecr-login GitHub Action.
Step 4: RDS Postgres for the checkpointer
Create the smallest instance that fits your throughput. For a starting point: db.t4g.medium, single-AZ, 50 GB storage.
aws rds create-db-instance \
--db-instance-identifier agentflow-prod \
--db-instance-class db.t4g.medium \
--engine postgres \
--master-username agentflow \
--master-user-password 'use-secrets-manager-instead' \
--allocated-storage 50 \
--vpc-security-group-ids sg-xxx \
--db-subnet-group-name your-private-subnet-group
Connection string format: postgresql+asyncpg://agentflow:<password>@<host>:5432/agentflow. Pass via Secrets Manager (next step), not env vars in the task definition.
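If you assemble the DSN at runtime from parts (user from config, password from a secret), URL-encode the password so special characters don't break parsing. A minimal sketch, with a hypothetical helper name:

```python
from urllib.parse import quote_plus

def build_database_url(user: str, password: str, host: str,
                       port: int = 5432, db: str = "agentflow") -> str:
    """Assemble an asyncpg DSN. quote_plus escapes characters like
    '@' and '/' that would otherwise corrupt the URL."""
    return (
        f"postgresql+asyncpg://{user}:{quote_plus(password)}"
        f"@{host}:{port}/{db}"
    )

url = build_database_url("agentflow", "p@ss/word",
                         "agentflow-prod.xxx.us-east-1.rds.amazonaws.com")
```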
Step 5: ElastiCache Redis for hot-path state
cache.t4g.small works for most production loads. Single-shard, no clustering needed initially.
Connection string: redis://<host>:6379/0.
Step 6: Secrets Manager
Never put OPENAI_API_KEY in plaintext in your task definition. Create a secret per credential:
aws secretsmanager create-secret \
--name agentflow/prod/openai \
--secret-string '{"OPENAI_API_KEY":"sk-..."}'
Reference it in the ECS task definition's secrets block, not environment.
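ECS injects the JSON key directly when you use the `secrets` block, but if a job or migration script fetches a secret itself, the parsing step looks like this. A sketch with a hypothetical helper; with boto3 you would obtain `secret_string` via `client.get_secret_value(SecretId=...)["SecretString"]`:

```python
import json

def read_secret_key(secret_string: str, key: str) -> str:
    """Parse a Secrets Manager JSON SecretString and pull one key,
    failing loudly if the key is absent rather than returning None."""
    payload = json.loads(secret_string)
    if key not in payload:
        raise KeyError(f"secret is missing key {key!r}")
    return payload[key]

api_key = read_secret_key('{"OPENAI_API_KEY": "sk-..."}', "OPENAI_API_KEY")
```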
Step 7: ECS Fargate task definition
{
  "family": "agentflow-app",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "1024",
  "memory": "2048",
  "containerDefinitions": [
    {
      "name": "agent",
      "image": "<account>.dkr.ecr.us-east-1.amazonaws.com/agentflow-app:latest",
      "portMappings": [{"containerPort": 8000, "protocol": "tcp"}],
      "environment": [
        {"name": "REDIS_URL", "value": "redis://agentflow.xxx.cache.amazonaws.com:6379/0"}
      ],
      "secrets": [
        {"name": "DATABASE_URL", "valueFrom": "arn:aws:secretsmanager:...:secret:agentflow/prod/database-url"},
        {"name": "OPENAI_API_KEY", "valueFrom": "arn:aws:secretsmanager:...:secret:agentflow/prod/openai:OPENAI_API_KEY::"}
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:8000/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 30
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/agentflow-app",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "agent"
        }
      }
    }
  ]
}
Notes:
- 1 vCPU + 2 GB is plenty for an I/O-bound agent service. Add CPU only if you have CPU-bound tools.
- Health checks hit `/health` (AgentFlow exposes this by default).
- Logs flow to CloudWatch; route from there to your log platform.
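If you roll your own health endpoint instead of the built-in one, it should verify dependencies rather than unconditionally return 200, so the ALB pulls unhealthy tasks out of rotation. A framework-agnostic sketch (the helper name and probe shape are assumptions):

```python
from typing import Callable, Dict, Tuple

def health_status(checks: Dict[str, Callable[[], bool]]) -> Tuple[int, dict]:
    """Run dependency probes (e.g. DB ping, Redis ping) and map the
    result to an HTTP status: 200 only if every probe passes, else 503.
    A probe that raises counts as failed."""
    results = {}
    for name, probe in checks.items():
        try:
            results[name] = bool(probe())
        except Exception:
            results[name] = False
    ok = all(results.values())
    return (200 if ok else 503), {"status": "ok" if ok else "degraded",
                                  "checks": results}

# A task with a dead Redis should report 503, not 200:
code, body = health_status({"db": lambda: True, "redis": lambda: False})
```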
Step 8: ALB and SSL
Put the service behind an Application Load Balancer with HTTPS:
- Target group: HTTP on port 8000, health check on `/health`
- Listener: HTTPS 443 with an ACM certificate
- Idle timeout: 300 seconds, important for SSE streams
If your idle timeout is too low (default is 60s), long agent streams get cut. Tune it.
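One common mitigation besides raising the timeout is emitting SSE comment lines (`: ping`) whenever a stream has been quiet for a while. A minimal decision helper, assuming you track the last write time per stream:

```python
def needs_keepalive(last_sent_at: float, now: float,
                    idle_timeout_s: float = 300.0,
                    safety_factor: float = 0.5) -> bool:
    """Return True when the stream has been quiet long enough that an
    SSE comment (': ping\\n\\n') should be written before the ALB's idle
    timeout closes the connection. safety_factor=0.5 pings at half the
    timeout to leave headroom for scheduling jitter."""
    return (now - last_sent_at) >= idle_timeout_s * safety_factor
```

Quiet for 160 s against a 300 s timeout triggers a ping; 100 s does not.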
Step 9: Autoscaling
For an agent service, scale on CPU (less likely the bottleneck) or active connections per task (more likely). Define a target tracking policy:
- Target: 30 active connections per task
- Min: 2 tasks (HA across AZs)
- Max: 20 tasks
LLM-bound services are usually I/O-bound. CPU autoscaling underestimates load. Connection-based scaling is closer to truth.
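The steady state a target-tracking policy converges to is easy to reason about: enough tasks that each carries roughly the target number of connections, clamped to the service's min/max. A sketch of that arithmetic (helper name is an assumption):

```python
import math

def desired_tasks(total_active_connections: int, target_per_task: int = 30,
                  min_tasks: int = 2, max_tasks: int = 20) -> int:
    """Approximate the task count a target-tracking policy settles on:
    ceil(total / target), clamped to the configured min and max."""
    needed = math.ceil(total_active_connections / target_per_task)
    return max(min_tasks, min(max_tasks, needed))

# 250 concurrent connections at a target of 30 per task → 9 tasks.
count = desired_tasks(250)
```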
Step 10: Observability
CloudWatch is the default; most teams ship logs onward to Datadog, Honeycomb, or self-hosted Loki. Either way:
- Subscribe the log group to your platform's ingestion
- Add the OTEL collector as a sidecar if you want traces
- Define alarms on:
  - Service `UnhealthyHostCount` > 0 for 5 min
  - p95 latency > 5 s for 5 min
  - Stream-error rate > 5% for 5 min
The production observability post covers what to log and trace inside the agent.
CI/CD with GitHub Actions
A reasonable pipeline:
name: deploy
on:
  push:
    branches: [main]
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - uses: actions/checkout@v4
      - uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::<account>:role/github-actions-deploy
          aws-region: us-east-1
      - uses: aws-actions/amazon-ecr-login@v2
        id: login-ecr
      - run: |
          docker build -t ${{ steps.login-ecr.outputs.registry }}/agentflow-app:${{ github.sha }} .
          docker push ${{ steps.login-ecr.outputs.registry }}/agentflow-app:${{ github.sha }}
      - uses: aws-actions/amazon-ecs-deploy-task-definition@v2
        with:
          service: agentflow-app
          cluster: agentflow-prod
          task-definition: ./.aws/task-def.json
Use OIDC for credentials (the id-token: write block); never store long-lived AWS keys in GitHub.
What about Cloud Run / Kubernetes?
The same architecture works on:
- Google Cloud Run. Replace ECS with Cloud Run, RDS with Cloud SQL, ElastiCache with Memorystore. Slightly simpler ops; same code.
- GKE / EKS. Replace ECS task definition with a Deployment + Service. More flexible, more ops.
- Fly.io / Render. Containers are containers. Mostly the same.
Pick whichever your team already operates. The Python agent does not care.
Cost expectations (rough)
A small production agent service on AWS, moderate traffic:
| Resource | Monthly |
|---|---|
| ECS Fargate (2× 1 vCPU / 2 GB) | $40 |
| RDS Postgres t4g.medium | $60 |
| ElastiCache Redis t4g.small | $30 |
| ALB | $20 |
| CloudWatch logs / data transfer | $20 |
| Infra subtotal | ~$170 |
| LLM provider tokens | varies (usually dominates) |
Most teams' bills are 10× the infra in token costs. Plan accordingly.
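A back-of-envelope check makes the point concrete. With purely illustrative numbers (volumes and per-million-token prices are assumptions, not provider quotes):

```python
def monthly_token_cost(requests_per_day: int, tokens_in: int, tokens_out: int,
                       price_in_per_m: float, price_out_per_m: float) -> float:
    """Rough monthly LLM spend: per-request token volumes times
    per-1M-token prices, scaled to 30 days. Plug in real rates."""
    daily = requests_per_day * (
        tokens_in / 1e6 * price_in_per_m + tokens_out / 1e6 * price_out_per_m
    )
    return round(daily * 30, 2)

# 5,000 requests/day, 3k input + 1k output tokens, $2.50/$10 per 1M tokens:
cost = monthly_token_cost(5000, 3000, 1000, 2.50, 10.00)
```

At those assumed rates the token bill is roughly $2,600/month against ~$170 of infrastructure, which is why the 10× figure is conservative for many teams.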
Common gotchas
- ALB idle timeout < SSE stream length. Streams get cut mid-response. Bump to 300s.
- Database connection pool too small. A `PgCheckpointer` writes per node; under load, size it for `n_tasks × n_concurrent_streams × 2`.
- Secrets in plaintext env vars. Use Secrets Manager and `secrets` blocks.
- No health check. ECS happily routes traffic to dead tasks. `/health` is one line of code.
- Single-AZ RDS. Cheap until your AZ has an outage. For real production, multi-AZ.
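The pool-sizing formula above is worth computing before you set `pool_size` on the engine, because the total has to fit inside the RDS instance's max_connections. A sketch (helper name is an assumption):

```python
def required_pg_connections(n_tasks: int, n_concurrent_streams: int,
                            per_stream_factor: int = 2) -> int:
    """Peak Postgres connections across the whole service:
    n_tasks × n_concurrent_streams × 2, where the factor of 2 covers a
    write plus a read connection per in-flight stream. Compare against
    the RDS instance's max_connections before raising pool sizes."""
    return n_tasks * n_concurrent_streams * per_stream_factor

# 4 tasks × 30 streams × 2 = 240 connections, a large share of a small
# instance's default max_connections — size pools deliberately.
total = required_pg_connections(4, 30)
```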
Further reading
- Production runtime concept
- Deployment guide. Full deploy walkthrough
- Generate Docker files
- Production observability + retries
- Auth and authorization
For a working stack to start from, see Get started. The same `agentflow api` server you run locally is what runs in your ECS task.