Production AI Agents: Observability, Retries, and Graceful Shutdown
The "hello world" agent works on day one. The "we have paying customers and a pager" agent works on day 200. The gap is operational: observability, retries, idempotency, graceful shutdown.
These are the patterns we see in production Python agent codebases, and the failure modes they prevent.