Build a customer support AI agent in Python
A customer support agent is the highest-leverage AI system for most SaaS teams. Done well, it deflects 30–50% of tickets without lowering CSAT. Done poorly, it just makes customers angrier.
Here is the production architecture we recommend with AgentFlow.
Architecture at a glance
[ User message ]
        │
        ▼
[ Intent classifier (LLM) ] ← deterministic routing
        │
        ├─→ [ Refund agent ]   ── tools: get_order, issue_refund, escalate_to_human
        ├─→ [ Shipping agent ] ── tools: get_tracking, reship, escalate_to_human
        └─→ [ General agent ]  ── tools: search_docs, escalate_to_human
        │
        ▼
[ If escalate: Human handoff (interrupt + checkpoint) ]
This is a hybrid workflow + agent shape: a deterministic router up front, agents in each branch, human-in-the-loop on escalation. See agents vs workflows for why.
Why this shape
- Cheap routing. A small classifier model picks the lane. The expensive specialist only runs when needed.
- Branch isolation. The refund agent has no shipping tools, so there are fewer ways for it to confuse itself.
- Audit trail. Each branch is a graph node with explicit tool calls; every decision is loggable.
- Safe escalation. Human handoff is a graph interrupt, durable across restarts.
The router
from agentflow.core.graph import Agent, StateGraph
from agentflow.core.state import AgentState, Message
from agentflow.utils import END
router = Agent(
    model="google/gemini-2.5-flash",  # small + fast
    system_prompt=[{"role": "system", "content": (
        "Classify the user's request into exactly one of: REFUND, SHIPPING, GENERAL. "
        "Return only the label."
    )}],
)
For higher reliability, you can replace the LLM router with a deterministic Python classifier (regex, intent model, or a small fine-tuned head). See multi-agent patterns.
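A minimal sketch of that swap using keyword rules; the patterns below are illustrative placeholders, not tuned ones, and should be mined from your own ticket history:

import re

# Placeholder patterns -- replace with rules derived from real tickets.
LANE_PATTERNS = {
    "REFUND": re.compile(r"\b(refund|money back|charged twice|cancel my order)\b", re.IGNORECASE),
    "SHIPPING": re.compile(r"\b(shipping|tracking|delivery|package|hasn't arrived)\b", re.IGNORECASE),
}

def classify(text: str) -> str:
    """Deterministic router: first matching lane wins, everything else is GENERAL."""
    for label, pattern in LANE_PATTERNS.items():
        if pattern.search(text):
            return label
    return "GENERAL"

It returns the same REFUND/SHIPPING/GENERAL labels the LLM router does, so the downstream edges don't change.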
The refund specialist
from datetime import datetime

from agentflow.core.graph import ToolNode

# orders_db, payments, and queue are your application's existing clients.

def get_order(order_id: str) -> str:
    """Look up an order by ID. Returns status, items, and total."""
    order = orders_db.fetch(order_id)
    return order.summary() if order else f"No order found with ID {order_id}."

def issue_refund(order_id: str, reason: str) -> str:
    """Issue a refund for an order. Use only after confirming with the user."""
    key = f"refund-{order_id}"  # idempotency key
    return payments.refund(order_id, reason=reason, idempotency_key=key)

def escalate_to_human(reason: str) -> str:
    """Hand off to a human agent. Use when the user is upset or the case is unusual."""
    queue.enqueue({"reason": reason, "escalated_at": datetime.now()})
    return "Escalated to a human agent. They will respond shortly."

refund_tools = ToolNode([get_order, issue_refund, escalate_to_human])

refund_agent = Agent(
    model="anthropic/claude-3-5-sonnet",  # bigger model for nuanced cases
    system_prompt=[{"role": "system", "content": (
        "You handle refund requests. Always look up the order before refunding. "
        "Confirm details with the user before issuing a refund. "
        "Escalate if the customer is upset, the amount is over $500, or the case is unusual."
    )}],
    tool_node="REFUND_TOOLS",
)
Notes:
- Idempotency keys on every refund call. See the production post.
- Mixed model sizes. Small for routing, big for specialist work.
- Explicit escalation rules in the system prompt. Models are better at following concrete rules than at deciding on their own when to escalate; the sketch below also backs the $500 rule up in code.
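If the $500 rule is one you never want crossed, enforce it in the tool as well as in the prompt. A minimal sketch of a guarded variant of issue_refund, assuming orders expose a numeric total (that attribute isn't shown in the snippets above):

REFUND_AUTO_LIMIT = 500  # dollars; mirrors the rule in the system prompt

def issue_refund(order_id: str, reason: str) -> str:
    """Issue a refund, but escalate above the limit no matter what the model decided."""
    order = orders_db.fetch(order_id)
    if order is None:
        return f"No order found with ID {order_id}."
    if order.total > REFUND_AUTO_LIMIT:  # assumption: orders expose a numeric `total`
        return escalate_to_human(f"Refund of ${order.total} on {order_id} exceeds the auto-refund limit.")
    key = f"refund-{order_id}"  # idempotency key, as before
    return payments.refund(order_id, reason=reason, idempotency_key=key)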
The graph
graph = StateGraph(AgentState)
graph.add_node("ROUTE", router)
graph.add_node("REFUND", refund_agent)
graph.add_node("REFUND_TOOLS", refund_tools)
graph.add_node("SHIPPING", shipping_agent)
graph.add_node("SHIPPING_TOOLS", shipping_tools)
graph.add_node("GENERAL", general_agent)
graph.add_node("GENERAL_TOOLS", general_tools)
def pick_lane(state):
    label = state.context[-1].text().strip().upper()
    return label if label in {"REFUND", "SHIPPING", "GENERAL"} else "GENERAL"
graph.set_entry_point("ROUTE")
graph.add_conditional_edges(
"ROUTE", pick_lane,
{"REFUND": "REFUND", "SHIPPING": "SHIPPING", "GENERAL": "GENERAL"},
)
# Each specialist loops with its tool node
graph.add_conditional_edges("REFUND", route_to_tools, {"REFUND_TOOLS": "REFUND_TOOLS", END: END})
graph.add_edge("REFUND_TOOLS", "REFUND")
# ... same shape for SHIPPING and GENERAL ...
app = graph.compile(checkpointer=PgCheckpointer(...))
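The route_to_tools predicate isn't shown above; it's the usual "did the specialist request a tool?" check. A minimal sketch, assuming messages in state.context carry a tool_calls attribute (check AgentFlow's actual message type):

def route_to_tools(state):
    """Go to the refund tool node if the last message requested a tool, otherwise finish."""
    last = state.context[-1]
    if getattr(last, "tool_calls", None):  # assumption: tool requests surface as `tool_calls`
        return "REFUND_TOOLS"
    return END

The SHIPPING and GENERAL branches need the same predicate pointed at their own tool nodes.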
Operational notes
- Persist threads from day one. Support conversations span hours; thread_id makes them durable (see the sketch after this list).
- Stream responses. Users wait. See SSE streaming.
- Human handoff is a checkpoint. When the agent calls escalate_to_human, the graph pauses; a human picks up the same thread.
- Per-user rate limits. A buggy frontend or abusive user can rack up token costs fast.
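What a request handler ends up looking like with durable threads, assuming the compiled app exposes an invoke method that takes the new user message plus a thread_id (AgentFlow's real signature may differ):

def handle_message(user_id: str, conversation_id: str, text: str) -> str:
    """One support turn; reusing the same thread_id keeps the conversation durable across restarts."""
    thread_id = f"support-{user_id}-{conversation_id}"
    state = app.invoke(  # assumption: invoke(new_messages, thread_id=...) -- adjust to the real API
        [{"role": "user", "content": text}],
        thread_id=thread_id,
    )
    return state.context[-1].text()

For streaming, swap invoke for the streaming entry point and forward chunks to the client over SSE.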
Metrics that matter
| Metric | Target |
|---|---|
| Deflection rate | 30–50% |
| First-response time (TTFB) | < 1.5 s p95 |
| Escalation rate | 10–25% |
| CSAT (compared to human-only baseline) | within 5% |
| Cost per resolved ticket | < $0.50 |
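One way to compute the two rate metrics from your conversation log; the resolved and escalated fields here are hypothetical, so map them to whatever your logging actually emits:

def support_rates(conversations: list[dict]) -> dict:
    """Deflection and escalation rates over finished conversations."""
    total = len(conversations)
    if total == 0:
        return {"deflection_rate": 0.0, "escalation_rate": 0.0}
    deflected = sum(1 for c in conversations if c["resolved"] and not c["escalated"])
    escalated = sum(1 for c in conversations if c["escalated"])
    return {"deflection_rate": deflected / total, "escalation_rate": escalated / total}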