# AI Agents vs Workflows: When to Use Each (and How to Tell)
"Should this be an agent or a workflow?" gets asked in every architecture review of an AI system. The answer is rarely binary. Most production systems are hybrids. But you should know which mode you are in for any given subsystem, because the operational cost is wildly different.
Here is the practical decision framework.
## The crisp definitions

**Workflow:** the LLM does not decide control flow. You wrote the steps; the model fills in language-shaped tasks at specific points.

**Agent:** the LLM decides control flow. You give it tools and a goal; it picks the next action at each step.
A "graph with conditional edges" is a workflow if the conditions are deterministic Python; it is an agent if the conditions involve the model picking the next node.
This is a continuum, not a binary. Most systems are 80% workflow and 20% agent.
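To make the edge distinction concrete, here is a sketch of the two kinds of conditional edge. The `llm.choose` call is an illustrative stand-in for whatever model call your stack uses, not a real client API:

```python
def route_by_length(state: dict) -> str:
    # Workflow-style edge: deterministic Python picks the next node.
    # Same input, same branch, every time.
    return "SUMMARIZE" if len(state["transcript"]) > 2000 else "PASS_THROUGH"


def route_by_model(state: dict, llm) -> str:
    # Agent-style edge: the model's answer picks the next node.
    return llm.choose(state["transcript"], options=["SUMMARIZE", "PASS_THROUGH"])
```

The first function is auditable and unit-testable with no mocks; the second inherits every property of the model that backs it.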
## Workflow code

```python
def summarize_call(transcript: str) -> dict:
    # Step 1: extract entities (LLM)
    entities = llm.extract(transcript, schema=EntitySchema)
    # Step 2: classify intent (LLM)
    intent = llm.classify(transcript, labels=INTENTS)
    # Step 3: write summary (LLM)
    summary = llm.summarize(transcript, max_words=120)
    # Step 4: store (deterministic)
    db.write(call_id, {"entities": entities, "intent": intent, "summary": summary})
    return {"entities": entities, "intent": intent, "summary": summary}
```
Three LLM calls, in a fixed order. The model never picks the next step.
## Agent code

```python
agent = Agent(
    model="google/gemini-2.5-flash",
    system_prompt=[{"role": "system", "content": (
        "Help the user with their support request. "
        "Use lookup_order to find their order, refund_order to issue refunds, "
        "or escalate_to_human if you cannot resolve it."
    )}],
    tool_node="TOOL",
)

# The model decides which tools to call, in what order, and when to stop.
# `app` is the graph compiled around this agent node.
app.invoke(
    {"messages": [Message.text_message("My order #123 hasn't arrived.")]},
    config={"thread_id": "support-1"},
)
```
The model picks: look up the order, check status, decide whether to refund or escalate, etc. That decision is what makes this an agent.
## When workflows win
Workflows are the right answer when:
- The steps are predictable. Most data pipelines, ETL with LLM enrichment, batch processing.
- You need determinism. Compliance review, regulated outputs, anything where the path matters as much as the result.
- Cost matters. Each LLM call is paid; a fixed pipeline pays only for what you need.
- Latency matters. No "the model is thinking" loops. Steps run sequentially with predictable timings.
- You need to test it. Workflows are easy to unit-test; agent loops are not.
A workflow with three LLM calls is a far simpler operational target than an agent that might make three calls, or seven, or none.
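The testability point deserves a concrete sketch. Because each step is an ordinary function call, you can inject a stubbed LLM and assert on the pipeline without spending a token. The dependency-injected variant below is hypothetical, not the pipeline's real signature:

```python
from unittest.mock import MagicMock


def summarize_call_with(llm, db, call_id: str, transcript: str) -> dict:
    # Same shape as the pipeline above, but with dependencies passed in,
    # so tests can swap the LLM client and the database for stubs.
    record = {
        "entities": llm.extract(transcript),
        "intent": llm.classify(transcript),
        "summary": llm.summarize(transcript),
    }
    db.write(call_id, record)
    return record


def test_pipeline_stores_all_fields():
    llm, db = MagicMock(), MagicMock()
    llm.extract.return_value = {"order_id": "123"}
    llm.classify.return_value = "refund_request"
    llm.summarize.return_value = "Customer wants a refund for order 123."

    result = summarize_call_with(llm, db, "call-1", "transcript...")

    assert result["intent"] == "refund_request"
    db.write.assert_called_once_with("call-1", result)
```

Try writing the equivalent test for an agent that may call zero, one, or seven tools; that asymmetry is the whole point.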
## When agents win
Agents are the right answer when:
- The path depends on the input. Customer support: did the user ask for a refund or a status check?
- You can't enumerate the steps in advance. Research tasks, debugging tasks, anything with branching.
- The model needs to decide when to stop. Open-ended exploration with a clear goal.
- Tool selection matters. When the right tool depends on what the user said, the model is the natural router.
The killer feature of agents is deciding when to stop. Workflows always run to the end; agents can finish early.
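Stripped of any framework, that stop decision is just a loop that exits when the model returns an answer instead of a tool call. A minimal sketch with illustrative names (`model.next_action` is a stand-in, not a real API):

```python
def run_agent(model, tools: dict, goal: str, max_steps: int = 10) -> str:
    # Minimal agent loop: the model picks the next tool, or answers and stops.
    history = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        # The model returns either {"tool": ..., "args": ...} or {"answer": ...}.
        action = model.next_action(history)
        if "answer" in action:
            # The model decided it is done -- this is the "finish early" property.
            return action["answer"]
        result = tools[action["tool"]](**action["args"])
        history.append({"role": "tool", "content": str(result)})
    return "Escalated: step budget exhausted."
```

Everything a framework adds (checkpoints, streaming, tracing) wraps this loop; the control-flow decision stays with the model.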
## Hybrid: the most common production shape
Most production systems are hybrids:
```
[ Deterministic workflow ]
  Step 1: Extract entities
  Step 2: Classify intent
  Step 3: ──── Branch on intent ────
              │
              ├─→ [ Simple workflow: format response ]
              │
              └─→ [ Agent: handle complex cases ]
                    └─ tool calls, loops, etc.
  Step 4: Log result
```
The deterministic shell handles the boring parts (extraction, logging, routing). The agent handles the parts where the model needs to decide.
Code shape:

```python
def handle_request(req):
    intent = llm.classify(req.text, labels=INTENTS)
    if intent == "simple_lookup":
        return run_lookup_workflow(req)
    if intent == "complex_support":
        return support_agent.invoke(
            {"messages": [Message.text_message(req.text)]},
            config={"thread_id": req.user_id},
        )
```
This is the right shape for a typical SaaS support system. 80% workflow (cheap, deterministic, fast), 20% agent (handles the long tail).
## A decision tree
For any subsystem in your architecture, ask:
1. Can I enumerate the steps in advance?
   - Yes → workflow
   - No → agent
2. Does the path depend on the input?
   - Yes → agent (or branching workflow)
   - No → workflow
3. Does the model need to decide when to stop?
   - Yes → agent
   - No → workflow
4. Does this need to be deterministic for compliance / testing?
   - Yes → workflow (even if the obvious shape is an agent)
   - No → either is fine
When in doubt, start with a workflow. It is easier to upgrade a workflow to an agent than to downgrade.
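One way to encode the four questions as code, with the compliance question taking priority over everything else (the function and its argument names are illustrative):

```python
def choose_shape(
    can_enumerate_steps: bool,
    path_depends_on_input: bool,
    model_decides_stop: bool,
    needs_determinism: bool,
) -> str:
    # Compliance/testing requirements override every other consideration.
    if needs_determinism:
        return "workflow"
    # If the model must decide when to stop, or the steps cannot be
    # enumerated up front, only an agent fits.
    if model_decides_stop or not can_enumerate_steps:
        return "agent"
    # Input-dependent paths with enumerable steps: branch deterministically.
    if path_depends_on_input:
        return "branching workflow"
    return "workflow"
```

Running your subsystems through a function like this forces the questions to be answered explicitly instead of by default.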
## Operational cost differences
| Concern | Workflow | Agent |
|---|---|---|
| Token cost per request | Predictable | Variable, can spike |
| Latency p95 | Predictable | Long tail |
| Failure modes | Step N failed | Loop runaway, tool error, recursion limit |
| Testability | Each step in isolation | End-to-end only |
| Observability | One trace per step | Full graph trace |
| Recovery | Retry from failed step | Resume thread from checkpoint |
Agents pay an "intelligence tax". The freedom to decide costs you in token spend, latency variance, and operational complexity. Charge for that intelligence only when you need it.
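One way to keep the tax bounded is a hard per-request budget around the agent's LLM calls. A minimal sketch; the class name and thresholds are illustrative, not from any library:

```python
class RequestBudget:
    """Hard caps that make an agent's spend bounded and visible per request."""

    def __init__(self, max_llm_calls: int = 8, max_tokens: int = 20_000):
        self.max_llm_calls = max_llm_calls
        self.max_tokens = max_tokens
        self.llm_calls = 0
        self.tokens = 0

    def charge(self, tokens_used: int) -> None:
        # Record one model call, then fail loudly once either cap is blown.
        self.llm_calls += 1
        self.tokens += tokens_used
        if self.llm_calls > self.max_llm_calls or self.tokens > self.max_tokens:
            raise RuntimeError(
                f"Budget exceeded: {self.llm_calls} calls / {self.tokens} tokens"
            )
```

Call `charge()` after each model call inside the agent loop and catch the error at the workflow shell, where the deterministic code can fall back to escalation.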
## Same primitive, different uses
A graph runtime like AgentFlow or LangGraph supports both. The runtime is the same; the wiring is different:
- Workflow: edges are deterministic, conditional functions are pure Python
- Agent: at least one node is an `Agent` whose tool choice drives an edge
You can mix them in one graph: a workflow that calls into a small agent for one branch, then resumes deterministic processing.
```python
graph = StateGraph(AgentState)
graph.add_node("EXTRACT", extract_node)         # workflow
graph.add_node("CLASSIFY", classify_node)       # workflow
graph.add_node("AGENT", complex_support_agent)  # agent
graph.add_node("LOG", log_node)                 # workflow

graph.add_edge("EXTRACT", "CLASSIFY")
graph.add_conditional_edges(
    "CLASSIFY",
    lambda s: "AGENT" if s.intent == "complex" else "LOG",
    {"AGENT": "AGENT", "LOG": "LOG"},
)
graph.add_edge("AGENT", "LOG")
graph.add_edge("LOG", END)
```
This is the production shape for most non-trivial AI systems.
## Common mistakes
- **Agent-everything.** Wrapping a 3-step pipeline in an agent because "agents are cool." You pay 3× the tokens for the same outcome with worse predictability.
- **Workflow-everything.** A long if-else chain trying to anticipate every case the model could handle in two tool calls. Use the agent.
- **No determinism boundary.** When the model gets to call any tool at any time, your audit log is unreadable. Define agent boundaries explicitly.
- **Skipping the workflow shell.** A pure agent in production with no deterministic guardrails is a footgun. Wrap it.
- **Not setting `recursion_limit`.** Agents loop. Always cap them.
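In LangGraph, for example, the cap is passed in the invoke config (`app.invoke(state, {"recursion_limit": 10})`); other runtimes have equivalents. What the cap buys you, as a library-free sketch with illustrative names:

```python
class RecursionLimitError(RuntimeError):
    pass


def run_capped(step_fn, state, recursion_limit: int = 25):
    # `step_fn` returns (new_state, done). Instead of looping forever,
    # the runtime raises a loud, catchable error at the cap.
    for _ in range(recursion_limit):
        state, done = step_fn(state)
        if done:
            return state
    raise RecursionLimitError(f"Agent exceeded {recursion_limit} steps")
```

The point is that the failure mode becomes an error your workflow shell can catch, not a silent runaway burning tokens.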
## Further reading
- State graph concept (the runtime shared by both shapes)
- Multi-agent orchestration patterns
- ReAct agent with real APIs (a pure agent example)
- Production AI agents (observability + retries)
If your system is mostly a workflow with one agent loop, that is the shape to build. The same runtime handles both, with the same primitives.