AgentFlow vs LlamaIndex Agents: runtime-first vs RAG-first
LlamaIndex built its identity around retrieval-augmented generation: indexes, query engines, and the tools that wrap them. Its agent layer (AgentRunner, FunctionAgent, Workflow, ReActAgent) builds on those primitives. AgentFlow comes from the other direction: a runtime built to orchestrate any agent or tool, with a typed graph, persistence, and an API server included.
If your app is mostly a RAG pipeline with light agent behavior, LlamaIndex Agents is a fine choice. If you are building a stateful, multi-agent product where retrieval is one tool among many, this page shows what AgentFlow gives you.
TL;DR: AgentFlow vs LlamaIndex Agents
LlamaIndex's agent surface includes FunctionAgent, ReActAgent, and the newer Workflow API. The table below compares the two stacks on production characteristics.
| Dimension | AgentFlow | LlamaIndex Agents |
|---|---|---|
| Primary focus | Multi-agent runtime. Graphs, state, threads, API | RAG-first; agents wrap indexes and query engines |
| Orchestration | Typed StateGraph with conditional edges and sub-graphs | Workflow API (events + steps); plus prebuilt agent runners |
| State | AgentState + Message stream shared by all nodes | Per-agent memory; Workflow uses event payloads |
| Persistence | Built-in InMemoryCheckpointer / PgCheckpointer (Postgres + Redis) | Memory modules; persistence usually wired by the application |
| API serving | Built-in `agentflow api` REST + SSE server | No bundled production server; LlamaCloud is a separate paid product |
| TypeScript client | Typed `@10xscale/agentflow-client` | No first-party TS client |
| RAG integration | Use any vector store via tools; pair with LlamaIndex retrievers easily | Best-in-class. RAG is the framework's core |
| Best for | Stateful multi-agent products with frontend + backend | Index-heavy apps, single-agent RAG, document chat |
Why teams pair (or replace) LlamaIndex Agents with AgentFlow
- Retrieval is one tool, not the whole app. Modern agent products call retrievers, but they also call APIs, run code, route between specialists, and manage long sessions. AgentFlow models the whole flow as a graph; retrieval is one node or one tool.
- You want a runtime, not just primitives. LlamaIndex gives you excellent retrieval, parsing, and indexing primitives. AgentFlow gives you the runtime that orchestrates them and serves the result over HTTP.
- Threads and resumability are first-class. Persistent threads keyed by `thread_id` are how AgentFlow handles long sessions. LlamaIndex memory modules cover parts of this; the rest you assemble.
- End-to-end stack. `agentflow api` + `@10xscale/agentflow-client` give you the API and the typed frontend client. With LlamaIndex, you wire FastAPI / LlamaCloud and a custom fetcher.
Same use case, both frameworks
A small RAG-aware agent that can search a document index and answer with citations.
LlamaIndex (FunctionAgent + tool-wrapped retriever)
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
from llama_index.core.agent.workflow import FunctionAgent
from llama_index.core.tools import QueryEngineTool
from llama_index.llms.openai import OpenAI
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)
search = QueryEngineTool.from_defaults(
query_engine=index.as_query_engine(),
name="search_docs",
description="Search the product documentation.",
)
agent = FunctionAgent(
tools=[search],
llm=OpenAI(model="gpt-4o-mini"),
system_prompt="You answer questions using the search_docs tool. Cite sources.",
)
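# agent.run is async; call it from an async context (e.g. a notebook cell or asyncio.run)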
response = await agent.run("How do I configure SSO?")
print(response)
AgentFlow (graph + retriever-as-tool)
from agentflow.core.graph import Agent, StateGraph, ToolNode
from agentflow.core.state import AgentState, Message
from agentflow.utils import END
# Wrap any retriever — LlamaIndex, LangChain, raw vector client — as a tool
def search_docs(query: str) -> str:
"""Search the product documentation index and return passages with sources."""
# call your retriever here
return "Result text with [doc-1] [doc-2] citations"
tool_node = ToolNode([search_docs])
agent = Agent(
model="google/gemini-2.5-flash",
system_prompt=[{
"role": "system",
"content": "Answer questions using search_docs. Always cite sources by ID.",
}],
tool_node="TOOL",
)
graph = StateGraph(AgentState)
graph.add_node("MAIN", agent)
graph.add_node("TOOL", tool_node)
def route(state):
last = state.context[-1] if state.context else None
if last and getattr(last, "tools_calls", None) and last.role == "assistant":
return "TOOL"
if last and last.role == "tool":
return "MAIN"
return END
graph.add_conditional_edges("MAIN", route, {"TOOL": "TOOL", END: END})
graph.add_edge("TOOL", "MAIN")
graph.set_entry_point("MAIN")
app = graph.compile()
result = app.invoke(
{"messages": [Message.text_message("How do I configure SSO?")]},
config={"thread_id": "rag-1"},
)
print(result["messages"][-1].text())
You keep LlamaIndex for retrieval (where it shines) and let AgentFlow drive the conversation, persist the thread, and expose the API.
Using AgentFlow + LlamaIndex together
Many teams reach for AgentFlow because they have outgrown the agent layer of their RAG framework but want to keep the indexing pipeline. The pattern that works:
from llama_index.core import StorageContext, load_index_from_storage
storage = StorageContext.from_defaults(persist_dir="./index_store")
index = load_index_from_storage(storage)
qe = index.as_query_engine()
def search_docs(query: str) -> str:
"""Search internal docs. Returns passages with source ids."""
response = qe.query(query)
return str(response)
Drop `search_docs` into a `ToolNode` and you have AgentFlow orchestrating a LlamaIndex retriever. No rewrite of your indexing pipeline.
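If you want the tool to return explicit source ids rather than the stringified response, the query engine's response object exposes the retrieved nodes. A sketch, with attribute access following current LlamaIndex conventions (adjust to your version):

```python
def search_docs(query: str) -> str:
    """Search internal docs. Returns passages with source ids."""
    response = qe.query(query)
    parts = []
    for node_with_score in response.source_nodes:
        # node_id plus the passage text, so the agent can cite sources explicitly
        parts.append(f"[{node_with_score.node.node_id}] {node_with_score.node.get_content()}")
    return "\n\n".join(parts) or str(response)
```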
Persistence and threads
AgentFlow's checkpointer makes long sessions trivial:
from agentflow.storage.checkpointer import PgCheckpointer
app = graph.compile(checkpointer=PgCheckpointer(
db_url="postgresql+asyncpg://user:password@localhost/agentflow",
redis_url="redis://localhost:6379/0",
))
# Same user, separate device, hours later — full state is restored
app.invoke(
{"messages": [Message.text_message("What did you tell me earlier about SSO?")]},
config={"thread_id": "user-42"},
)
LlamaIndex's ChatMemoryBuffer and friends cover the in-process case; for cross-process, multi-replica deployments, AgentFlow's checkpointer is closer to what production needs.
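For contrast, a typical in-process LlamaIndex setup looks like this (token limit and chat mode are illustrative); the buffer lives and dies with the Python process:

```python
from llama_index.core.memory import ChatMemoryBuffer

# Memory is held in this process only; nothing survives a restart or a second replica.
memory = ChatMemoryBuffer.from_defaults(token_limit=3000)
# `index` is the VectorStoreIndex loaded in the section above
chat_engine = index.as_chat_engine(chat_mode="context", memory=memory)
print(chat_engine.chat("How do I configure SSO?"))
```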
Serving as an API
pip install 10xscale-agentflow-cli
agentflow init
agentflow api --host 0.0.0.0 --port 8000
You get REST + SSE endpoints for invoke, stream, and thread state. All OSS, deployable on your own infrastructure. LlamaCloud is excellent if you want hosted indexing and parsing, but it does not replace the agent server tier you would otherwise build yourself.
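A quick smoke test against the running server might look like the following. The route path and payload shape here are assumptions for illustration; the API docs exposed by the server are the source of truth.

```python
import requests

# Hypothetical route and payload; verify against the server's own API docs.
resp = requests.post(
    "http://127.0.0.1:8000/invoke",
    json={
        "messages": [{"role": "user", "content": "How do I configure SSO?"}],
        "config": {"thread_id": "user-42"},
    },
)
print(resp.json())
```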
Calling from TypeScript
import {AgentFlowClient, Message} from "@10xscale/agentflow-client";
const client = new AgentFlowClient({baseUrl: "http://127.0.0.1:8000"});
const result = await client.invoke(
[Message.text_message("How do I configure SSO?")],
{config: {thread_id: "ts-rag-1"}},
);
console.log(result.messages.at(-1)?.text());
Migrating from LlamaIndex Agents
Common path:
- Keep your LlamaIndex indexes and retrievers. Wrap them as Python functions and add them to a `ToolNode`.
- Replace `FunctionAgent` / `ReActAgent` with an AgentFlow `Agent` plus a `ToolNode` and conditional edges.
- Replace `Workflow` events with explicit graph nodes and `add_conditional_edges` (see the sketch after this list).
- Move chat memory from `ChatMemoryBuffer` to a checkpointer + `thread_id`.
- Drop your custom FastAPI server in favor of `agentflow api`.
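As a sketch of the Workflow-to-graph step: routing that a Workflow expresses as events between steps becomes a routing function plus `add_conditional_edges`. The node names, prompts, and routing heuristic below are illustrative assumptions; the `StateGraph` APIs are the same ones used earlier on this page.

```python
from agentflow.core.graph import Agent, StateGraph
from agentflow.core.state import AgentState
from agentflow.utils import END

triage = Agent(model="google/gemini-2.5-flash", system_prompt=[{
    "role": "system", "content": "Reply with exactly 'docs' or 'billing' for the request.",
}])
docs = Agent(model="google/gemini-2.5-flash", system_prompt=[{
    "role": "system", "content": "Answer documentation questions.",
}])
billing = Agent(model="google/gemini-2.5-flash", system_prompt=[{
    "role": "system", "content": "Answer billing questions.",
}])

def route(state):
    # Inspect the triage agent's last message and pick the next node.
    last = state.context[-1] if state.context else None
    return "BILLING" if last and "billing" in last.text().lower() else "DOCS"

graph = StateGraph(AgentState)
graph.add_node("TRIAGE", triage)
graph.add_node("DOCS", docs)
graph.add_node("BILLING", billing)
graph.add_conditional_edges("TRIAGE", route, {"DOCS": "DOCS", "BILLING": "BILLING"})
graph.add_edge("DOCS", END)
graph.add_edge("BILLING", END)
graph.set_entry_point("TRIAGE")
app = graph.compile()
```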
When LlamaIndex Agents is still the right pick
- Document-centric apps. If your product is "chat with my PDFs" or "query a corpus", LlamaIndex's full retrieval stack is best-in-class.
- Single-agent RAG. A `FunctionAgent` over a query engine is the shortest path to a working assistant. AgentFlow only pays off once you add multi-agent flows, persistent threads, or a separate frontend.
- You want LlamaCloud's managed parsing/indexing. That is a real and useful product; AgentFlow does not replace it.
For everyone else, the typical pattern is: LlamaIndex for retrieval, AgentFlow for the runtime and the API.