Build a research AI agent in Python
A research agent answers questions that need external information. Current events, technical references, deep web searches. The hard part is not the search; it is getting the agent to cite sources and avoid making things up.
Here is the production architecture.
Architecture at a glance
[ User question ]
│
▼
[ Plan: identify subqueries (LLM) ]
│
▼
[ Search loop ]
├── web_search (Tavily / Perplexity / Bing)
├── fetch_url (full content for promising hits)
├── search_internal_docs (your vector store)
└── done?
│
▼
[ Synthesize with citations ]
│
▼
[ Validator: every claim has [source-N] ]
This is an agent that loops on tool calls until satisfied, then runs a deterministic post-step to enforce citation discipline.
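If you strip away the framework, the control flow is a plain loop followed by a deterministic check. A minimal sketch, with hypothetical helpers: call_model() wraps your LLM client, TOOLS maps tool names to the functions defined below, and reply's shape (content, tool_calls, as_message) is an assumption.

MAX_STEPS = 8  # hard cap so a curious model cannot loop forever

def research(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    reply = None
    for _ in range(MAX_STEPS):
        reply = call_model(messages, tools=TOOLS)   # hypothetical wrapper around your LLM client
        messages.append(reply.as_message())         # keep the assistant turn in context
        if not reply.tool_calls:                    # no tools requested: the answer is ready
            break
        for call in reply.tool_calls:
            result = TOOLS[call.name](**call.args)  # web_search / fetch_url / search_internal_docs
            messages.append({"role": "tool", "name": call.name, "content": result})
    ok, reason = validate_citations(reply.content)  # deterministic post-step, defined below
    return reply.content if ok else f"[limited sources] {reply.content}"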
Why this shape
- Subquery decomposition beats one giant search. "Compare X and Y on dimension Z" → three searches (a planning sketch follows this list).
- Forced citations are the only reliable way to prevent hallucinated facts. The agent must include [source-N] markers; the validator strips out claims that don't have them.
- Validator separation. Citation enforcement is deterministic: a regex, not an LLM call.
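The decomposition step can be one structured LLM call before the search loop. A sketch, assuming a hypothetical complete() helper that makes a single model call and returns raw text:

import json

def plan_subqueries(question: str) -> list[str]:
    prompt = (
        "Break this research question into 2-4 independent search queries. "
        "Return a JSON array of strings, nothing else.\n\nQuestion: " + question
    )
    raw = complete(prompt)      # hypothetical helper: one LLM call, returns text
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        return [question]       # fall back to searching the question verbatim

For "Compare X and Y on dimension Z" you would expect something like three queries: one per product plus a head-to-head search.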
The tools
import os

import httpx

TAVILY_KEY = os.environ["TAVILY_API_KEY"]  # assumes the key is set in the environment

def web_search(query: str, max_results: int = 5) -> str:
    """Search the public web. Returns titles, URLs, and snippets for top results."""
    response = httpx.post(
        "https://api.tavily.com/search",
        json={"api_key": TAVILY_KEY, "query": query, "max_results": max_results},
        timeout=10.0,
    ).json()
    # Number each hit so the agent has a stable ID to cite.
    return "\n\n".join(
        f"[source-{i}] {r['title']}\n{r['url']}\n{r['content']}"
        for i, r in enumerate(response.get("results", []), start=1)
    ) or "No results."
def fetch_url(url: str) -> str:
    """Fetch the full text of a URL. Use after web_search to read promising sources."""
    try:
        text = httpx.get(url, timeout=15.0, follow_redirects=True).text
        return strip_html(text)[:20_000]  # cap to keep context manageable; strip_html sketched below
    except Exception as e:
        return f"Error fetching {url}: {e}"
def search_internal_docs(query: str) -> str:
    """Search internal documentation (your private knowledge base)."""
    hits = vector_client.search(query, top_k=3)  # vector_client: your vector store client
    return "\n\n".join(f"[doc-{h.id}] {h.title}\n{h.text[:500]}" for h in hits)
Notes:
- Source IDs are part of the tool output: [source-1], [doc-42]. These are what the agent cites in the final answer.
- Caps everywhere. A 1 MB scraped page eats your context.
- fetch_url only after web_search. Direct URL fetches without prior search are an attack vector; a simple guard is sketched below.
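One way to enforce that ordering (a sketch, not the only design): record the hosts that web_search actually returned and refuse to fetch anything else.

from urllib.parse import urlparse

SEEN_HOSTS: set[str] = set()  # populated as search results come back

def record_result_urls(urls: list[str]) -> None:
    SEEN_HOSTS.update(urlparse(u).netloc for u in urls)

def guarded_fetch(url: str) -> str:
    if urlparse(url).netloc not in SEEN_HOSTS:
        return "Refused: URL was not returned by a prior web_search."
    return fetch_url(url)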
The agent
from agentflow.core.graph import Agent, ToolNode

tool_node = ToolNode([web_search, fetch_url, search_internal_docs])

agent = Agent(
    model="anthropic/claude-3-5-sonnet",
    system_prompt=[{"role": "system", "content": (
        "You are a research assistant. "
        "Break the question into subqueries. Use web_search and fetch_url to gather sources. "
        "Use search_internal_docs for company-specific information. "
        "When you have enough information, write a clear answer. "
        "EVERY factual claim must include a citation marker like [source-1] or [doc-2] from the tool outputs. "
        "If you cannot find sources, say so explicitly. Do not invent facts."
    )}],
    tool_node=tool_node,
)
The validator (post-step)
import re

CITATION_RE = re.compile(r"\[(?:source|doc)-\d+\]")

def validate_citations(answer: str, max_uncited_sentences: int = 1) -> tuple[bool, str]:
    # Split on sentence-ending punctuation; crude but deterministic.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    uncited = [s for s in sentences if not CITATION_RE.search(s)]
    if len(uncited) > max_uncited_sentences:
        return False, f"{len(uncited)} sentences without citations. Reject."
    return True, "OK"
If the answer fails validation, you can either retry the agent with feedback or downgrade the response with a "limited sources" disclaimer.
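Wired together, that looks roughly like this (assuming the agent exposes a run(prompt) method that returns the final text; adapt to your framework):

def answer_with_citations(question: str, max_retries: int = 1) -> str:
    answer = agent.run(question)  # assumption: run() takes a prompt, returns final text
    ok, reason = validate_citations(answer)
    for _ in range(max_retries):
        if ok:
            break
        # Feed the failure back so the model re-grounds its claims.
        answer = agent.run(
            f"{question}\n\nYour previous answer failed review: {reason} "
            "Rewrite it so every factual sentence carries a [source-N] or [doc-N] marker."
        )
        ok, reason = validate_citations(answer)
    return answer if ok else "Limited sources: treat with caution.\n\n" + answer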
Operational notes
- Choose a search provider. Tavily and Perplexity are LLM-friendly (return clean text). Bing API works but you do more parsing.
- Cache aggressively. Identical queries from different users should hit the same cache for a few hours (see the sketch after this list).
- Rate-limit per user. A research agent can rack up a $5 API bill in 30 seconds if it loops.
- Domain filters. For trustworthy research, restrict to a curated allowlist (papers, official docs, news).
- Time budget. Cap total research time at 30–60 seconds. Past that, return what you have.
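A minimal in-process TTL cache for the caching point, keyed on the normalized query (a sketch; in production you would likely want Redis or similar shared storage):

import hashlib
import time

_CACHE: dict[str, tuple[float, str]] = {}
TTL_SECONDS = 3 * 3600  # "a few hours"

def cached_web_search(query: str) -> str:
    key = hashlib.sha256(query.strip().lower().encode()).hexdigest()
    hit = _CACHE.get(key)
    if hit and time.time() - hit[0] < TTL_SECONDS:
        return hit[1]                    # fresh enough: reuse
    result = web_search(query)
    _CACHE[key] = (time.time(), result)  # store with timestamp
    return result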
Variants
- Competitive research. Research a list of competitors against a fixed matrix of comparison dimensions
- Due diligence. Research a company across financials, news, and reviews
- Academic research. Search arXiv + Semantic Scholar + Google Scholar with citation graphs
- News briefing. Daily digest with citations
Same shape; different search tools and validators.
Metrics that matter
| Metric | Target |
|---|---|
| Citation coverage | > 95% of factual sentences |
| Source diversity per answer | ≥ 3 distinct sources |
| Hallucination rate (manual eval) | < 2% |
| p95 latency | < 30 s |
| Cost per query | $0.05–$0.50 depending on depth |
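Citation coverage falls straight out of the validator's regex. A sketch, reusing CITATION_RE from above:

def citation_coverage(answer: str) -> float:
    """Fraction of sentences carrying at least one [source-N] / [doc-N] marker."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    if not sentences:
        return 0.0
    cited = sum(1 for s in sentences if CITATION_RE.search(s))
    return cited / len(sentences)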
Common mistakes
- No citation enforcement. The model will summarize and forget where the facts came from.
- Trusting the search snippet. Snippets are often misleading; use fetch_url for important claims.
- Single-source reliance. Force the agent to gather ≥ 3 sources before synthesizing (a check is sketched below).
- No time / cost cap. Researchy agents love to loop.
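A deterministic check for source diversity, in the same spirit as the citation validator (sketch, reusing CITATION_RE):

def distinct_sources(answer: str, minimum: int = 3) -> tuple[bool, str]:
    ids = set(CITATION_RE.findall(answer))  # e.g. {"[source-1]", "[doc-42]"}
    if len(ids) < minimum:
        return False, f"Only {len(ids)} distinct sources cited; need >= {minimum}."
    return True, "OK"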
Further reading
- Multi-agent orchestration patterns
- AI agent memory. Long-term recall for repeat research
- ReAct agent with real APIs