
What is agentic search (and why cached results aren't enough)

Ninad Pathak
May 12, 2026

TL;DR:

  • Agentic search is a retrieval pattern where an LLM agent queries the live web at reasoning time, not at index time
  • Cached retrieval gives wrong answers for any content that changed after the last ingestion run
  • RAG and agentic search serve different data types. The right production architecture uses both
  • Firecrawl's /search endpoint returns full page content alongside results in a single call, with time-based filtering, category filtering, and location targeting
  • The Firecrawl CLI installs live web access across Claude Code, Antigravity, and OpenCode in one command: npx -y firecrawl-cli@latest init --all --browser

Think of an AI agent that can only search cached indexes as a researcher who can only read last year's newspapers. The training corpus was seeded on a specific day, and anything after that day simply doesn't exist for the agent: a competitor's pricing update, a new security release, a changed compliance requirement.

Agentic search addresses this problem. The agent queries the live web at reasoning time, reads pages as they currently exist, and uses that content to reason.

While you can use pre-indexed vector stores for stable content (internal documentation, product specs, support articles), agentic search is built for everything else: open-web content, real-time signals, and any data source you don't control or update on a predictable schedule.

What is agentic search?

Agentic search is a retrieval method where the agent (instead of going through a pre-planned list of URLs) decides when to search, formulates the query from context, reads actual page content from each result, and evaluates what it found before deciding whether to synthesize or search again.

The search call is an action the agent takes mid-reasoning, the same way it might call an API or execute code.

What makes it agentic rather than automated is that each search can reshape the next. If a query for "langchain memory module" returns documentation for a deprecated API, the agent reformulates the query to: "langchain memory alternatives 2026." It doesn't need a human to notice the mismatch and course-correct.

The search loop continues until the agent has enough data to answer, or determines the answer isn't findable.
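
To make the loop concrete, here's a minimal sketch in Python. The Firecrawl call matches the search snippet later in this post; the llm() helper and the DONE convention are hypothetical stand-ins for your model client and stopping rule, and the result handling assumes the dict-shaped output shown further down.

from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")

def llm(prompt: str) -> str:
    """Hypothetical stand-in: wire up your model provider here."""
    raise NotImplementedError

def agentic_search(task: str, max_steps: int = 5) -> str:
    query = llm(f"Write one web search query for this task:\n{task}")
    findings: list[str] = []
    for _ in range(max_steps):
        # Live search at reasoning time; each result carries full page markdown
        results = firecrawl.search(
            query, limit=3, scrape_options={"formats": ["markdown"]}
        )
        findings += [r["markdown"] for r in results]  # assumes dict-shaped results
        verdict = llm(
            "Reply DONE if the findings answer the task; otherwise reply with "
            f"a refined query.\nTask: {task}\nFindings: {findings}"
        )
        if verdict.strip().upper().startswith("DONE"):
            break
        query = verdict  # what the agent found reshapes the next search
    return llm(f"Answer the task from the findings.\nTask: {task}\nFindings: {findings}")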

Anthropic's engineering team built this into Claude's Research feature, where a lead agent analyzes a query, spawns parallel search subagents to cover different angles simultaneously, then synthesizes across their outputs. I'd definitely recommend reading their full writeup on multi-agent retrieval systems.

Why do cached results break AI agent workflows?

An AI agent workflow that relies on cached results will act on inaccurate information and make decisions based on that stale data. Here's an example of how that can break a workflow.

A coding agent reading outdated framework documentation will confidently recommend a method that was deprecated three versions ago. Task that agent with implementing a tool for its own use and it will build against the same old version, opening your server up to known security vulnerabilities.

If you don't notice this soon enough, the agent keeps relying on its training data, compounding the outdated code and widening the range of security issues you're exposed to.

What does real-time web search enable for agents?

Done right, real-time web search gives your agents the following advantages almost immediately:

  • Fresh data at query time. The page the agent reads is what currently exists. For anything that changes on a short cycle (pricing, security advisories, API documentation), live search is the only retrieval method that guarantees freshness without engineering a separate refresh mechanism on top of your vector store.
  • Open-ended discovery. A vector store returns what was put into it. Someone had to decide in advance what was worth indexing. Live search removes that constraint so the agent can find pages it has never seen, from sources no one anticipated when the pipeline was built, as long as the content is publicly accessible.
  • Grounded reasoning inside the loop. Because search runs mid-reasoning rather than as a preprocessing step, what the agent finds changes what it searches for next. A finding in the first result becomes the premise for the second query. The agent doesn't commit to a retrieval strategy before it knows what it needs. Instead, it discovers what's needed as it reasons.

With live web search available to your agent, you can put it to work on:

  • Competitive intelligence. A sales agent answering "how do we compare to X on price?" can fetch the competitor's live pricing page and respond with the latest numbers.
  • Dependency and security monitoring. If your agent is tasked with dependency monitoring, it can search for "CVE langchain 2026" and read current NVD entries to decide the next steps. You can apply this to GitHub Issues, PyPI changelogs, and npm release notes.
  • Academic research synthesis. A research agent absolutely needs live web data for reading through the latest papers. Querying arXiv's live search means the agent operates on the actual state of the literature, not a snapshot of it from whenever your last embedding run finished.
  • Location-specific discovery. A business development agent finding SaaS vendors in Southeast Asia can pass location="Singapore" and get geo-ranked results scoped to that market. A globally indexed corpus has no mechanism to encode that geographic ranking signal.
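
As a sketch of that last pattern, here's what a location-scoped call looks like. The query string is illustrative; the client setup matches the snippet in the next section, and location is the geo-targeting parameter described above.

from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")

results = firecrawl.search(
    "B2B SaaS vendors",       # illustrative query
    location="Singapore",     # geo-ranks results for that market
    limit=10
)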

The full list of live web search use cases covers more patterns across different agent types.

Agentic search vs. RAG: What's the difference?

RAG and agentic search retrieve from fundamentally different AI agent data layers. Both are valuable — the distinction is what kind of data each one is suited for.

You can engineer RAG to cover open-web content, but it requires building and maintaining an ingestion pipeline, supplying the vector store with fresh data, and possibly scraping live web data on a schedule to keep it current. If the ingestion fails, your agent acts on outdated data. For content that changes frequently or that you don't control, that's a fragile architecture.

  • RAG (Lewis et al., Facebook AI Research, 2020) retrieves chunks from a pre-built index and provides them as context to a generation model. The index is built once (or on a schedule), and retrieval is a vector similarity lookup. RAG is the right tool for internal documents, proprietary data, and stable reference material where you own and control the update cycle.
  • Agentic RAG adds a reasoning loop around that same retrieval: the agent reformulates queries, evaluates result quality, and chains multiple retrieval passes. Both RAG variants still operate over a pre-indexed document set, bounded by what was in the corpus at ingestion time.
  • Agentic search has no index. The agent constructs a query, hits a search API, and reads fresh content from the retrieved URLs. Those pages reflect their current state. The search is open-ended: the agent can find documents it has never encountered before.
Dimension    | Traditional RAG                   | Agentic Search
Data source  | Pre-indexed vector store          | Live web
Freshness    | Snapshot from ingestion date      | Current at query time
Page content | Chunked embeddings                | Full markdown from live page
Discovery    | Static, pre-defined document set  | Open-ended web query
Coverage     | Whatever you've ingested          | Anything publicly accessible
Best for     | Internal docs, proprietary data   | Real-time signals, changing content

For most production agents, you need both:

  • Vector store via RAG for internal documentation, support articles, proprietary data, stable reference content
  • Agentic search for live signals, competitor pages, open-web research, any content you don't control

The routing decision sits at the query classification layer, typically a fast intent check before the retrieval call that determines which path a given query takes.
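
Here's a minimal sketch of that routing layer. The keyword heuristic is illustrative (a production router would use a fast classifier model), and vector_store is a hypothetical stand-in for whatever index backs your RAG path; its similarity_search call mirrors common vector store clients.

from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")

FRESHNESS_HINTS = ("latest", "current", "pricing", "cve", "changelog", "2026")

def route(query: str) -> str:
    """Decide which retrieval path a query takes (illustrative heuristic)."""
    q = query.lower()
    if any(hint in q for hint in FRESHNESS_HINTS):
        return "agentic_search"  # live web: changing or uncontrolled content
    return "rag"                 # vector store: internal, stable content

def retrieve(query: str):
    if route(query) == "agentic_search":
        return firecrawl.search(
            query, limit=5, scrape_options={"formats": ["markdown"]}
        )
    return vector_store.similarity_search(query, k=4)  # hypothetical RAG index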

How to give your AI agent real-time web access

Firecrawl covers the full retrieval workflow in one call: find the right sources from the live web, extract the page content, and return clean markdown the agent can use immediately — no separate scraping step, no HTML cleaning pipeline.

The most important step for using live web data is to ensure the data is clean. Firecrawl's search endpoint returns clean markdown from each result page by default, along with search metadata in a single call.

Here's a simple code snippet:

from firecrawl import Firecrawl
 
firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")
 
results = firecrawl.search(
    "langchain v0.3 breaking changes",
    limit=5,
    scrape_options={"formats": ["markdown"]}
)

Output:

[
  {
    "url": "https://www.langchain.com/blog/announcing-langchain-v0-3",
    "title": "Announcing LangChain v0.3",
    "description": "Python 3.8 will no longer be supported as its end-of-life is October 2024. These are the only breaking changes for the Python version.",
    "markdown": "# Announcing LangChain v0.3\n\nThe LangChain Team\n\n## What changed\n\nAll `langchain_community` imports must be updated. The `langchain.chat_models` namespace is removed — use `langchain_openai`, `langchain_anthropic`, or the relevant integration package directly..."
  },
  {
    "url": "https://www.crawleo.dev/blog/langchain-v03-tutorial-and-migration-guide-for-2026",
    "title": "LangChain v0.3 Tutorial & Migration Guide for 2026",
    "description": "Breaking changes that will hit your code. If you are upgrading from pre-0.3 versions, some behavior and APIs have changed significantly.",
    "markdown": "# LangChain v0.3 Migration Guide\n\n## Breaking Changes\n\n1. Pydantic v2 is now required — v1 compatibility shims are removed\n2. Python 3.8 support dropped\n3. `RunnableSequence` replaces `LLMChain` in all cases..."
  },
  {
    "url": "https://github.com/elizaOS/eliza/issues/6145",
    "title": "Deprecation of Langchain v0.3 - migrate to langchain-classic",
    "description": "Since LangChain v0.3 and v1 are incompatible, we cannot easily upgrade the LangChain version in our SDK.",
    "markdown": "## Issue: LangChain v0.3 Deprecation\n\nElizaOS depends on older v0.3 APIs. Upgrading to v1 requires a full rewrite of the memory and tool-call layers..."
  }
]

One call and 5 pages of full markdown without a separate scraping step.

How does time-based filtering work?

What if a plain search isn't enough and you need only the latest information, say, from the past week?

The tbs parameter handles exactly this. Pin results to a recency window by adding it to your search call:

results = firecrawl.search(
    "pydantic v2 deprecations 2026",
    tbs="qdr:w",   # past week
    limit=5
)

Output:

[
  {
    "url": "https://fastapi-patterns.com/advanced-pydantic-validation-serialization/pydantic-v2-migration-guide/",
    "title": "Pydantic V2 Migration Guide: FastAPI Production Patterns",
    "description": "Legacy class Config inner classes are deprecated in favor of model_config = ConfigDict(...), which improves static analysis compatibility and reduces runtime overhead.",
    "markdown": "# Pydantic V2 Migration Guide\n\n## Deprecated Patterns\n\n`class Config` is replaced by `model_config = ConfigDict(...)`. Validators decorated with `@validator` are replaced by `@field_validator`. The `schema()` method is removed — use `model_json_schema()` instead..."
  },
  {
    "url": "https://github.com/pydantic/pydantic-ai-harness",
    "title": "Pydantic AI Harness - GitHub",
    "description": "All breaking changes are documented in release notes with migration guidance. Where practical, we'll keep the previous behavior available under a deprecated flag before removal.",
    "markdown": "# pydantic-ai-harness\n\n## Migration Notes\n\nThe `RunContext` object now requires explicit type parameters. Passing untyped deps will raise a deprecation warning in v0.4 and an error in v0.5..."
  },
  {
    "url": "https://www.linkedin.com/pulse/zero-runtime-dependencies-why-matters-when-adding-sdk-srinivasan",
    "title": "Zero Runtime Dependencies: Why It Matters When Adding an SDK",
    "description": "If your LangChain version requires pydantic v1 and another SDK requires v2, you have a problem. Zero dependencies means the SDK never enters that conflict space.",
    "markdown": "## The Pydantic Version Conflict Problem\n\nPydantic v1 and v2 cannot coexist in the same environment. Any SDK that pins pydantic becomes a dependency forcing function for every project that installs it..."
  }
]

Common tbs values: qdr:h (past hour), qdr:d (past 24h), qdr:w (past week), qdr:m (past month).

What is category filtering?

The categories parameter routes search to specific source types rather than the open web. A research agent querying with categories=["research"] gets arXiv papers, not blog posts about arXiv papers.

results = firecrawl.search(
    "multi-agent coordination survey 2026",
    categories=["research"],  # arXiv, Nature, IEEE, PubMed
    limit=10
)

Output:

[
  {
    "url": "https://arxiv.org/abs/2502.14743",
    "title": "Multi-Agent Coordination across Diverse Applications: A Survey",
    "description": "This survey outlines the current state of coordination research across applications through a unified understanding that answers four fundamental coordination questions.",
    "markdown": "# Multi-Agent Coordination across Diverse Applications: A Survey\n\n**Abstract:** We survey coordination mechanisms across robotics, LLM-based agents, and distributed systems. Key findings: emergent coordination outperforms hand-coded protocols in open environments; communication overhead is the dominant bottleneck in dense agent networks..."
  },
  {
    "url": "https://arxiv.org/list/cs.MA/current",
    "title": "Multiagent Systems May 2026 - arXiv",
    "description": "Authors and titles for May 2026. Total of 100 entries. Title: Coordination Matters: Evaluation of Cooperative Multi-Agent Reinforcement Learning.",
    "markdown": "# cs.MA — Multiagent Systems (May 2026)\n\n1. **Coordination Matters: Evaluation of Cooperative MARL** — benchmarks 14 coordination algorithms across sparse-reward environments, finding that role-based decomposition consistently outperforms centralized critics at scale...\n2. **Emergent Coordination in Multi-Agent Language Models** — measures dynamical emergence in LLM agent networks..."
  },
  {
    "url": "https://arxiv.org/html/2510.05174v4",
    "title": "Emergent Coordination in Multi-Agent Language Models",
    "description": "This information decomposition lets us measure whether dynamical emergence is present in multi-agent LLM systems, localize it, and distinguish genuine coordination from statistical correlation.",
    "markdown": "# Emergent Coordination in Multi-Agent Language Models\n\n## Method\n\nWe apply partial information decomposition to message-passing traces from a 12-agent LLM network. Synergistic information increases by 340% when agents solve tasks requiring shared context versus independent tasks..."
  }
]

The categories currently available for filtering results are "research" (arXiv, Nature, IEEE, PubMed), "github" (repos, issues, documentation), and "pdf".

Zero data retention

If your enterprise deployment has data handling restrictions, Firecrawl supports end-to-end ZDR. Neither Firecrawl nor the upstream search provider stores the query or result data.

curl -X POST https://api.firecrawl.dev/v2/search \
  -H "Authorization: Bearer fc-YOUR_API_KEY" \
  -d '{
    "query": "HIPAA compliance audit checklist 2026",
    "enterprise": ["zdr"],
    "scrapeOptions": {"formats": ["markdown"]}
  }'

The Firecrawl CLI

The Firecrawl CLI brings Firecrawl into your agent workflows in one command — it installs the CLI and registers it as a skill across every AI coding agent detected on the machine:

npx -y firecrawl-cli@latest init --all --browser

  • --all covers Claude Code, Antigravity, OpenCode, and other detected agents.
  • --browser opens browser auth so you don't have to manually locate and paste an API key.

After a restart, your agent can call firecrawl search, firecrawl scrape, and the rest of the CLI natively.

If you are not comfortable with a terminal or don't have direct access to it, just point your agent to the machine-readable agent onboarding skill. Your agent can then handle auth via browser redirect, poll for the API key, and route to the right CLI command based on the task.

MCP for framework-native integration

If you prefer the web search tool integrated within your MCP-compatible agent framework rather than installing it as a CLI command, we have you covered.

Firecrawl offers an MCP server that exposes search as a native tool so your agent gets firecrawl_search on its tool surface, callable the same way it calls any other tool, with no subprocess, no shell exec, no wrapper code.

How do you choose a web search API for AI agents?

Once you've decided to add live web search to your agent, the API choice comes down to what you want your agent to do with the results.

API               | What it returns                                                    | Best fit
Firecrawl         | Full page markdown + search metadata in one call                   | Agents that need to read page content and return clean formatted content
OpenAI web_search | Snippets with inline citations                                     | GPT-stack agents that need fast grounding
Exa               | Results ranked by semantic relevance via end-to-end neural search | Research agents, RAG pipelines, academic discovery
Tavily            | Full page content + credibility scores and citation metadata      | RAG pipelines and LangChain/LlamaIndex agents that need citable sources with full content

Firecrawl returns full page content alongside results in a single call, with no separate scrape step required. It also supports specialized categories ("research", "github", "pdf"), time-based filtering with tbs, geo-ranked results via location, and Zero Data Retention for regulated deployments — features not available in the other tools here.

OpenAI web_search integrates directly with the Responses API and runs in three modes: single-pass fast lookup, agentic search where the model manages queries mid-reasoning, and deep research for AI agents doing multi-step multi-source tasks. Results include inline citations but no raw page content, and the tool only works within OpenAI's model stack.

Exa uses end-to-end neural networks trained for semantic web search rather than keyword-frequency ranking. For literature review and academic synthesis, that approach surfaces related papers and technical documentation that keyword queries miss.

Tavily returns citation-ready results with credibility scores and citation metadata that LangChain and LlamaIndex consume directly. It also supports full page content extraction alongside those citations, making it a viable option when agents need both credibility signals and the underlying text.

Here's a quick decision framework:

  • Agent needs to read the document (changelog, paper, pricing page, documentation): Use Firecrawl
  • Agent is GPT-based and needs fast grounding with citations: Use OpenAI web_search
  • Agent is doing research or literature review: Use Exa
  • Agent needs citation-ready sources for a RAG pipeline: Use Tavily

For a detailed breakdown of each option with benchmarks and use-case fit, see the full comparison of AI search engines for agents.

Frequently Asked Questions

Is agentic search the same as RAG?

No. RAG retrieves from a pre-built index of documents ingested at a point in time. Agentic search queries the live web and reads pages as they currently exist. A production agent often uses both: RAG for internal documents and stable reference material, agentic search for open-web content and real-time signals.

Does agentic search require a reasoning model?

No. A basic search-and-read loop works with any LLM. You call the search API, pass the results as context, and let the model synthesize. Reasoning models add value when the agent needs to evaluate result quality mid-loop, reformulate queries, or decide whether to keep searching rather than stop and synthesize.
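
A minimal single-pass version of that pattern, with llm() as a hypothetical stand-in for any completion client and the result shape matching the output shown earlier:

from firecrawl import Firecrawl

firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")

def answer(question: str) -> str:
    # One search, one synthesis pass; no mid-loop reasoning required
    results = firecrawl.search(
        question, limit=3, scrape_options={"formats": ["markdown"]}
    )
    context = "\n\n".join(r["markdown"] for r in results)
    # llm() is a hypothetical completion helper; any provider works
    return llm(f"Answer from this context only:\n{context}\n\nQuestion: {question}")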

Does the Firecrawl CLI work with Claude Code?

Yes. Running npx -y firecrawl-cli@latest init --all --browser installs Firecrawl skills to Claude Code automatically, along with Antigravity, OpenCode, and other detected AI coding agents. After restarting the agent, it discovers the skills without additional configuration.

When should an agent use time-based filtering vs. open search?

Use tbs="qdr:d" or tbs="qdr:w" for queries where recency is the signal: CVE disclosures, framework releases, changelog entries, pricing changes. Open search without time filtering ranks by relevance across the full web and surfaces older, authoritative content more often. The right filter depends on whether the agent needs the newest answer or the most relevant one.

Does agentic search replace a vector database?

No. A vector database is still the right layer for internal documents, proprietary data, and content you own and update deliberately. Agentic search handles the open web, where pre-indexing isn't feasible and the content you need may not have existed when you last ran the ingestion pipeline.

What's the difference between Firecrawl search and a standard search engine API?

A standard search API returns titles, URLs, and short snippets. Firecrawl's /search optionally scrapes each result and returns full page markdown in the same response. For agents doing discovery only, the difference is marginal. For agents that need to read the content, it's the difference between one API call and two, and between clean markdown and raw HTML that still needs parsing.