OpenClaw Web Search: How to Make Your Agent Actually Read the Web
Hiba Fathima
Feb 24, 2026

TL;DR

  • web_search sends a query to your configured provider (Brave by default) and returns results: title, URL, and snippet per result
  • web_fetch takes a specific URL, does an HTTP fetch, and extracts readable content from the HTML as markdown or plain text
  • Both tools are enabled together under group:web but can be allowlisted individually (web_search / web_fetch)
  • web_fetch does not execute JavaScript, so JS-rendered pages return empty or incomplete content without a fallback
  • Adding your Firecrawl API key gives web_fetch a real-browser fallback for pages Readability can't extract
  • Installing the Firecrawl CLI skill adds a firecrawl search command that returns search results and full page content in a single step

Send your OpenClaw agent a research task and the failure mode is predictable: web_search returns URLs, web_fetch tries to read them, and a lot of the modern web doesn't cooperate with plain HTTP requests.

This guide explains how the pipeline works, what breaks it, and how Firecrawl fixes it.

For a broader look at the full Firecrawl integration with OpenClaw including browser automation, see the OpenClaw + Firecrawl guide.

How OpenClaw's web tools actually work

OpenClaw ships two distinct web tools: web_search and web_fetch. They serve different purposes and are configured separately. Both are enabled together under group:web but can be allowlisted individually.

web_search sends a search query to your configured provider and returns a list of results. With Brave (the default), each result is a structured object: title, URL, and a short snippet. It returns 5 results by default (configurable up to 10), and results are cached for 15 minutes. The tool won't run without an API key: if none is configured, it returns a setup error rather than silently failing.
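The result shape described above can be sketched as plain data. This is an illustration, not OpenClaw's actual types: the field names mirror the description (title, URL, snippet), but the exact JSON keys and the sample values are assumptions.

```python
from dataclasses import dataclass

@dataclass
class SearchResult:
    """One web_search hit as described above; field names are illustrative."""
    title: str
    url: str
    snippet: str

# A hypothetical Brave response (default limit is 5 results)
results = [
    SearchResult(
        title="OpenClaw docs",
        url="https://example.com/docs",
        snippet="Official documentation for the OpenClaw web tools...",
    ),
]
print(results[0].url)
```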

web_fetch takes a specific URL, makes a plain HTTP GET request, and extracts readable content from the HTML response as markdown or plain text. It does not execute JavaScript.

In practice, these two tools run in sequence. The agent searches for URLs, then fetches each one to read the content. But that handoff is where things break. Brave gives the agent URLs. web_fetch tries to read them. Many modern sites return JavaScript shells to plain HTTP requests: the HTML loads, but the meaningful content renders later in the browser. Others serve 403 errors to anything that doesn't look like an active browser session. web_fetch gets back an empty page or nothing, and the agent proceeds with whatever it has.

The internal extraction order for web_fetch is:

  1. Readability: local main-content extraction from the raw HTML
  2. Firecrawl: if an API key is configured, routes through Firecrawl's API with real browser rendering and bot circumvention
  3. Basic HTML cleanup: strips tags and returns whatever text remains

If Readability fails and Firecrawl isn't configured, the agent falls through to basic cleanup, which often returns navigation links, cookie banners, and other noise instead of article content.

The search provider options

OpenClaw supports three built-in providers for web_search. If no provider is explicitly set, OpenClaw auto-detects based on which API keys are present, checking in order: Brave → Gemini → Perplexity → Grok.

| Provider | What it returns | API key |
| --- | --- | --- |
| Brave (default) | Title, URL, snippet per result | BRAVE_API_KEY |
| Perplexity Sonar | AI-synthesized answer with inline citations | PERPLEXITY_API_KEY or OPENROUTER_API_KEY |
| Gemini | AI-synthesized answer grounded in Google Search | GEMINI_API_KEY |

Firecrawl is not a web_search provider in the configuration sense above. It doesn't plug into the web_search tool. Instead it connects to the pipeline in two other ways: as a web_fetch fallback (via the API key config), and as the Firecrawl CLI skill, which gives your agent a firecrawl search command that runs search independently of the web_search tool entirely. More on both below.

What Firecrawl adds to the pipeline

Firecrawl connects to the OpenClaw web pipeline in two places. It's worth being precise about which is which, because they solve different problems.

Improving web_fetch

Adding your Firecrawl API key to the web_fetch config gives it a second extraction attempt for pages where Readability fails. Instead of falling through to basic HTML cleanup, it routes the request through Firecrawl's API, which uses real browser rendering and bot circumvention automatically.

{
  "tools": {
    "web": {
      "fetch": {
        "firecrawl": {
          "apiKey": "fc-YOUR-API-KEY",
          "onlyMainContent": true,
          "maxAgeMs": 172800000
        }
      }
    }
  }
}

maxAgeMs controls how fresh cached results need to be (in milliseconds). The default is 2 days, fine for content that doesn't change often. For time-sensitive pages like pricing or release notes, lower this to force fresher fetches.
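The default of 172800000 ms is exactly 2 days. A small helper like this (illustrative, not part of any OpenClaw tooling) keeps the conversion readable when tuning the config:

```python
def days_to_ms(days: float) -> int:
    """Convert a cache-freshness window in days to a maxAgeMs value."""
    return int(days * 24 * 60 * 60 * 1000)

print(days_to_ms(2))     # → 172800000, the default (2 days)
print(days_to_ms(0.25))  # → 21600000, a 6-hour window for fast-moving pages
```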

This configuration doesn't change how web_search works. The agent still searches via Brave and still calls web_fetch as a second step. But when web_fetch would otherwise fail on a JS-heavy or bot-protected page, Firecrawl catches it and returns actual content. See the OpenClaw Firecrawl docs for the full config reference.

The CLI skill: search with content in one step

The Firecrawl CLI skill changes the search step itself. Instead of web_search returning a list of URLs that the agent must then fetch individually, your agent runs firecrawl search, which returns search results and the scraped content of each result in a single call.

Install the skill with:

npx -y firecrawl-cli@latest init --all

Or install everything separately:

npm install -g firecrawl-cli
firecrawl init skills
export FIRECRAWL_API_KEY="fc-YOUR-API-KEY"

Verify the setup:

firecrawl --status

Once installed, your agent can run:

# Search and return top results
firecrawl search "OpenClaw release notes February 2026" --limit 10
 
# Search and return results with full scraped content
firecrawl search "OpenClaw release notes February 2026" --scrape --scrape-formats markdown --limit 5

Each result in the --scrape response includes the URL, title, description, and the full markdown content of the page. No separate web_fetch call needed, and no 403 errors, because Firecrawl handles the actual extraction. For a deeper look at what the search endpoint returns, see Mastering the Firecrawl Search Endpoint.
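Processing a --scrape response then looks like iterating one list, with content already attached to each result. The exact field names below are assumptions based on the description above (URL, title, description, markdown content), and the sample payload is invented.

```python
import json

# A hypothetical --scrape response; field names are assumed, not confirmed.
raw = json.dumps([
    {
        "url": "https://example.com/release-notes",
        "title": "Release notes",
        "description": "What changed in February 2026",
        "markdown": "# Release notes\n\n- Fixed web_fetch fallback ordering",
    },
])

results = json.loads(raw)
for r in results:
    # Content arrives with the result: no separate web_fetch round-trip
    print(r["url"], "-", len(r["markdown"]), "chars of markdown")
```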

web_fetch fallback vs. CLI skill: which to use

These two integrations are independent and serve different purposes:

| | web_fetch fallback | Firecrawl CLI skill |
| --- | --- | --- |
| Configured via | JSON config (tools.web.fetch.firecrawl) | CLI install (npx -y firecrawl-cli@latest init --all) |
| What it affects | web_fetch only, as a fallback when Readability fails | Adds firecrawl search, firecrawl scrape, crawl, and map as agent commands |
| Search step | No change: agent still uses web_search (Brave etc.) | Replaces the search step: firecrawl search returns results and content |
| Best for | Fixing fetch failures on JS-heavy or bot-protected pages | Research workflows where you want content alongside results from the start |

You can run both at the same time. Use the API key config to harden web_fetch, and use the CLI skill when the task calls for search-first workflows with full page content.

Scraping, crawling, and mapping

The CLI skill also gives your agent scraping, crawling, and map capabilities for when search isn't the right tool.

# Scrape a single page
firecrawl https://example.com --only-main-content
 
# Scrape with specific formats
firecrawl https://example.com --format markdown,links --pretty

This is useful when you need to pull structured data from a known URL rather than find it first, or when you want to crawl an entire docs site and process the output.

Browser: when scraping isn't enough

OpenClaw's default is to drive a local browser. That works for simple workflows, but the costs show up quickly: the agent runs in the same environment as your real browsing state, parallel sessions spike RAM, and runs get flaky under load. Local browsers behave like dev tooling, not infrastructure.

Firecrawl Browser Sandbox moves that work into a secure, remote, disposable environment. No local Chromium install, no driver setup. agent-browser and Playwright are pre-installed. Your OpenClaw agent can run on a free-tier EC2 instance or a Raspberry Pi while the actual browsing happens elsewhere.

Your agent just issues intent-level commands (open, click, fill, snapshot, scrape) through the firecrawl browser shorthand. Playwright is still available if you need it.

firecrawl browser "open https://news.ycombinator.com"
firecrawl browser "snapshot"
firecrawl browser "scrape"
firecrawl browser close

A few mechanics worth knowing:

  • Shorthand auto-session: the shorthand form (firecrawl browser "...") auto-launches a sandbox session if one isn't active, so your agent doesn't need to manage session lifecycle up front
  • Token efficiency: the agent gets back clean artifacts (snapshot, extracted content) instead of raw DOM or driver logs in the context window
  • Context offloading: fetched pages and interactions are saved to the file system and queried only when needed

You can give your agent a prompt like: "Use Firecrawl Browser Sandbox to open Hacker News and get the top 5 stories and the first 10 comments on each." The agent figures out the rest.

See the Browser Sandbox docs for the full command reference.

Search patterns that work

A few patterns that get consistent results from the OpenClaw web pipeline:

Targeted queries over broad ones. Specific search terms outperform general ones. "OpenClaw changelog entries February 2026" gives the agent something actionable; "OpenClaw news" surfaces a mixed bag. When your agent's task depends on finding accurate, specific information, guide it toward narrow queries.

Multiple queries instead of one. A single broad query returns mixed results. Running two or three targeted queries in sequence, like "OpenClaw memory tool documentation" and "OpenClaw memory tool community issues" as separate calls, and then combining the results gives the agent better raw material than one catch-all query.
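The multi-query pattern above boils down to running narrow queries separately, then merging and deduplicating by URL. A minimal sketch, with invented sample data standing in for whichever search tool the agent uses:

```python
def merge_results(*result_lists):
    """Merge several result lists, keeping the first hit per unique URL."""
    seen, merged = set(), []
    for results in result_lists:
        for r in results:
            if r["url"] not in seen:
                seen.add(r["url"])
                merged.append(r)
    return merged

# Two targeted queries, run as separate calls (sample data)
docs = [{"url": "https://docs.example.com/memory", "title": "Memory tool"}]
issues = [
    {"url": "https://github.com/example/issues/42", "title": "Memory bug"},
    {"url": "https://docs.example.com/memory", "title": "Memory tool"},  # dup
]

combined = merge_results(docs, issues)
print([r["url"] for r in combined])  # duplicate URL dropped
```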

Use --scrape for content-heavy tasks. When the task requires reading actual page content rather than just titles and snippets, firecrawl search ... --scrape returns full markdown in one call and skips the web_fetch round-trip entirely.

Prompt examples that work well:

Search for the three most recent Firecrawl changelog entries and summarize what changed in each.

Find the pricing page for [product] and extract plan names, monthly prices, and any seat or usage limits.

Use Firecrawl to search for "OpenClaw memory tool site:github.com" and read the top result in full.

Find community discussion about OpenClaw search providers from the past week and summarize the most common complaints.

Conclusion

The web_search to web_fetch pipeline is the right mental model for understanding OpenClaw's web access. Each tool has a distinct role, and the failure modes are specific: the search provider delivers links, web_fetch tries to read them, and a lot of the modern web doesn't cooperate with plain HTTP requests.

Firecrawl addresses this at two levels. The API key config patches web_fetch for pages Readability can't handle, adding real browser rendering as a fallback with no change to how search works. The CLI skill goes further: firecrawl search replaces the two-step search-then-fetch pattern entirely, returning content alongside results in one call. And for pages that need an actual browser session, firecrawl browser handles interactive automation in a remote sandbox with no local Chromium required.

For the full configuration reference, the OpenClaw web tools docs cover every parameter for both web_search and web_fetch. And if you're setting up Firecrawl with OpenClaw for the first time, the OpenClaw + Firecrawl guide covers the broader integration including browser automation and deployment.

Frequently Asked Questions

Can Firecrawl be used for search in OpenClaw?

Yes, but not through the web_search tool. Firecrawl search works via the CLI skill: install it with npx -y firecrawl-cli@latest init --all and your agent can run firecrawl search to get back results with scraped page content in a single call. That means no separate web_fetch round-trip, no 403 errors, and full markdown content for each result. Firecrawl also connects separately as a web_fetch fallback via the API key config.

What's the difference between the Firecrawl API key config and the Firecrawl CLI skill?

The API key config (tools.web.fetch.firecrawl.apiKey) only affects web_fetch. It gives the tool a fallback extractor for pages that Readability can't handle: JS-heavy sites, pages behind bot protection, etc. The Firecrawl CLI skill is separate. It installs the firecrawl command on your agent and adds search, scrape, crawl, and map capabilities that operate independently of web_search and web_fetch entirely. You can use both at the same time.

Does web_fetch execute JavaScript?

No. web_fetch makes a plain HTTP GET request and extracts readable content from the raw HTML response. It does not execute JavaScript. Pages that render content client-side will return an empty or incomplete result. The Firecrawl fallback (configured via the API key) uses real browser rendering and handles these pages correctly. For full browser automation, the Firecrawl CLI skill also provides browser commands via firecrawl browser.

What happens if I don't configure any search provider API key?

web_search is enabled by default but requires an API key to function. OpenClaw auto-detects which provider to use based on available keys, checking in the order: Brave → Gemini → Perplexity → Grok. If no keys are found, it returns a short error prompting you to configure one. An alternative is the Firecrawl CLI skill, which provides firecrawl search without needing a web_search provider configured at all.

How do I verify my Firecrawl setup in OpenClaw?

Run firecrawl --status from the terminal to check that the CLI skill is installed and authenticated. For the web_fetch fallback, run openclaw doctor to verify your full tool configuration. You can also test directly by asking your agent to fetch a JS-heavy page and checking whether it returns actual content or empty output.

Why use Firecrawl Browser Sandbox instead of OpenClaw's local browser?

OpenClaw's default browser runs locally, which creates two problems. First, the agent operates inside the same environment as your real browsing state, which is a security risk. Second, parallel sessions spike RAM and make runs flaky — local browsers behave like dev tooling, not infrastructure. Firecrawl Browser Sandbox moves each session into a secure, remote, disposable environment. No local Chromium install, no driver setup, no RAM pressure. Your agent can run on a free-tier EC2 instance or a Raspberry Pi while the actual browsing happens elsewhere. agent-browser and Playwright are pre-installed, and the agent gets back clean artifacts instead of raw DOM or driver logs in its context window.
