OpenClaw Web Search: How to Make Your Agent Actually Read the Web
Hiba Fathima
Feb 24, 2026

TL;DR

  • web_search sends a query to your configured provider (Brave by default) and returns results: title, URL, and snippet per result
  • web_fetch takes a specific URL, does an HTTP fetch, and extracts readable content from the HTML as markdown or plain text
  • Both tools are enabled together under group:web but can be allowlisted individually (web_search / web_fetch)
  • web_fetch does not execute JavaScript, so JS-rendered pages return empty or incomplete content without a fallback
  • Adding your Firecrawl API key gives web_fetch a real-browser fallback for pages Readability can't extract
  • Installing the Firecrawl CLI skill adds a firecrawl search command that returns search results and full page content in a single step

Send your OpenClaw agent a research task and the failure mode is predictable: web_search returns URLs, web_fetch tries to read them, and a lot of the modern web doesn't cooperate with plain HTTP requests.

This guide explains how the pipeline works, what breaks it, and how Firecrawl fixes it.

For a broader look at the full Firecrawl integration with OpenClaw including browser automation, see the OpenClaw + Firecrawl guide.

How OpenClaw's web tools actually work

OpenClaw ships two distinct web tools: web_search and web_fetch. They serve different purposes and are configured separately. Both are enabled together under group:web but can be allowlisted individually.

web_search sends a search query to your configured provider and returns a list of results. With Brave (the default), each result is a structured object: title, URL, and a short snippet. It returns 5 results by default (configurable up to 10), and results are cached for 15 minutes. The tool won't run without an API key: if none is configured, it returns a setup error rather than silently failing.
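The result shape described above can be sketched as plain data. This is an illustration, not OpenClaw's actual types: the field names mirror the description (title, URL, snippet), but the exact JSON keys and the sample values are assumptions.

```python
from dataclasses import dataclass

@dataclass
class SearchResult:
    """One web_search hit as described above; field names are illustrative."""
    title: str
    url: str
    snippet: str

# A hypothetical Brave response (default limit is 5 results)
results = [
    SearchResult(
        title="OpenClaw docs",
        url="https://example.com/docs",
        snippet="Official documentation for the OpenClaw web tools...",
    ),
]
print(results[0].url)
```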

web_fetch takes a specific URL, makes a plain HTTP GET request, and extracts readable content from the HTML response as markdown or plain text. It does not execute JavaScript.

In practice, these two tools run in sequence. The agent searches for URLs, then fetches each one to read the content. But that handoff is where things break. Brave gives the agent URLs. web_fetch tries to read them. Many modern sites return JavaScript shells to plain HTTP requests: the HTML loads, but the meaningful content renders later in the browser. Others serve 403 errors to anything that doesn't look like an active browser session. web_fetch gets back an empty page or nothing, and the agent proceeds with whatever it has.

The internal extraction order for web_fetch is:

  1. Readability: local main-content extraction from the raw HTML
  2. Firecrawl: if an API key is configured, routes through Firecrawl's API with real browser rendering and bot circumvention
  3. Basic HTML cleanup: strips tags and returns whatever text remains

If Readability fails and Firecrawl isn't configured, the agent falls through to basic cleanup, which often returns navigation links, cookie banners, and other noise instead of article content.

The search provider options

OpenClaw supports three built-in providers for web_search. If no provider is explicitly set, OpenClaw auto-detects based on which API keys are present, checking in order: Brave → Gemini → Perplexity → Grok.

| Provider | What it returns | API key |
| --- | --- | --- |
| Brave (default) | Title, URL, snippet per result | BRAVE_API_KEY |
| Perplexity Sonar | AI-synthesized answer with inline citations | PERPLEXITY_API_KEY or OPENROUTER_API_KEY |
| Gemini | AI-synthesized answer grounded in Google Search | GEMINI_API_KEY |

Firecrawl is not a web_search provider in the configuration sense above. It doesn't plug into the web_search tool. Instead it connects to the pipeline in two other ways: as a web_fetch fallback (via the API key config), and as the Firecrawl CLI skill, which gives your agent a firecrawl search command that runs search independently of the web_search tool entirely. More on both below.

What Firecrawl adds to the pipeline

Firecrawl connects to the OpenClaw web pipeline in two places. It's worth being precise about which is which, because they solve different problems.

Improving web_fetch

Adding your Firecrawl API key to the web_fetch config gives it a second extraction attempt for pages where Readability fails. Instead of falling through to basic HTML cleanup, it routes the request through Firecrawl's API, which uses real browser rendering and bot circumvention automatically.

{
  "tools": {
    "web": {
      "fetch": {
        "firecrawl": {
          "apiKey": "fc-YOUR-API-KEY",
          "onlyMainContent": true,
          "maxAgeMs": 172800000
        }
      }
    }
  }
}

maxAgeMs controls how fresh cached results need to be (in milliseconds). The default is 2 days, fine for content that doesn't change often. For time-sensitive pages like pricing or release notes, lower this to force fresher fetches.
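The default of 172800000 ms is exactly 2 days. A small helper like this (illustrative, not part of any OpenClaw tooling) keeps the conversion readable when tuning the config:

```python
def days_to_ms(days: float) -> int:
    """Convert a cache-freshness window in days to a maxAgeMs value."""
    return int(days * 24 * 60 * 60 * 1000)

print(days_to_ms(2))     # → 172800000, the default (2 days)
print(days_to_ms(0.25))  # → 21600000, a 6-hour window for fast-moving pages
```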

This configuration doesn't change how web_search works. The agent still searches via Brave and still calls web_fetch as a second step. But when web_fetch would otherwise fail on a JS-heavy or bot-protected page, Firecrawl catches it and returns actual content. See the OpenClaw Firecrawl docs for the full config reference.

The CLI skill: search with content in one step

The Firecrawl CLI skill changes the search step itself. Instead of web_search returning a list of URLs that the agent must then fetch individually, your agent runs firecrawl search, which returns search results and the scraped content of each result in a single call.

Install the skill with:

npx -y firecrawl-cli@latest init --all

Or install everything separately:

npm install -g firecrawl-cli
firecrawl init skills
export FIRECRAWL_API_KEY="fc-YOUR-API-KEY"

Verify the setup:

firecrawl --status

Once installed, your agent can run:

# Search and return top results
firecrawl search "OpenClaw release notes February 2026" --limit 10
 
# Search and return results with full scraped content
firecrawl search "OpenClaw release notes February 2026" --scrape --scrape-formats markdown --limit 5

Each result in the --scrape response includes the URL, title, description, and the full markdown content of the page. No separate web_fetch call needed, and no 403 errors, because Firecrawl handles the actual extraction. For a deeper look at what the search endpoint returns, see Mastering the Firecrawl Search Endpoint.
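Processing a --scrape response then looks like iterating one list, with content already attached to each result. The exact field names below are assumptions based on the description above (URL, title, description, markdown content), and the sample payload is invented.

```python
import json

# A hypothetical --scrape response; field names are assumed, not confirmed.
raw = json.dumps([
    {
        "url": "https://example.com/release-notes",
        "title": "Release notes",
        "description": "What changed in February 2026",
        "markdown": "# Release notes\n\n- Fixed web_fetch fallback ordering",
    },
])

results = json.loads(raw)
for r in results:
    # Content arrives with the result: no separate web_fetch round-trip
    print(r["url"], "-", len(r["markdown"]), "chars of markdown")
```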

web_fetch fallback vs. CLI skill: which to use

These two integrations are independent and serve different purposes:

| | web_fetch fallback | Firecrawl CLI skill |
| --- | --- | --- |
| Configured via | JSON config (tools.web.fetch.firecrawl) | CLI install (npx -y firecrawl-cli@latest init --all) |
| What it affects | web_fetch only, as a fallback when Readability fails | Adds firecrawl search, firecrawl scrape, crawl, and map as agent commands |
| Search step | No change: agent still uses web_search (Brave etc.) | Replaces the search step: firecrawl search returns results and content |
| Best for | Fixing fetch failures on JS-heavy or bot-protected pages | Research workflows where you want content alongside results from the start |

You can run both at the same time. Use the API key config to harden web_fetch, and use the CLI skill when the task calls for search-first workflows with full page content.

Scraping, crawling, and mapping

The CLI skill also gives your agent scraping, crawling, and map capabilities for when search isn't the right tool.

# Scrape a single page
firecrawl https://example.com --only-main-content
 
# Scrape with specific formats
firecrawl https://example.com --format markdown,links --pretty

This is useful when you need to pull structured data from a known URL rather than find it first, or when you want to crawl an entire docs site and process the output.

Browser: when scraping isn't enough

OpenClaw's default is to drive a local browser. That works for simple workflows, but the costs show up quickly: the agent runs in the same environment as your real browsing state, parallel sessions spike RAM, and runs get flaky under load. Local browsers behave like dev tooling, not infrastructure.

Firecrawl Browser Sandbox moves that work into a secure, remote, disposable environment. No local Chromium install, no driver setup. agent-browser and Playwright are pre-installed. Your OpenClaw agent can run on a free-tier EC2 instance or a Raspberry Pi while the actual browsing happens elsewhere.

Your agent just issues intent-level commands (open, click, fill, snapshot, scrape) through the firecrawl browser shorthand. Playwright is still available if you need it.

firecrawl browser "open https://news.ycombinator.com"
firecrawl browser "snapshot"
firecrawl browser "scrape"
firecrawl browser close

A few mechanics worth knowing:

  • Shorthand auto-session: the shorthand form (firecrawl browser "...") auto-launches a sandbox session if one isn't active, so your agent doesn't need to manage session lifecycle up front
  • Token efficiency: the agent gets back clean artifacts (snapshot, extracted content) instead of raw DOM or driver logs in the context window
  • Context offloading: fetched pages and interactions are saved to the file system and queried only when needed

You can give your agent a prompt like: "Use Firecrawl Browser Sandbox to open Hacker News and get the top 5 stories and the first 10 comments on each." The agent figures out the rest.

See the Browser Sandbox docs for the full command reference.

Search patterns that work

A few patterns that get consistent results from the OpenClaw web pipeline:

Targeted queries over broad ones. Specific search terms outperform general ones. "OpenClaw changelog entries February 2026" gives the agent something actionable; "OpenClaw news" surfaces a mixed bag. When your agent's task depends on finding accurate, specific information, guide it toward narrow queries.

Multiple queries instead of one. A single broad query returns mixed results. Running two or three targeted queries in sequence, like "OpenClaw memory tool documentation" and "OpenClaw memory tool community issues" as separate calls, and then combining the results gives the agent better raw material than one catch-all query.
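The multi-query pattern above boils down to running narrow queries separately, then merging and deduplicating by URL. A minimal sketch, with invented sample data standing in for whichever search tool the agent uses:

```python
def merge_results(*result_lists):
    """Merge several result lists, keeping the first hit per unique URL."""
    seen, merged = set(), []
    for results in result_lists:
        for r in results:
            if r["url"] not in seen:
                seen.add(r["url"])
                merged.append(r)
    return merged

# Two targeted queries, run as separate calls (sample data)
docs = [{"url": "https://docs.example.com/memory", "title": "Memory tool"}]
issues = [
    {"url": "https://github.com/example/issues/42", "title": "Memory bug"},
    {"url": "https://docs.example.com/memory", "title": "Memory tool"},  # dup
]

combined = merge_results(docs, issues)
print([r["url"] for r in combined])  # duplicate URL dropped
```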

Use --scrape for content-heavy tasks. When the task requires reading actual page content rather than just titles and snippets, firecrawl search ... --scrape returns full markdown in one call and skips the web_fetch round-trip entirely.

Prompt examples that work well:

Search for the three most recent Firecrawl changelog entries and summarize what changed in each.

Find the pricing page for [product] and extract plan names, monthly prices, and any seat or usage limits.

Use Firecrawl to search for "OpenClaw memory tool site:github.com" and read the top result in full.

Find community discussion about OpenClaw search providers from the past week and summarize the most common complaints.

Conclusion

The web_search to web_fetch pipeline is the right mental model for understanding OpenClaw's web access. Each tool has a distinct role, and the failure modes are specific: the search provider delivers links, web_fetch tries to read them, and a lot of the modern web doesn't cooperate with plain HTTP requests.

Firecrawl addresses this at two levels. The API key config patches web_fetch for pages Readability can't handle, adding real browser rendering as a fallback with no change to how search works. The CLI skill goes further: firecrawl search replaces the two-step search-then-fetch pattern entirely, returning content alongside results in one call. And for pages that need an actual browser session, firecrawl browser handles interactive automation in a remote sandbox with no local Chromium required.

For the full configuration reference, the OpenClaw web tools docs cover every parameter for both web_search and web_fetch. And if you're setting up Firecrawl with OpenClaw for the first time, the OpenClaw + Firecrawl guide covers the broader integration including browser automation and deployment.

Frequently Asked Questions

Can Firecrawl be used for search in OpenClaw?

Yes, but not through the web_search tool. Firecrawl search works via the CLI skill: install it with npx -y firecrawl-cli@latest init --all and your agent can run firecrawl search to get back results with scraped page content in a single call. That means no separate web_fetch round-trip, no 403 errors, and full markdown content for each result. Firecrawl also connects separately as a web_fetch fallback via the API key config.

What's the difference between the Firecrawl API key config and the Firecrawl CLI skill?

The API key config (tools.web.fetch.firecrawl.apiKey) only affects web_fetch. It gives the tool a fallback extractor for pages that Readability can't handle: JS-heavy sites, pages behind bot protection, etc. The Firecrawl CLI skill is separate. It installs the firecrawl command on your agent and adds search, scrape, crawl, and map capabilities that operate independently of web_search and web_fetch entirely. You can use both at the same time.

Does web_fetch execute JavaScript?

No. web_fetch makes a plain HTTP GET request and extracts readable content from the raw HTML response. It does not execute JavaScript. Pages that render content client-side will return an empty or incomplete result. The Firecrawl fallback (configured via the API key) uses real browser rendering and handles these pages correctly. For full browser automation, the Firecrawl CLI skill also provides browser commands via firecrawl browser.

What happens if I don't configure any search provider API key?

web_search is enabled by default but requires an API key to function. OpenClaw auto-detects which provider to use based on available keys, checking in the order: Brave → Gemini → Perplexity → Grok. If no keys are found, it returns a short error prompting you to configure one. An alternative is the Firecrawl CLI skill, which provides firecrawl search without needing a web_search provider configured at all.

How do I verify my Firecrawl setup in OpenClaw?

Run firecrawl --status from the terminal to check that the CLI skill is installed and authenticated. For the web_fetch fallback, run openclaw doctor to verify your full tool configuration. You can also test directly by asking your agent to fetch a JS-heavy page and checking whether it returns actual content or empty output.

Why use Firecrawl Browser Sandbox instead of OpenClaw's local browser?

OpenClaw's default browser runs locally, which creates two problems. First, the agent operates inside the same environment as your real browsing state, which is a security risk. Second, parallel sessions spike RAM and make runs flaky — local browsers behave like dev tooling, not infrastructure. Firecrawl Browser Sandbox moves each session into a secure, remote, disposable environment. No local Chromium install, no driver setup, no RAM pressure. Your agent can run on a free-tier EC2 instance or a Raspberry Pi while the actual browsing happens elsewhere. agent-browser and Playwright are pre-installed, and the agent gets back clean artifacts instead of raw DOM or driver logs in its context window.
