Mastering Firecrawl Search Endpoint: Web Search and Data Extraction in One API Call

TL;DR:
- Firecrawl's /v2/search endpoint returns search results plus the full page content of each result in a single call.
- Use sources to pick result type (web, news, images) and categories to filter to specific domains (github, research, pdf).
- Add scrape_options={"formats": ["markdown"]} to get the page body, not just the snippet.
- Available via REST API, Python and Node SDKs, MCP server, and CLI.
- Cost: 2 credits per 10 results; scraping adds standard scrape costs per page.
Web search for AI agents and apps with Firecrawl
Every useful AI agent needs fresh context from the world. Search is how they find it.
Without web search, an agent is limited to what it was trained on, which is static, dated, and increasingly wrong for anything that changes. Search is the front door of the AI agent workflow: find the right sources first, then extract and use what's in them.
Most search APIs hand back a list of links and expect you to do the rest. That means a separate scrape call per result, two failure modes, and a lot of glue code. Firecrawl's search endpoint collapses both steps into one, returning each result's title, URL, and full page content in a single call. This tutorial walks through the endpoint from your first query to a working research agent built on top of it. For a comparison of AI search options, see our AI search engines for agents guide.
Search vs. scrape vs. crawl: understanding Firecrawl's endpoints
If you're an existing Firecrawl user, you might be wondering about the differences between search, scrape, and crawl. Each endpoint solves a different problem in web data collection.
Scrape targets individual pages.
You provide a specific URL and it extracts clean content from that page. It handles JavaScript rendering and converts messy HTML into structured data. Use this when you know exactly which page contains the information you need.
Crawl works through entire websites systematically.
Give it a starting URL and it discovers every connected page on that site, extracting content as it goes. It follows links, handles pagination, and maps out complete website structures. This works well when you need lots of data from a specific domain.
Search finds content across the entire web.
Instead of starting with URLs, you start with queries. Firecrawl searches a curated index of authoritative sources and returns full page content, not snippets, from the most relevant results. Choose this approach when you need information but don't know which websites have it.
| Feature | Scrape | Crawl | Search |
|---|---|---|---|
| Input | Specific URL | Starting URL | Search query |
| Scope | Single page | Entire website | Entire web |
| Discovery | None needed | Site exploration | Web-wide search |
| Output | One page content | Multiple related pages | Multiple relevant pages from different sites |
| Best for | Known targets | Complete site data | Research and discovery |
Each method has trade-offs. Scrape is fastest but most limited, while crawl takes longer but covers everything on a site. Search can find the most relevant content but may miss some sources or return results you didn't expect.
In practice, you'll often use multiple endpoints together. You might search to discover relevant sources, then crawl those sites for complex data, or scrape specific high-value pages you found through search. For detailed guides on the other endpoints, see our scrape endpoint tutorial and crawl endpoint guide.
Now let's look at how to get started with search operations.
How to use Firecrawl Search
Before you can search the web with Firecrawl, you need to set up your API credentials and understand how search results work.
Choosing your access method
Firecrawl Search is callable from four places. Pick whichever matches how you build:
REST API. Call POST /v2/search directly:
curl -X POST https://api.firecrawl.dev/v2/search \
  -H "Authorization: Bearer fc-YOUR-KEY" \
  -H "Content-Type: application/json" \
  -d '{"query": "best project management tools", "limit": 5}'
Python or Node SDK. Thin wrappers around the API:
from firecrawl import Firecrawl
app = Firecrawl()
results = app.search(query="best project management tools", limit=5)
CLI. For one-off searches from the terminal:
npm install -g firecrawl-cli
firecrawl login --api-key fc-YOUR-KEY
firecrawl search "best project management tools" --limit 5
See the CLI reference for the full command list.
MCP server. For AI-assisted coding tools that speak the Model Context Protocol (Claude Desktop, Claude Code, Cursor, Windsurf, VS Code). Add the Firecrawl MCP server once and firecrawl_search becomes a tool the model can call directly:
claude mcp add firecrawl --url https://mcp.firecrawl.dev/fc-YOUR-KEY/v2/mcp
See the MCP server docs for client-by-client setup.
Beyond the four direct entry points, Firecrawl Search is also embedded in tools you may already use. OpenRouter lets you set "engine": "firecrawl" inside its openrouter:web_search tool, picking it as one of five search engines. n8n exposes search as a standalone node operation alongside scrape and crawl in its visual workflow editor. OpenClaw, an agent platform built around installable skill files, ships Firecrawl as the default search provider once the CLI skill is installed.
The rest of this tutorial uses the Python SDK because it's the most natural shape for working through examples, but every call shown works from the REST API, the CLI, the MCP server, or any of the integrations above.
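If you would rather not install an SDK, the curl call above translates directly to Python's standard library. The sketch below only builds the request object and does not send it; the helper name build_search_request is hypothetical, while the URL, headers, and JSON body are taken from the curl example.

```python
import json
import urllib.request

# Endpoint taken from the curl example above.
SEARCH_URL = "https://api.firecrawl.dev/v2/search"


def build_search_request(api_key: str, query: str, limit: int = 5) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl call."""
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_search_request("fc-YOUR-KEY", "best project management tools")

# Sending is left to the caller, for example:
#   with urllib.request.urlopen(request, timeout=60) as response:
#       payload = json.load(response)
```

Keeping construction separate from sending makes the request easy to inspect or log before it goes over the network.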
Setting up authentication
First, sign up at firecrawl.dev and grab your API key from the dashboard. Install the Python SDK and set up your credentials:
pip install firecrawl-py python-dotenv
Save your API key in a .env file (a simple text file that stores environment variables):
echo "FIRECRAWL_API_KEY='fc-YOUR-KEY-HERE'" >> .env
Then load it in your Python code:
from firecrawl import Firecrawl
from dotenv import load_dotenv
load_dotenv()
app = Firecrawl()
Your first search
Here's the simplest possible search request:
results = app.search(query="best project management tools", limit=5)
This searches for "best project management tools" and returns the top 5 results. When search results are not scraped, the cost is 2 credits per 10 search results, so this request uses 2 credits total.
print(f"Found {len(results.web)} web results")
print("Search completed successfully")
Found 5 web results
Search completed successfully
The response comes back with structured data. Each result includes metadata that helps you understand what you found.
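The credit arithmetic for unscraped searches can be sketched as a small estimator. This assumes partial blocks of 10 results round up, which is consistent with the 5-result request above costing 2 credits but is not confirmed pricing logic; the function name is hypothetical, and scrape costs are not included.

```python
import math

CREDITS_PER_BLOCK = 2    # "2 credits per 10 search results", as stated above
RESULTS_PER_BLOCK = 10


def estimate_search_credits(num_results: int) -> int:
    """Estimate credits for an unscraped search.

    Assumption: a partial block of 10 results is billed as a full block.
    """
    if num_results <= 0:
        return 0
    blocks = math.ceil(num_results / RESULTS_PER_BLOCK)
    return blocks * CREDITS_PER_BLOCK


for n in (5, 10, 20):
    print(f"{n} results -> about {estimate_search_credits(n)} credits")
```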
Understanding the response structure
Search results come back in a structured format. Each web result exposes a title, URL, description, and category:
# First result details
first_result = results.web[0]
print(f"Title: {first_result.title}")
print(f"URL: {first_result.url}")
print(f"Description: {first_result.description}")
print(f"Category: {first_result.category}")
Title: 25 Best Project Management Software Picked For 2025
URL: https://thedigitalprojectmanager.com/tools/best-project-management-software/
Description: Explore top-rated project management software handpicked by experts to help you manage teams, timelines, and tasks with ease.
Category: None
Now you have clean, structured access to search results. The response includes web results by default. Here are all five results from our search:
All 5 results:
1. 25 Best Project Management Software Picked For 2025
https://thedigitalprojectmanager.com/tools/best-project-management-software/
2. What is the best free project management tool, specifically geared ...
https://www.reddit.com/r/projectmanagement/comments/1b0lfvi/what_is_the_best_free_project_management_tool/
3. Honest Review of 6 Personal Project Management Tools ... - ICAgile
https://www.icagile.com/resources/honest-review-of-6-personal-project-management-tools-with-kanban-view
4. Manage your team's work, projects, & tasks online • Asana
https://asana.com/
5. Project Management Software for Teams - Microsoft
https://www.microsoft.com/en-us/microsoft-365/planner/project-management
Basic error handling
Search requests can fail for various reasons. Here's how to handle common issues:
try:
    results = app.search(query="python web scraping", limit=10)
    if results.web:
        print(f"Found {len(results.web)} results")
    else:
        print("No results found")
except Exception as e:
    print(f"Search failed: {e}")
Testing with an empty query shows how error handling works:
try:
    results = app.search(query="", limit=5)
except Exception as e:
    # The SDK raises instead of returning an empty result set
    print(f"Empty query properly failed: {e}")
Empty query properly failed: Query cannot be empty
Common error codes include rate limiting (429), invalid queries (400), and authentication issues (401). Most errors include descriptive messages to help you troubleshoot.
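Rate-limit errors (429) are usually worth retrying, while 400 and 401 are not. Below is a minimal retry-with-exponential-backoff sketch. Both helper names are hypothetical, and the check for "429" in the error message is an assumption about how the failure surfaces, not documented SDK behavior; adapt it to the exception types you actually see.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def looks_rate_limited(exc: Exception) -> bool:
    """Assumption: rate-limit failures mention the 429 status in their message."""
    return "429" in str(exc)


def retry_with_backoff(
    call: Callable[[], T],
    is_retryable: Callable[[Exception], bool],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying retryable failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            # Give up on the last attempt or on errors a retry cannot fix.
            if attempt == max_attempts or not is_retryable(exc):
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...


# Usage with the client from this tutorial (not executed here):
#   results = retry_with_backoff(
#       lambda: app.search(query="python web scraping", limit=10),
#       looks_rate_limited,
#   )
```

The sleep function is injectable so the retry logic can be exercised without real delays.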
Search parameters you can control
The search method accepts several parameters to customize results:
- query - Your search terms (required)
- limit - Number of results to return (default: 3, max: 20)
- sources - Types of results ("web" for general pages, "news" for recent articles, "images" for visual content)
- categories - Specialized domains to filter by ("github", "research", "pdf")
- tbs - Time-based filter using Google codes ("qdr:d", "qdr:w", "qdr:m"); applies to web results only
- location - Geographic region for localized results (e.g., "Germany")
- timeout - Request timeout in milliseconds (default: 60000)
- includeDomains - Restrict results to specific domains (e.g., ["nytimes.com", "reuters.com"]); mutually exclusive with excludeDomains
- excludeDomains - Exclude specific domains from results (e.g., ["reddit.com"]); mutually exclusive with includeDomains
- scrape_options - Scrape each result and return full content (same options as the scrape endpoint)
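Some of these constraints can be checked before a request is sent. The sketch below validates the rules stated in the list (required query, maximum limit, mutually exclusive domain filters) and builds a plain dictionary. The helper build_search_payload is hypothetical and not part of any SDK, and the dictionary keys simply mirror the parameter names listed above.

```python
from typing import Any, Optional

MAX_LIMIT = 20  # maximum stated in the parameter list above


def build_search_payload(
    query: str,
    limit: int = 3,
    sources: Optional[list[str]] = None,
    include_domains: Optional[list[str]] = None,
    exclude_domains: Optional[list[str]] = None,
) -> dict[str, Any]:
    """Validate search parameters locally and return them as a dict."""
    if not query.strip():
        raise ValueError("query cannot be empty")
    if not 1 <= limit <= MAX_LIMIT:
        raise ValueError(f"limit must be between 1 and {MAX_LIMIT}")
    if include_domains and exclude_domains:
        raise ValueError("includeDomains and excludeDomains are mutually exclusive")

    payload: dict[str, Any] = {"query": query, "limit": limit}
    if sources:
        payload["sources"] = sources
    if include_domains:
        payload["includeDomains"] = include_domains
    if exclude_domains:
        payload["excludeDomains"] = exclude_domains
    return payload


print(build_search_payload("ai regulation", limit=5, include_domains=["reuters.com"]))
```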
A search with different sources might look like this:
results = app.search(
    query="machine learning frameworks 2025",
    limit=3,
    sources=["web", "news"],
    timeout=30000,  # 30 seconds
)
Advanced search found 3 results
1. 8 Best Machine Learning Software To Use in 2025 | Anaconda
2. AI Frameworks: Top Types To Adopt in 2025 - Splunk
3. Uses for ML frameworks like Pytorch/Tensorflow/etc in 2025 - Reddit
Common use cases for basic search
Use the search endpoint when you need to:
- Find recent information on a topic without knowing specific websites
- Research competitors or market trends across multiple sources
- Discover relevant content for further analysis or scraping
- Build datasets by finding pages that match certain criteria
This gives you the foundation for web search operations. In the next section, we'll look at the real power of Firecrawl's search: extracting full content from these results instead of just getting titles and descriptions.
Search and content extraction
The real power of Firecrawl's search endpoint is combining search with content extraction in one API call. When you add scrape_options to a search, Firecrawl automatically runs its scrape endpoint on each search result.
How search + scraping works together
When you add scrape options, Firecrawl finds the most relevant pages for your query, then runs the scrape endpoint on each result:
# This search finds pages, then scrapes each one automatically
content_results = app.search(
    query="web scraping best practices",
    limit=2,
    scrape_options={
        "formats": ["markdown", "links"]
    }
)
print(f"Content search found {len(content_results.web)} results")
for i, result in enumerate(content_results.web, 1):
    print(f"\nResult {i}:")
    print(f"Title: {result.metadata.title}")
    print(f"URL: {result.metadata.url}")
    print(f"Content length: {len(result.markdown)} characters")
    print(f"Found {len(result.links)} links in content")
Content search found 2 results
Result 1:
Title: Web Scraping Best Practices and Tools 2025 - ZenRows
URL: https://www.zenrows.com/blog/web-scraping-best-practices
Content length: 25234 characters
Found 51 links in content
Result 2:
Title: 7 Web Scraping Best Practices You Must Be Aware of
URL: https://research.aimultiple.com/web-scraping-best-practices/
Content length: 22384 characters
Found 111 links in content
Instead of getting 100-character descriptions, you get complete articles with 20,000+ characters of content plus all extracted links. This is where Firecrawl shows its real value.
Format options for different applications
The scrape_options parameter supports the same formats as the scrape endpoint:
results = app.search(
    query="python tutorials",
    limit=1,
    scrape_options={
        "formats": ["markdown", "html", "links"]
    }
)
# Each result contains full content in multiple formats
result = results.web[0]
print(f"Markdown: {len(result.markdown)} characters")
print(f"HTML: {len(result.html)} characters")
print(f"Links: {len(result.links)} found")
Markdown: 33640 characters
HTML: 221019 characters
Links: 302 found
Notice how the HTML version is much larger (221k characters) than the clean markdown (33k characters), and how many links (302) are extracted from the page structure.
Working with the extracted content
The extracted content is immediately ready for processing, with no additional parsing or cleaning needed:
results = app.search(
    query="machine learning research 2025",
    limit=2,
    scrape_options={"formats": ["markdown"]}
)
for result in results.web:
    # Content is already extracted and clean
    content = result.markdown
    word_count = len(content.split())
    print(f"Found article: {result.metadata.title}")
    print(f"Content: {word_count} words")
Found article: 2025 Conference
Content: 466 words
Found article: Apple Machine Learning Research at ICML 2025 - Apple Machine Learning Research
Content: 2070 words
The search endpoint with scrape options combines the discovery power of web search with the content extraction capabilities of the scrape endpoint. You find relevant pages and get their full content without managing multiple API calls or handling different website structures yourself.
For more control over content extraction, including options like include_tags, exclude_tags, and wait_for, check out our complete scrape endpoint guide which covers all available scraping parameters.
Targeted content discovery
Most search APIs compete on coverage: more domains crawled, more pages indexed. Firecrawl competes on what's in the index and what's not. Authoritative sources are curated in; low-quality filler is left out. On top of that, you can target specific content types directly, so you get code repos, papers, or news instead of a mix of all three when you only need one.
Search categories for specialized content
When you're building developer tools, generic web results miss the mark. You need code repositories, not blog posts. The categories parameter solves this by targeting specialized platforms directly. Three values are supported: github, research, and pdf.
Here's a small helper we'll reuse for each category:
def display_results(results, result_type="web", title="Results"):
    # Fall back to an empty list when the response has no such result type
    result_list = getattr(results, result_type, None) or []
    print(f"{title}:")
    for result in result_list:
        print(f"- {result.title}")
        print(f"  Category: {getattr(result, 'category', 'N/A')}")
        print(f"  URL: {result.url}")
github: code repositories
The github category pulls results from GitHub repos, code files, and issues. Good for finding implementations when a topic has a lot of blog noise around it.
github_results = app.search(
    query="python web scraping",
    limit=3,
    categories=["github"],
)
display_results(github_results, title="GitHub category results")
GitHub category results:
- Scrapy, a fast high-level web crawling & scraping ... - GitHub
Category: github
URL: https://github.com/scrapy/scrapy
- Web Scraping With Python - GitHub
Category: github
URL: https://github.com/KOrfanakis/Web_Scraping_With_Python
- awesome-web-scraping/python.md at master - GitHub
Category: github
URL: https://github.com/lorien/awesome-web-scraping/blob/master/python.md
research: academic papers
The research category scopes results to academic sources like arXiv, Nature, IEEE, and PubMed. Good for grounding a claim in a peer-reviewed paper instead of a Medium post.
research_results = app.search(
    query="machine learning transformers",
    limit=3,
    categories=["research"],
)
display_results(research_results, title="Research category results")
Research category results:
- [2304.10557] An Introduction to Transformers - arXiv
Category: research
URL: https://arxiv.org/abs/2304.10557
- Introduction to Sequence Modeling with Transformers - arXiv
Category: research
URL: https://arxiv.org/html/2502.19597v1
- Multimodal Learning With Transformers: A Survey - PubMed
Category: research
URL: https://pubmed.ncbi.nlm.nih.gov/37167049/
pdf: PDF documents only
The pdf category filters to PDFs, which is handy when you want longer-form content (textbooks, technical reports, whitepapers) instead of HTML articles.
pdf_results = app.search(
    query="deep learning",
    limit=3,
    categories=["pdf"],
)
display_results(pdf_results, title="PDF category results")
PDF category results:
- [PDF] Deep Learning
Category: pdf
URL: https://www.cs.toronto.edu/~hinton/absps/NatureDeepReview.pdf
- [PDF] The Principles of Deep Learning Theory - Purdue Engineering
Category: pdf
URL: https://engineering.purdue.edu/DeepLearn/Resources/DeepLearningTheory.pdf
- [PDF] Deep Learning - Microsoft
Category: pdf
URL: https://www.microsoft.com/en-us/research/wp-content/uploads/2016/02/DeepLearning-NowPublishing-Vol7-SIG-039.pdf
Categories can be combined. Passing categories=["github", "research"] can return both code and papers in the same call. You'll see this combined pattern later in the Advanced Filtering section.
Source type filtering for different content formats
Beyond categories, you often need different content formats for the same topic. For example, news articles give you recent developments, while web pages offer educational guides. The sources parameter controls this.
# Helper function for source comparison
def compare_sources(query, limit=2):
    web_results = app.search(query=query, limit=limit, sources=["web"])
    news_results = app.search(query=query, limit=limit, sources=["news"])
    print("Web source results:")
    for result in web_results.web:
        domain = result.url.split('/')[2] if '/' in result.url else result.url
        print(f"- {result.title}")
        print(f"  Domain: {domain}")
    print("\nNews source results:")
    if hasattr(news_results, 'news') and news_results.news:
        for result in news_results.news:
            print(f"- {result.title}")
            print(f"  Date: {getattr(result, 'date', 'N/A')}")
compare_sources("AI developments 2025")
Web source results:
- The 2025 AI Index Report | Stanford HAI
Domain: hai.stanford.edu
- The Latest AI News and AI Breakthroughs that Matter Most: 2025
Domain: www.crescendo.ai
News source results:
- Microsoft's Unprecedented 2025: Cloud and AI Fuel Record Earnings, Igniting Shareholder Optimism
Date: 12 hours ago
- 12 Graphs That Explain the State of AI in 2025
Date: Apr 7, 2025
Web sources return reports from authoritative domains, while news sources return recent articles with timestamps. This distinction matters when you need current market developments versus foundational research.
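The split('/') expression in compare_sources works for fully qualified URLs but raises an IndexError when a value contains a slash and no scheme. A slightly more defensive sketch uses the standard library's URL parser; the helper name domain_of is hypothetical.

```python
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the lowercased host of a URL, or an empty string if it has none."""
    return urlparse(url).hostname or ""


print(domain_of("https://www.zenrows.com/blog/web-scraping-best-practices"))
print(domain_of("http://example.com:8080/path"))   # port is dropped
print(repr(domain_of("example.com/no-scheme")))    # no scheme, so no host is parsed
```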
You can also combine sources in a single request. The limit parameter applies per source, so passing limit=5 with sources=["web", "news"] returns up to 10 results total, five from each:
combined = app.search(
    query="AI developments 2025",
    limit=5,
    sources=["web", "news"]
)
print(f"Web: {len(combined.web)} results")
print(f"News: {len(combined.news) if hasattr(combined, 'news') else 0} results")
Web: 5 results
News: 5 results
Visual content search
Text search is straightforward. Image search gets complex fast. You need specific resolutions, aspect ratios, or image types. Firecrawl's image search includes filtering options that traditional search APIs don't have.
Basic image search
Start with sources=["images"] for visual content:
# Basic image search
basic_images = app.search(
    query="mountain landscape",
    limit=3,
    sources=["images"]
)
print("Basic image search results:")
if hasattr(basic_images, 'images') and basic_images.images:
    print(f"Found {len(basic_images.images)} images")
    for i, img in enumerate(basic_images.images, 1):
        print(f"{i}. {img.title}")
        print(f"   Source page: {img.url}")
        print(f"   Position: {img.position}")
Basic image search results:
Found 3 images
1. Free Sunset Mountain Landscape Image | Download at StockCake
Source page: https://stockcake.com/i/sunset-mountain-landscape_1219985_159231
Position: 1
2. Watercolor Mountain Landscape Tutorial: Step-by-Step Guide
Source page: https://www.esperoart.com/mountain-landscape-watercolor-tutorial/
Position: 2
3. Mountain Landscape with Forests and a Stream · Free Stock Photo
Source page: https://www.pexels.com/photo/mountain-landscape-with-forests-and-a-stream-18682254/
Position: 3
This gives you relevant visual content. But many applications need specific image dimensions.
HD and 4K image search with size operators
Computer vision projects require consistent input dimensions. Machine learning models trained on specific image sizes need datasets with matching resolutions. Firecrawl's image search supports Google's established image size operators, letting you filter results by exact dimensions or minimum sizes.
These operators work within your search query string to target specific resolutions. Instead of getting random image sizes that require post-processing, you get images that match your requirements from the start.
Here's an example:
def search_images_by_resolution():
    # Search for Full HD images
    hd_images = app.search(
        query="sunset landscape imagesize:1920x1080",
        limit=3,
        sources=["images"]
    )
    print("HD image search results:")
    if hasattr(hd_images, 'images') and hd_images.images:
        for i, img in enumerate(hd_images.images, 1):
            print(f"{i}. {img.title}")
            print(f"   Source: {img.url}")
    # Search for 4K images
    uhd_results = app.search(
        query="nature wallpaper imagesize:3840x2160",
        limit=2,
        sources=["images"]
    )
    print("\n4K image search results:")
    if hasattr(uhd_results, 'images') and uhd_results.images:
        for img in uhd_results.images:
            print(f"- {img.title}")
            print(f"  Source: {img.url}")
search_images_by_resolution()
HD image search results:
1. Stunning sunset landscape: Mediterranean rocks with olive trees in the sea. Photorealistic motion background. 3D rendering.
Source: https://www.storyblocks.com/video/stock/photorealistic-motion-background-mediteranean-rocks-in-the-sea-with-olive-trees-growing-on-them-beautiful-sunset-landscape-scene-3d-rendering-bymitreh8kaz2db71
2. Sunset in a hilly rural landscape on the edge of a forest, Timelapse video
Source: https://www.storyblocks.com/video/stock/sunset-in-a-hilly-rural-landscape-on-the-edge-of-a-forest-timelapse-video-351768226
4K image search results:
- 50 Nature Wallpapers (All 4K, No watermarks) - Album on Imgur
Source: https://in.pinterest.com/pin/1002543567023869780/
- My top 10 of all time - Nature and Art : r/wallpapers
Source: https://www.reddit.com/r/wallpapers/comments/1g1cymc/my_top_10_of_all_time_nature_and_art/
Common resolution operators for different use cases:
- imagesize:1920x1080 - Full HD for standard displays
- imagesize:2560x1440 - QHD for high-resolution screens
- imagesize:3840x2160 - 4K UHD for professional applications
- larger:1920x1080 - HD and above for flexible sizing
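Because these operators live inside the query string, a small helper can keep them consistent across calls. This is a sketch with a hypothetical function name; it only formats the two operators listed above and does not check that the search backend honors them.

```python
def with_image_size(query: str, width: int, height: int, at_least: bool = False) -> str:
    """Append an image size operator to a search query.

    at_least=False -> imagesize:WxH (exact dimensions)
    at_least=True  -> larger:WxH    (these dimensions and above)
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    operator = "larger" if at_least else "imagesize"
    return f"{query.strip()} {operator}:{width}x{height}"


print(with_image_size("sunset landscape", 1920, 1080))
print(with_image_size("nature wallpaper", 1920, 1080, at_least=True))

# Usage with the client from this tutorial (not executed here):
#   app.search(query=with_image_size("sunset landscape", 1920, 1080),
#              limit=3, sources=["images"])
```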
Advanced filtering and strategy selection
Basic search gets you started, but real-world applications need precise targeting. You might need recent news instead of evergreen content, or results from multiple specialized sources combined.
Time-based filtering for current versus evergreen content
Web search results span decades of content, but applications often need recent information versus historical data. The tbs parameter filters results by publication date using Google's time-based search codes. These abbreviated codes like qdr:d (past day), qdr:w (past week), and qdr:m (past month) let you separate trending topics from foundational concepts. Market research needs last month's data, while educational content remains valuable regardless of age.
Here's how the tbs parameter filters results by recency:
def compare_content_freshness(base_query):
    recent_results = app.search(
        query=f"{base_query} trends",
        limit=3,
        tbs="qdr:m"  # Past month
    )
    all_time_results = app.search(
        query=f"{base_query} fundamentals",
        limit=3
    )
    print("Recent results (past month):")
    for result in recent_results.web:
        print(f"- {result.title}")
    print("\nAll-time results (fundamentals):")
    for result in all_time_results.web:
        print(f"- {result.title}")
compare_content_freshness("machine learning")
Recent results (past month):
- Top 13 Machine Learning Trends CTOs Need to Know in 2025
- The Evolution of AI and ML: Trends, Impact, and Future Insights
- The Ultimate List of Machine Learning Statistics for 2025 - Itransition
All-time results (fundamentals):
- Introduction to Machine Learning Concepts - Training - Microsoft Learn
- Machine Learning Crash Course - Google for Developers
- Machine Learning Tutorial - GeeksforGeeks
Time filtering with qdr:m returns current trend analysis. Searches without time filters return educational resources and established documentation.
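If your application thinks in "how many days back" rather than Google codes, a small mapping can pick the narrowest of the three windows covered here. The helper name and the day thresholds (7 for a week, 31 for a month) are assumptions made for illustration; anything older returns None, meaning no tbs filter.

```python
from typing import Optional


def tbs_for_days(days: int) -> Optional[str]:
    """Pick the narrowest tbs code that covers the last `days` days.

    Only the three codes used in this tutorial are considered.
    Returns None when no time filter should be applied.
    """
    if days < 1:
        raise ValueError("days must be at least 1")
    if days <= 1:
        return "qdr:d"   # past day
    if days <= 7:
        return "qdr:w"   # past week
    if days <= 31:
        return "qdr:m"   # past month (assumed to cover up to 31 days)
    return None


for d in (1, 3, 30, 90):
    print(d, "->", tbs_for_days(d))
```

Since tbs applies to web results only, a None result simply means calling search without the parameter.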
Combining multiple categories for complete coverage
Complex applications often need content from multiple specialized sources. Research tools might combine academic papers with code implementations. You can search across different specialized sources at once:
multi_category_results = app.search(
    query="neural networks",
    limit=4,
    categories=["github", "research"]
)
print("Combined GitHub + Research results:")
for result in multi_category_results.web:
    print(f"- {result.title}")
    print(f"  Category: {getattr(result, 'category', 'N/A')}")
    print(f"  URL: {result.url}")
Combined GitHub + Research results:
- Neural Networks | Journal | ScienceDirect.com by Elsevier
Category: research
URL: https://www.sciencedirect.com/journal/neural-networks
- Neural Network - an overview | ScienceDirect Topics
Category: research
URL: https://www.sciencedirect.com/topics/social-sciences/neural-network
- Neural Network - an overview
Category: research
URL: https://www.sciencedirect.com/topics/computer-science/neural-network
- Supervised learning in DNA neural networks
Category: research
URL: https://www.nature.com/articles/s41586-025-09479-w
Choosing the right search strategy
Match your search parameters to your content requirements:
- Categories: Use github when you need code implementations, research for academic backing
- Sources: Use news for market trends, web for in-depth guides, images for visual content
- Time filters: Apply qdr:d (day), qdr:w (week), qdr:m (month) when current information matters
- Image operators: Use size filters when your models require specific input dimensions
- Multiple categories: Combine approaches when you need diverse source types
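The guidelines above can be captured as a lookup table that turns a content need into search keyword arguments. The need labels and the helper name are hypothetical; the parameter values are the ones used throughout this tutorial, and tbs is attached only to a web search because it applies to web results only.

```python
import copy
from typing import Any

# Hypothetical labels mapped to the parameters discussed above.
STRATEGIES: dict[str, dict[str, Any]] = {
    "code": {"categories": ["github"]},
    "papers": {"categories": ["research"]},
    "code_and_papers": {"categories": ["github", "research"]},
    "news": {"sources": ["news"]},
    "guides": {"sources": ["web"]},
    "recent_web": {"sources": ["web"], "tbs": "qdr:m"},
    "images": {"sources": ["images"]},
}


def choose_search_params(need: str) -> dict[str, Any]:
    """Return a fresh kwargs dict for the given content need."""
    try:
        return copy.deepcopy(STRATEGIES[need])
    except KeyError:
        raise ValueError(f"unknown need: {need!r}") from None


print(choose_search_params("code_and_papers"))

# Usage with the client from this tutorial (not executed here):
#   app.search(query="neural networks", limit=4, **choose_search_params("code_and_papers"))
```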
Building an AI research agent with LangGraph and Firecrawl
Most search applications follow the same pattern: user asks question, system returns links, user manually evaluates results. This works for simple queries, but breaks down when one query has to pull from different content types at once. A research question may need code from GitHub, papers from arXiv, and recent regulatory news in the same answer.
Why an agent beats a flat script
A flat script always runs the same search. That's fine when you know what you're looking for, but most real questions span content types. "Machine learning for healthcare" needs academic papers from arXiv or PubMed, code from GitHub, and recent news on regulations. A single search call across one source won't cover any of those well.
An agent fixes this by picking the search strategy at call time. It reads the query, decides which content type fits, and runs the matching search. We'll use LangGraph to wire that decision in, with one tool per content type and the LLM choosing between them.
What we're building: a multi-tool research agent
Our research agent uses four specialized search tools, each targeting different aspects of Firecrawl's search endpoint. LangGraph provides the reasoning layer that decides which tools to use based on user intent. The agent can:
- Automatically pick between web, academic, code, and news searches
- Combine multiple search types in one answer
- Handle API errors without breaking the user experience
- Remember context across different questions in a conversation
This demonstrates how web search APIs for AI applications can power sophisticated, context-aware systems that adapt to user needs.
When NOT to use this approach
Agents add complexity and cost. Use simple search functions instead when:
- You always need the same search type (like only GitHub repositories)
- Response time is more important than search quality
- You're building a prototype or MVP
- Your use case doesn't require runtime tool selection
For deep research for AI agents, competitive intelligence, or AI training data collection, the agent approach provides significant value.
Setting up the development environment
Before building the agent, you need API keys and dependencies. This agent requires both OpenAI (for reasoning) and Firecrawl (for searching).
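Get your API keys: the Firecrawl key comes from the dashboard at firecrawl.dev, and the OpenAI key comes from your OpenAI account.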
Add both keys to your environment file:
echo "FIRECRAWL_API_KEY='fc-YOUR-FIRECRAWL-KEY'" >> .env
echo "OPENAI_API_KEY='sk-YOUR-OPENAI-KEY'" >> .env
You can install dependencies using uv (recommended for faster, more reliable package management):
# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.bashrc # or ~/.zshrc
# uv expects to work in a Python project directory with a pyproject.toml file
uv init
# Install project dependencies
uv add firecrawl-py python-dotenv langchain-openai langgraph
If you prefer pip:
pip install firecrawl-py python-dotenv langchain-openai langgraph
Creating the specialized search tools
Start by building four tools that target different aspects of Firecrawl's search endpoint. Each tool handles a specific content type but follows the same pattern:
#!/usr/bin/env python3
"""Research Assistant Agent using LangGraph and Firecrawl Search API"""
import os
from dotenv import load_dotenv
from firecrawl import Firecrawl
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.memory import MemorySaver
# Load environment variables
load_dotenv()
firecrawl = Firecrawl()
def general_web_search(query: str, limit: int = 3) -> str:
    """Search the web for general information and extract full content."""
    try:
        results = firecrawl.search(
            query=query,
            limit=limit,
            sources=["web"],
            scrape_options={"formats": ["markdown"]}
        )
        if not results.web:
            return f"No web results found for: {query}"
        formatted_results = f"Web Search Results for '{query}':\n\n"
        for i, result in enumerate(results.web, 1):
            # Scraped results expose title and URL under metadata,
            # as in the earlier scrape_options examples
            title = result.metadata.title
            url = result.metadata.url
            markdown = result.markdown or ""
            # Keep only a short preview so the tool output stays compact
            content_preview = markdown[:300] + "..." if len(markdown) > 300 else markdown
            formatted_results += f"{i}. {title}\n"
            formatted_results += f"   URL: {url}\n"
            formatted_results += f"   Content: {content_preview}\n\n"
        return formatted_results
    except Exception as e:
        return f"Error in web search: {str(e)}"
With Firecrawl as the search API, you get a curated index combined with full content extraction in one call. The scrape_options parameter returns complete articles ready to drop into a model's context window, with no extra parsing and no incomplete snippets. At scale, this matters: your agent reasons on clean content rather than page boilerplate, which can reduce the number of tokens it has to process.
Add the specialized category searches:
def academic_research_search(topic: str, limit: int = 3) -> str:
    """Search for academic research papers and scholarly content."""
    try:
        results = firecrawl.search(
            query=topic,
            limit=limit,
            categories=["research"],
            scrape_options={"formats": ["markdown"]}
        )
        if not results.web:
            return f"No research papers found for: {topic}"
        formatted_results = f"Academic Research Results for '{topic}':\n\n"
        for i, result in enumerate(results.web, 1):
            title = result.title
            url = result.url
            # Not every result carries a category attribute, so default to 'research'
            category = getattr(result, 'category', 'research')
            formatted_results += f"{i}. {title}\n"
            formatted_results += f" Category: {category}\n"
            formatted_results += f" URL: {url}\n\n"
        return formatted_results
    except Exception as e:
        return f"Error in research search: {str(e)}"


def code_search(technology: str, limit: int = 3) -> str:
    """Search GitHub repositories for code examples and projects."""
    try:
        # No scrape_options here: title, URL, and description are enough for repo discovery
        results = firecrawl.search(
            query=technology,
            limit=limit,
            categories=["github"]
        )
        if not results.web:
            return f"No GitHub repositories found for: {technology}"
        formatted_results = f"GitHub Code Search Results for '{technology}':\n\n"
        for i, result in enumerate(results.web, 1):
            formatted_results += f"{i}. {result.title}\n"
            formatted_results += f" URL: {result.url}\n"
            formatted_results += f" Description: {result.description or 'No description available'}\n\n"
        return formatted_results
    except Exception as e:
        return f"Error in code search: {str(e)}"


def news_search(topic: str, limit: int = 3) -> str:
    """Search for recent news articles on a topic."""
    try:
        results = firecrawl.search(
            query=topic,
            limit=limit,
            sources=["news"],
        )
        # Prefer the news list; fall back to web results if no news came back
        news_results = []
        if hasattr(results, 'news') and results.news:
            news_results = results.news
        elif hasattr(results, 'web') and results.web:
            news_results = results.web
        if not news_results:
            return f"No recent news found for: {topic}"
        formatted_results = f"News Search Results for '{topic}':\n\n"
        for i, result in enumerate(news_results, 1):
            formatted_results += f"{i}. {result.title}\n"
            formatted_results += f" URL: {result.url}\n"
            formatted_results += f" Date: {getattr(result, 'date', 'Date not available')}\n\n"
        return formatted_results
    except Exception as e:
        return f"Error in news search: {str(e)}"

These four tools create a complete search surface. Each targets a different content type, but the real value comes from runtime selection: the agent analyzes user intent and picks the right tool automatically.
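Runtime selection depends on each tool advertising what it does. Agent frameworks in the LangChain family generally derive a tool's name, description, and arguments from the function's name, docstring, and signature, which is why every search function above carries a one-line docstring and type hints. The sketch below illustrates that idea with the standard library only. It is not LangGraph's actual implementation, and `describe_tool` and `example_search` are hypothetical names used for illustration.

```python
import inspect


def describe_tool(func) -> dict:
    """Summarize a callable roughly the way an agent framework might present it to a model."""
    signature = inspect.signature(func)
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {
            name: {
                # Use the annotation's class name when available (e.g. "str", "int")
                "type": getattr(param.annotation, "__name__", str(param.annotation)),
                # A parameter without a default must be supplied by the model
                "required": param.default is inspect.Parameter.empty,
            }
            for name, param in signature.parameters.items()
        },
    }


def example_search(query: str, limit: int = 3) -> str:
    """Search the web for general information and extract full content."""
    return f"results for {query} (limit={limit})"


spec = describe_tool(example_search)
print(spec["name"])
print(spec["description"])
print(spec["parameters"])
```

A vague or missing docstring gives the model less to go on when it chooses between tools, so it is worth keeping each description specific to one content type.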
Building the LangGraph reasoning layer
Now connect these tools with LangGraph's reasoning framework. LangGraph handles the routing logic that picks tools at runtime. For more details on LangGraph patterns and agent architectures, see our LangGraph tutorial.
def create_research_agent():
    """Create the research assistant agent with search tools."""
    model = ChatOpenAI(model="gpt-5", temperature=0)
    tools = [general_web_search, academic_research_search, code_search, news_search]
    agent = create_react_agent(
        model=model,
        tools=tools,
        checkpointer=MemorySaver()  # keeps conversation state per thread_id
    )
    return agent

The agent receives a system prompt that explains when to use each tool. The main loop sends that prompt along with each user message:
def main():
    """Main function to run the research agent."""
    print("Research Assistant Agent")
    print("=" * 50)
    print("I can help you search for:")
    print("• General web information with full content")
    print("• Academic research papers")
    print("• GitHub code repositories")
    print("• Recent news articles")
    print("\nType 'quit' to exit")
    print("=" * 50)
    agent = create_research_agent()
    while True:
        user_input = input("\nWhat would you like to research? ")
        if user_input.lower() in ['quit', 'exit', 'q']:
            print("Goodbye!")
            break
        if not user_input.strip():
            continue
        try:
            print("\nSearching...")
            system_message = """You are a Research Assistant Agent that helps users find information on any topic.
You have access to four specialized search tools:
1. general_web_search - For web search with full content extraction
2. academic_research_search - For finding academic papers and scholarly content
3. code_search - For finding GitHub repositories and code examples
4. news_search - For finding recent news articles
Choose the appropriate tool(s) based on the user's request. You can use multiple tools for extensive research."""
            response = agent.invoke(
                {"messages": [("system", system_message), ("user", user_input)]},
                # A fixed thread_id lets the checkpointer carry context across turns
                config={"configurable": {"thread_id": "research_session"}}
            )
            # The last message in the returned state is the agent's final answer
            final_message = response["messages"][-1]
            print(f"\nResearch Results:\n{final_message.content}")
        except Exception as e:
            print(f"\nError: {str(e)}")


if __name__ == "__main__":
    main()

LangGraph's ReAct loop handles the reasoning: the agent analyzes the user's query, decides which tools are relevant, calls them in the right order, and synthesizes the results.
Running and testing the agent
Save this code as research_agent.py and run it:
python research_agent.py

Research Assistant Agent
==================================================
I can help you search for:
• General web information with full content
• Academic research papers
• GitHub code repositories
• Recent news articles

Type 'quit' to exit
==================================================

What would you like to research?

Tool selection in action
Here's how the agent chooses a search strategy based on user intent. Note that these queries may take a moment to return results.
Query: "Find recent machine learning frameworks for computer vision"
The agent recognizes this needs code repositories and chooses the code_search tool:
Searching...
Research Results:
Here are some recent machine learning frameworks for computer vision found on GitHub:
1. OpenCV: An open-source library providing computer vision and machine learning
algorithms for image and video analysis. It also includes Detectron2, a
high-performance object detection framework.
2. 500 AI Machine learning Deep learning Computer vision NLP Projects with code:
This repository contains a list of 500+ AI projects, including machine learning
and computer vision, with code.
3. Lightly: A computer vision framework for self-supervised learning. The repository
also includes Learnergy, energy-based machine learning models built upon PyTorch.

Query: "I need academic research papers about neural networks"
The agent switches to academic_research_search for scholarly sources:
Searching...
Research Results:
Here are some academic research papers on neural networks:
1. Neural Networks Journal | ScienceDirect: The official journal for neural
network research with peer-reviewed articles.
2. Neural Network - an overview | ScienceDirect Topics: Comprehensive overview
of neural networks in the context of social sciences.
3. Overview of artificial neural networks - PubMed: Medical and biological
applications of artificial neural networks.

Query: "Show me Python web scraping libraries on GitHub"
The agent uses code_search again, this time surfacing web scraping libraries and tutorials:
Searching...
Research Results:
Here are some Python web scraping libraries and projects on GitHub:
1. Web Scraping With Python: Tutorial using requests and Beautiful Soup libraries.
2. Python-Web-Scraping-Tutorial by oxylabs: Comprehensive tutorial from simple
examples to complex tasks.
3. Scrapling by D4Vinci: Powerful, flexible Python library for
web scraping.
4. Scrapegraph-ai: Web scraping python library that uses LLM and direct graph
logic to create scraping pipelines.

Notice how the agent chooses different tools automatically. You don't need manual configuration because it infers intent from the query and picks the matching search strategy.
Next steps
Search is Firecrawl's entry point into a complete web data API built for agents. Use search to find pages across a curated, fresh index; scrape when you already know the URL; crawl when you need every page on a site; or use the automated data extraction endpoint when you need structured JSON from a specific page. All return clean markdown or structured data in formats shaped for model context windows. At scale, combining Firecrawl search and scrape means your agent doesn't carry unnecessary context: you spend fewer tokens and get better signal on every call.
From here, you can wire search into an existing app through the MCP server or the CLI, or extend the research agent with your own tools. A free Firecrawl account comes with 1,000 credits per month to try things out.
Frequently Asked Questions
How many credits does a search use?
Two credits per ten results, rounded up. A search returning five results costs two credits; eleven results cost four. Adding `scrape_options` charges the standard scrape cost on top: one credit per scraped page, plus extras for PDF parsing (one credit per PDF page), enhanced proxy mode (four credits per page), or JSON extraction (four credits per page). Two flags keep the bill predictable: pass `parsers: []` inside `scrape_options` to skip PDF parsing when you don't need it, and use `proxy: "auto"` instead of `"enhanced"` so you only pay the four-credit surcharge on pages that actually need enhanced rendering.
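The arithmetic above can be sketched as a small estimator. This is a rough budgeting aid rather than Firecrawl's billing logic: `estimate_search_credits` is a hypothetical helper, and it assumes each extra (PDF pages, enhanced proxy, JSON extraction) is added on top of the one-credit base scrape cost.

```python
import math


def estimate_search_credits(
    num_results: int,
    scraped_pages: int = 0,
    pdf_pages: int = 0,
    enhanced_proxy_pages: int = 0,
    json_extraction_pages: int = 0,
) -> int:
    """Estimate credits for one search call from the rates quoted above."""
    # 2 credits per block of 10 results, rounded up
    search_cost = 2 * math.ceil(num_results / 10)
    # 1 credit per scraped page when scrape_options is set
    scrape_cost = scraped_pages * 1
    # Assumption: extras are charged in addition to the base scrape credit
    extras = pdf_pages * 1 + enhanced_proxy_pages * 4 + json_extraction_pages * 4
    return search_cost + scrape_cost + extras


print(estimate_search_credits(5))                     # 2
print(estimate_search_credits(11))                    # 4
print(estimate_search_credits(10, scraped_pages=10))  # 12
```

The first two calls reproduce the five-result and eleven-result cases from the answer; the third shows that scraping all ten results of a search costs 2 + 10 credits.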
Can I get full page content, not just snippets?
Yes. Pass `scrape_options={"formats": ["markdown"]}` to your search call and Firecrawl runs the scrape endpoint on every result, returning the full page body as markdown alongside the title and URL. The same `scrape_options` parameter accepts every flag the scrape endpoint takes, including `html`, `links`, `include_tags`, and `exclude_tags`.
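Once full markdown is available, a common next step is to pack it into a prompt. The sketch below shows one way to do that with a per-page character cap. It relies on the `title`, `url`, and `markdown` attributes used in the tutorial code; `build_context` is a hypothetical helper, the stand-in objects exist only so the example runs without an API key, and treating `markdown` as possibly missing is a defensive assumption.

```python
from types import SimpleNamespace


def build_context(results, max_chars_per_page: int = 2000) -> str:
    """Join search results into one prompt-ready string, truncating long pages."""
    sections = []
    for index, result in enumerate(results, 1):
        # Assumption: markdown may be absent if a page could not be scraped
        body = getattr(result, "markdown", None) or ""
        if len(body) > max_chars_per_page:
            body = body[:max_chars_per_page].rstrip() + "..."
        sections.append(f"[{index}] {result.title}\n{result.url}\n{body}".rstrip())
    return "\n\n".join(sections)


# Stand-in objects shaped like the search results used earlier in this tutorial
sample_results = [
    SimpleNamespace(title="Page A", url="https://example.com/a", markdown="# A\n" + "x" * 50),
    SimpleNamespace(title="Page B", url="https://example.com/b", markdown=None),
]
context = build_context(sample_results, max_chars_per_page=20)
print(context)
```

With a real client you would pass `results.web` from a search call that sets `scrape_options`, and tune `max_chars_per_page` to your model's context budget.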
What's the difference between sources and categories?
`sources` picks the result type: `web` for general pages, `news` for recent articles, or `images` for visual content. `categories` filters web results to specific domain groups: `github` for code repos and issues, `research` for academic papers from arXiv, Nature, IEEE, and PubMed, or `pdf` for PDF documents only. You can combine values within each parameter, like `categories=["github", "research"]`.
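Because the two parameters take different value sets, it is easy to pass a category where a source belongs. A minimal sketch of a guard, assuming only the values listed above are valid; `build_search_kwargs` is a hypothetical helper, not part of the Firecrawl SDK:

```python
ALLOWED_SOURCES = {"web", "news", "images"}
ALLOWED_CATEGORIES = {"github", "research", "pdf"}


def build_search_kwargs(query, sources=None, categories=None, limit=5):
    """Validate sources and categories, then return keyword arguments for a search call."""
    kwargs = {"query": query, "limit": limit}
    for name, values, allowed in (
        ("sources", sources, ALLOWED_SOURCES),
        ("categories", categories, ALLOWED_CATEGORIES),
    ):
        if not values:
            continue  # leave the parameter out so the API default applies
        unknown = sorted(set(values) - allowed)
        if unknown:
            raise ValueError(f"Unsupported {name}: {unknown}")
        kwargs[name] = list(values)
    return kwargs


kwargs = build_search_kwargs(
    "vector database benchmarks", categories=["github", "research"]
)
print(kwargs)
# With a configured client: results = firecrawl.search(**kwargs)
```

Passing `sources=["github"]` to this helper raises a `ValueError` immediately instead of sending a request that mixes up the two parameters.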
Can I search for GitHub repos or academic papers?
Yes. Use `categories=["github"]` to scope results to GitHub repositories, code files, and issues. Use `categories=["research"]` to pull from arXiv, Nature, IEEE, and PubMed. Combine both in one call with `categories=["github", "research"]` when you want code and academic backing for the same query.
How do I get results from a specific country?
Use the `location` parameter, which accepts a country name or city/state/country string, like `location="Germany"` or `location="San Francisco,California,United States"`. For ISO codes instead of country names, pass them through the separate `country` parameter (`country="DE"`). You can set both in the same call.
Does tbs work for news and image results?
No. The `tbs` parameter applies only to `web` source results. To get recent news, use `sources=["news"]` instead, which returns timestamped articles by default. Accepted `tbs` values include `qdr:h`, `qdr:d`, `qdr:w`, `qdr:m`, and `qdr:y`, plus custom date ranges via `cdr:1,cd_min:MM/DD/YYYY,cd_max:MM/DD/YYYY`.
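The custom range string is easy to get wrong by hand, so it can help to build it from `date` objects. A small sketch using the formats quoted above; `RELATIVE_TBS` and `custom_range_tbs` are hypothetical names, and the zero-padded `MM/DD/YYYY` output follows the pattern given in this answer:

```python
from datetime import date

# Relative windows: past hour, day, week, month, and year
RELATIVE_TBS = {
    "hour": "qdr:h",
    "day": "qdr:d",
    "week": "qdr:w",
    "month": "qdr:m",
    "year": "qdr:y",
}


def custom_range_tbs(start: date, end: date) -> str:
    """Build a custom date range value in the cdr:1,cd_min:...,cd_max:... form."""
    if start > end:
        raise ValueError("start must not be after end")
    fmt = "%m/%d/%Y"  # MM/DD/YYYY
    return f"cdr:1,cd_min:{start.strftime(fmt)},cd_max:{end.strftime(fmt)}"


tbs_value = custom_range_tbs(date(2025, 1, 1), date(2025, 3, 31))
print(tbs_value)             # cdr:1,cd_min:01/01/2025,cd_max:03/31/2025
print(RELATIVE_TBS["week"])  # qdr:w
```

Either value can then be passed as `tbs` on a search that uses the `web` source.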
What are the rate limits for Firecrawl search API requests?
Rate limits scale with your plan: Free gets 5 requests per minute, Hobby 50, Standard 250, and Growth 2,500. Limits reset automatically every minute. Exceeding your limit returns a 429 error until the next window opens.
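A client that may hit the limit can retry with backoff instead of failing outright. The sketch below is one generic way to do that. `RateLimitError` is a hypothetical stand-in, since how a 429 surfaces depends on the client you use (an SDK exception or an HTTP status code), and `call_with_retry` is not part of the Firecrawl SDK.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for however your client reports HTTP 429."""


def call_with_retry(func, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying with exponential backoff when it is rate limited."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RateLimitError:
            if attempt == max_attempts:
                raise  # out of attempts, surface the error to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...


# Demonstration with a fake call that is rate limited twice, then succeeds
attempts = {"count": 0}


def flaky_search():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("429 Too Many Requests")
    return "ok"


delays = []  # record the requested delays instead of actually sleeping
result = call_with_retry(flaky_search, sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```

Since limits reset every minute, there is little benefit in letting any single delay grow beyond about 60 seconds.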
What's the difference between SERP APIs and Search APIs?
SERP APIs scrape and reformat data from existing search engines like Google or Bing, acting as middleman services that parse search engine results pages into structured JSON. Examples include SerpAPI, ScrapingDog, and Serper. Search APIs are the broader category that also includes platforms like Firecrawl, which run their own search indices. Every SERP API is a Search API, but not every Search API is a SERP API.
How do AI-native search APIs work differently from traditional search?
Traditional search APIs match keywords and rank results by text similarity and popularity signals. AI-native search APIs like Firecrawl interpret query intent, filter by specialized categories (GitHub, academic, PDF), and extract clean markdown from each result instead of HTML built for human browsing. The output is shaped to drop directly into a model's context window without further parsing.
