

Web Search With Firecrawl
Web search APIs have a fundamental problem. They give you links when you need content. Google's Custom Search API returns links, titles and snippets. But when you're building something that processes web data, those snippets aren't enough.
The typical workflow forces you into extra steps. First you search. Then you take each URL and scrape it separately. Now you're dealing with different website structures, bot detection systems, and sites that randomly go offline. What started as a simple search becomes a complex data pipeline.
Firecrawl's search endpoint solves this by combining both steps. Search for "best project management tools 2025" and get back the full content from each result in markdown format. You can filter by location or time, or target specific sources like GitHub or news sites.
This tutorial shows you how to use this search endpoint to get both search results and content extraction in one API call. We'll start with basic search operations and work up to advanced filtering and content extraction. By the end, you'll know how to build complete search-powered applications.
Search vs. Scrape vs. Crawl: Understanding Firecrawl's Endpoints
If you're an existing Firecrawl user, you might be wondering about the differences between search, scrape, and crawl. Each endpoint solves a different problem in web data collection.
Scrape targets individual pages. You provide a specific URL and it extracts clean content from that page. It handles JavaScript rendering, bypasses anti-bot protection, and converts messy HTML into structured data. Use this when you know exactly which page contains the information you need.
Crawl works through entire websites systematically. Give it a starting URL and it discovers every connected page on that site, extracting content as it goes. It follows links, handles pagination, and maps out complete website structures. This works well when you need lots of data from a specific domain.
Search finds content across the entire web. Instead of starting with URLs, you start with queries. It searches the internet for pages matching your topic, then extracts content from the most relevant results. Choose this approach when you need information but don't know which websites have it.
| Feature | Scrape | Crawl | Search |
| --- | --- | --- | --- |
| Input | Specific URL | Starting URL | Search query |
| Scope | Single page | Entire website | Entire web |
| Discovery | None needed | Site exploration | Web-wide search |
| Output | One page content | Multiple related pages | Multiple relevant pages from different sites |
| Best for | Known targets | Complete site data | Research and discovery |
The main difference is how you start:
- Scrape and crawl require you to know where your content lives
- Search helps you find content when you don't know its location
- Scrape gives you precision, crawl gives you completeness, search gives you discovery
Each method has trade-offs. Scrape is fastest but most limited, while crawl takes longer but covers everything on a site. Search can find the most relevant content but may miss some sources or return results you didn't expect.
In practice, you'll often use multiple endpoints together. You might search to discover relevant sources, then crawl those sites for complex data, or scrape specific high-value pages you found through search. For detailed guides on the other endpoints, see our scrape endpoint tutorial and crawl endpoint guide.
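As a rough illustration, here is a minimal sketch of that combined pattern, assuming the client setup covered in the next section; the scrape call mirrors the SDK's scrape endpoint, though exact parameters may vary by version:

from firecrawl import Firecrawl

app = Firecrawl()  # reads FIRECRAWL_API_KEY from the environment

# Step 1: discover candidate sources with search
results = app.search(query="project management tools comparison", limit=5)

# Step 2: scrape one high-value page you found for its full content
top_url = results.web[0].url
page = app.scrape(top_url, formats=["markdown"])
print(page.markdown[:500])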
Now let's look at how to get started with search operations.
Getting Started: Basic Search Operations
Before you can search the web with Firecrawl, you need to set up your API credentials and understand how search results work.
Setting up authentication
First, sign up at firecrawl.dev and grab your API key from the dashboard. Install the Python SDK and set up your credentials:
pip install firecrawl-py python-dotenv
Save your API key in a .env file (a simple text file that stores environment variables):
echo "FIRECRAWL_API_KEY='fc-YOUR-KEY-HERE'" >> .env
Then load it in your Python code:
from firecrawl import Firecrawl
from dotenv import load_dotenv
load_dotenv()
app = Firecrawl()
Your first search
Here's the simplest possible search request:
results = app.search(query="best project management tools", limit=5)
This searches for "best project management tools" and returns the top 5 results. Each search result costs 1 credit, so this request uses 5 credits total.
print(f"Found {len(results.web)} web results")
print(f"Search completed successfully")
Found 5 web results
Search completed successfully
The response comes back with structured data. Each result includes metadata that helps you understand what you found.
Understanding the response structure
Search results come back in a structured format. Each result includes:
# First result details
first_result = results.web[0]
print(f"Title: {first_result.title}")
print(f"URL: {first_result.url}")
print(f"Description: {first_result.description}")
print(f"Category: {first_result.category}")
Title: 25 Best Project Management Software Picked For 2025
URL: https://thedigitalprojectmanager.com/tools/best-project-management-software/
Description: Explore top-rated project management software handpicked by experts to help you manage teams, timelines, and tasks with ease.
Category: None
Now you have clean, structured access to individual results. The response includes web results by default. Here are all the results from our search:
All 5 results:
1. 25 Best Project Management Software Picked For 2025
https://thedigitalprojectmanager.com/tools/best-project-management-software/
2. What is the best free project management tool, specifically geared ...
https://www.reddit.com/r/projectmanagement/comments/1b0lfvi/what_is_the_best_free_project_management_tool/
3. Honest Review of 6 Personal Project Management Tools ... - ICAgile
https://www.icagile.com/resources/honest-review-of-6-personal-project-management-tools-with-kanban-view
4. Manage your team's work, projects, & tasks online โข Asana
https://asana.com/
5. Project Management Software for Teams - Microsoft
https://www.microsoft.com/en-us/microsoft-365/planner/project-management
Basic error handling
Search requests can fail for various reasons. Here's how to handle common issues:
try:
    results = app.search(query="python web scraping", limit=10)

    if results.web:
        print(f"Found {len(results.web)} results")
    else:
        print("No results found")
except Exception as e:
    print(f"Search failed: {e}")
Testing with an empty query shows how error handling works:
try:
    results = app.search(query="", limit=5)
except Exception as e:
    print(f"Empty query properly failed: {e}")
Empty query properly failed: Query cannot be empty
Common error codes include rate limiting (429), invalid queries (400), and authentication issues (401). Most errors include descriptive messages to help you troubleshoot.
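Since rate limits are transient, a common pattern is to retry with exponential backoff. Here's a minimal sketch; the string check on the exception message is an assumption, so adapt it to the exception types your SDK version actually raises:

import time

def search_with_retry(app, query, limit=5, max_retries=3):
    """Retry a search with exponential backoff on rate-limit errors."""
    for attempt in range(max_retries):
        try:
            return app.search(query=query, limit=limit)
        except Exception as e:
            # Assumed heuristic: the SDK surfaces the HTTP status in the message
            if "429" in str(e) or "rate limit" in str(e).lower():
                time.sleep(2 ** attempt)  # wait 1s, 2s, 4s...
            else:
                raise
    raise RuntimeError(f"Search failed after {max_retries} retries: {query}")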
Search parameters you can control
The search method accepts several parameters to customize results:
- query - Your search terms (required)
- limit - Number of results to return (default: 3, max: 20)
- sources - Types of results ("web" for general pages, "news" for recent articles, "images" for visual content)
- location - Geographic region for localized results
- timeout - Request timeout in seconds
A search with different sources might look like this:
results = app.search(
    query="machine learning frameworks 2025",
    limit=3,
    sources=["web", "news"],
    timeout=30
)

print(f"Advanced search found {len(results.web)} results")
for i, result in enumerate(results.web, 1):
    print(f"{i}. {result.title}")
Advanced search found 3 results
1. 8 Best Machine Learning Software To Use in 2025 | Anaconda
2. AI Frameworks: Top Types To Adopt in 2025 - Splunk
3. Uses for ML frameworks like Pytorch/Tensorflow/etc in 2025 - Reddit
Common use cases for basic search
Use the search endpoint when you need to:
- Find recent information on a topic without knowing specific websites
- Research competitors or market trends across multiple sources
- Discover relevant content for further analysis or scraping
- Build datasets by finding pages that match certain criteria
This gives you the foundation for web search operations. In the next section, weโll look at the real power of Firecrawlโs search: extracting full content from these results instead of just getting titles and descriptions.
Search + Content Extraction: The Core Value Proposition
The real power of Firecrawl's search endpoint is combining search with content extraction in one API call. When you add scrape_options to a search, Firecrawl automatically runs its scrape endpoint on each search result.
How search + scraping works together
When you add scrape options, Firecrawl finds the most relevant pages for your query, then runs the scrape endpoint on each result:
# This search finds pages, then scrapes each one automatically
content_results = app.search(
    query="web scraping best practices",
    limit=2,
    scrape_options={
        "formats": ["markdown", "links"]
    }
)

print(f"Content search found {len(content_results.web)} results")

for i, result in enumerate(content_results.web, 1):
    print(f"\nResult {i}:")
    print(f"Title: {result.metadata.title}")
    print(f"URL: {result.metadata.url}")
    print(f"Content length: {len(result.markdown)} characters")
    print(f"Found {len(result.links)} links in content")
Content search found 2 results
Result 1:
Title: Web Scraping Best Practices and Tools 2025 - ZenRows
URL: https://www.zenrows.com/blog/web-scraping-best-practices
Content length: 25234 characters
Found 51 links in content
Result 2:
Title: 7 Web Scraping Best Practices You Must Be Aware of
URL: https://research.aimultiple.com/web-scraping-best-practices/
Content length: 22384 characters
Found 111 links in content
Instead of getting 100-character descriptions, you get complete articles with 20,000+ characters of content plus all extracted links. This is where Firecrawl shows its real value.
Format options for different applications
The scrape_options parameter supports the same formats as the scrape endpoint:
results = app.search(
    query="python tutorials",
    limit=1,
    scrape_options={
        "formats": ["markdown", "html", "links"]
    }
)
# Each result contains full content in multiple formats
result = results.web[0]
print(f"Markdown: {len(result.markdown)} characters")
print(f"HTML: {len(result.html)} characters")
print(f"Links: {len(result.links)} found")
Markdown: 33640 characters
HTML: 221019 characters
Links: 302 found
Notice how the HTML version is much larger (221k characters) than the clean markdown (33k characters), and how many links (302) are extracted from the page structure.
The extracted content is immediately ready for processing, with no additional parsing or cleaning needed.
Working with the extracted content
Here's an example that counts words in each result:
results = app.search(
    query="machine learning research 2025",
    limit=2,
    scrape_options={"formats": ["markdown"]}
)

for result in results.web:
    # Content is already extracted and clean
    content = result.markdown
    word_count = len(content.split())
    print(f"Found article: {result.metadata.title}")
    print(f"Content: {word_count} words")
Found article: 2025 Conference
Content: 466 words
Found article: Apple Machine Learning Research at ICML 2025 - Apple Machine Learning Research
Content: 2070 words
The search endpoint with scrape options combines the discovery power of web search with the content extraction capabilities of the scrape endpoint. You find relevant pages and get their full content without managing multiple API calls or handling different website structures yourself.
For more control over content extraction, including options like include_tags, exclude_tags, and wait_for, check out our complete scrape endpoint guide, which covers all available scraping parameters.
Targeted Content Discovery
Most search APIs return whatever matches your keywords. You search for "machine learning" and get marketing pages mixed with academic papers. Firecrawl lets you target specific content types from the start.
Search categories for specialized content
When you're building developer tools, generic web results miss the mark. You need code repositories, not blog posts.
The categories parameter solves this by targeting specialized platforms directly.
# Helper function to display results consistently
def display_results(results, result_type="web", title="Results"):
    result_list = getattr(results, result_type, [])
    print(f"{title}:")
    for result in result_list:
        print(f"- {result.title}")
        print(f"  Category: {getattr(result, 'category', 'N/A')}")
        print(f"  URL: {result.url}")

# Search GitHub for code projects
github_results = app.search(
    query="python web scraping",
    limit=3,
    categories=["github"]
)

# Search research databases for academic papers
research_results = app.search(
    query="machine learning transformers",
    limit=3,
    categories=["research"]
)

display_results(github_results, title="GitHub category results")
print()
display_results(research_results, title="Research category results")
GitHub category results:
- Scrapy, a fast high-level web crawling & scraping ...
Category: github
URL: https://github.com/scrapy/scrapy
- oxylabs/Python-Web-Scraping-Tutorial
Category: github
URL: https://github.com/oxylabs/Python-Web-Scraping-Tutorial
- ScrapeGraphAI/Scrapegraph-ai: Python scraper based on AI
Category: github
URL: https://github.com/ScrapeGraphAI/Scrapegraph-ai
Research category results:
- [2304.10557] An Introduction to Transformers
Category: research
URL: https://arxiv.org/abs/2304.10557
- A Comprehensive Survey on Applications of Transformers ...
Category: research
URL: https://arxiv.org/abs/2306.07303
- Multimodal Learning With Transformers: A Survey
Category: research
URL: https://pubmed.ncbi.nlm.nih.gov/37167049/
Firecrawl gives your search the right context. GitHub category returns actual repositories and code projects. Research category finds academic papers from arXiv, PubMed, and scholarly databases.
Source type filtering for different content formats
Beyond categories, you often need different content formats for the same topic. News articles give you recent developments, while web pages offer educational guides. The sources parameter controls this.
# Helper function for source comparison
def compare_sources(query, limit=2):
    web_results = app.search(query=query, limit=limit, sources=["web"])
    news_results = app.search(query=query, limit=limit, sources=["news"])

    print("Web source results:")
    for result in web_results.web:
        domain = result.url.split('/')[2] if '/' in result.url else result.url
        print(f"- {result.title}")
        print(f"  Domain: {domain}")

    print("\nNews source results:")
    if hasattr(news_results, 'news') and news_results.news:
        for result in news_results.news:
            print(f"- {result.title}")
            print(f"  Date: {getattr(result, 'date', 'N/A')}")

compare_sources("AI developments 2025")
Web source results:
- The 2025 AI Index Report | Stanford HAI
Domain: hai.stanford.edu
- The Latest AI News and AI Breakthroughs that Matter Most: 2025
Domain: www.crescendo.ai
News source results:
- Microsoft's Unprecedented 2025: Cloud and AI Fuel Record Earnings, Igniting Shareholder Optimism
Date: 12 hours ago
- 12 Graphs That Explain the State of AI in 2025
Date: Apr 7, 2025
Web sources return reports from authoritative domains, while news sources provide recent articles with timestamps. This distinction matters when you need current market developments versus foundational research.
Visual Content Search
Text search is straightforward. Image search gets complex fast. You need specific resolutions, aspect ratios, or image types. Firecrawl's image search includes filtering options that traditional search APIs don't have.
Basic image search
Start with sources=["images"] for visual content:
# Basic image search
basic_images = app.search(
    query="mountain landscape",
    limit=3,
    sources=["images"]
)

print("Basic image search results:")
if hasattr(basic_images, 'images') and basic_images.images:
    print(f"Found {len(basic_images.images)} images")
    for i, img in enumerate(basic_images.images, 1):
        print(f"{i}. {img.title}")
        print(f"   Source page: {img.url}")
        print(f"   Position: {img.position}")
Basic image search results:
Found 3 images
1. Free Sunset Mountain Landscape Image | Download at StockCake
Source page: https://stockcake.com/i/sunset-mountain-landscape_1219985_159231
Position: 1
2. Watercolor Mountain Landscape Tutorial: Step-by-Step Guide
Source page: https://www.esperoart.com/mountain-landscape-watercolor-tutorial/
Position: 2
3. Mountain Landscape with Forests and a Stream ยท Free Stock Photo
Source page: https://www.pexels.com/photo/mountain-landscape-with-forests-and-a-stream-18682254/
Position: 3
This gives you relevant visual content. But many applications need specific image dimensions.
HD and 4K image search with size operators
Computer vision projects require consistent input dimensions. Machine learning models trained on specific image sizes need datasets with matching resolutions. Firecrawl's image search supports Google's established image size operators, letting you filter results by exact dimensions or minimum sizes.
These operators work within your search query string to target specific resolutions. Instead of getting random image sizes that require post-processing, you get images that match your requirements from the start.
Here's an example:
def search_images_by_resolution():
    # Search for Full HD images
    hd_images = app.search(
        query="sunset landscape imagesize:1920x1080",
        limit=3,
        sources=["images"]
    )

    print("HD image search results:")
    if hasattr(hd_images, 'images') and hd_images.images:
        for i, img in enumerate(hd_images.images, 1):
            print(f"{i}. {img.title}")
            print(f"   Source: {img.url}")

    # Search for 4K images
    uhd_results = app.search(
        query="nature wallpaper imagesize:3840x2160",
        limit=2,
        sources=["images"]
    )

    print("\n4K image search results:")
    if hasattr(uhd_results, 'images') and uhd_results.images:
        for img in uhd_results.images:
            print(f"- {img.title}")
            print(f"  Source: {img.url}")

search_images_by_resolution()
HD image search results:
1. Stunning sunset landscape: Mediterranean rocks with olive trees in the sea. Photorealistic motion background. 3D rendering.
Source: https://www.storyblocks.com/video/stock/photorealistic-motion-background-mediteranean-rocks-in-the-sea-with-olive-trees-growing-on-them-beautiful-sunset-landscape-scene-3d-rendering-bymitreh8kaz2db71
2. Sunset in a hilly rural landscape on the edge of a forest, Timelapse video
Source: https://www.storyblocks.com/video/stock/sunset-in-a-hilly-rural-landscape-on-the-edge-of-a-forest-timelapse-video-351768226
4K image search results:
- 50 Nature Wallpapers (All 4K, No watermarks) - Album on Imgur
Source: https://in.pinterest.com/pin/1002543567023869780/
- My top 10 of all time - Nature and Art : r/wallpapers
Source: https://www.reddit.com/r/wallpapers/comments/1g1cymc/my_top_10_of_all_time_nature_and_art/
Common resolution operators for different use cases:
- imagesize:1920x1080 - Full HD for standard displays
- imagesize:2560x1440 - QHD for high-resolution screens
- imagesize:3840x2160 - 4K UHD for professional applications
- larger:1920x1080 - HD and above for flexible sizing
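Rather than hand-writing operators into every query, you can append them programmatically. A small sketch; the helper name is ours, not part of the SDK:

def sized_image_search(app, base_query, width, height, limit=3, exact=True):
    """Append a Google image-size operator to the query before searching."""
    operator = "imagesize" if exact else "larger"
    return app.search(
        query=f"{base_query} {operator}:{width}x{height}",
        limit=limit,
        sources=["images"]
    )

# Exactly 4K, or anything Full HD and above
uhd = sized_image_search(app, "nature wallpaper", 3840, 2160)
hd_plus = sized_image_search(app, "nature wallpaper", 1920, 1080, exact=False)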
Advanced Filtering & Strategy Selection
Basic search gets you started, but real-world applications need precise targeting. You might need recent news instead of evergreen content, or results from multiple specialized sources combined.
Time-based filtering for current versus evergreen content
Web search results span decades of content, but applications often need recent information rather than historical data. The tbs parameter filters results by publication date using Google's time-based search codes. Abbreviated codes like qdr:d (past day), qdr:w (past week), and qdr:m (past month) let you separate trending topics from foundational concepts. Market research needs last month's data, while educational content remains valuable regardless of age.
Here's how the tbs parameter filters results by recency:
def compare_content_freshness(base_query):
    recent_results = app.search(
        query=f"{base_query} trends",
        limit=3,
        tbs="qdr:m"  # Past month
    )

    all_time_results = app.search(
        query=f"{base_query} fundamentals",
        limit=3
    )

    print("Recent results (past month):")
    for result in recent_results.web:
        print(f"- {result.title}")

    print("\nAll-time results (fundamentals):")
    for result in all_time_results.web:
        print(f"- {result.title}")

compare_content_freshness("machine learning")
Recent results (past month):
- Top 13 Machine Learning Trends CTOs Need to Know in 2025
- The Evolution of AI and ML: Trends, Impact, and Future Insights
- The Ultimate List of Machine Learning Statistics for 2025 - Itransition
All-time results (fundamentals):
- Introduction to Machine Learning Concepts - Training - Microsoft Learn
- Machine Learning Crash Course - Google for Developers
- Machine Learning Tutorial - GeeksforGeeks
Time filtering with qdr:m returns current trend analysis. Searches without time filters return educational resources and established documentation.
Combining multiple categories for complete coverage
Complex applications often need content from multiple specialized sources. Research tools might combine academic papers with code implementations. You can search across different specialized sources at once:
multi_category_results = app.search(
    query="neural networks",
    limit=4,
    categories=["github", "research"]
)

print("Combined GitHub + Research results:")
for result in multi_category_results.web:
    print(f"- {result.title}")
    print(f"  Category: {getattr(result, 'category', 'N/A')}")
    print(f"  URL: {result.url}")
Combined GitHub + Research results:
- Neural Networks | Journal | ScienceDirect.com by Elsevier
Category: research
URL: https://www.sciencedirect.com/journal/neural-networks
- Neural Network - an overview | ScienceDirect Topics
Category: research
URL: https://www.sciencedirect.com/topics/social-sciences/neural-network
- Neural Network - an overview
Category: research
URL: https://www.sciencedirect.com/topics/computer-science/neural-network
- Supervised learning in DNA neural networks
Category: research
URL: https://www.nature.com/articles/s41586-025-09479-w
Choosing the right search strategy
Match your search parameters to your content requirements; a sketch of this decision logic in code follows the list:
- Categories: Use github when you need code implementations, research for academic backing
- Sources: Use news for market trends, web for comprehensive guides, images for visual content
- Time filters: Apply qdr:d (day), qdr:w (week), qdr:m (month) when current information matters
- Image operators: Use size filters when your models require specific input dimensions
- Multiple categories: Combine approaches when you need diverse source types
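One way to encode these rules is a small preset table that maps a content need to search parameters. A minimal sketch (the helper and preset names are ours, not part of the SDK):

def choose_search_params(need):
    """Map a content need to Firecrawl search parameters (illustrative presets)."""
    presets = {
        "code": {"categories": ["github"]},
        "academic": {"categories": ["research"]},
        "news": {"sources": ["news"], "tbs": "qdr:w"},
        "guides": {"sources": ["web"]},
        "visual": {"sources": ["images"]},
    }
    return presets.get(need, {"sources": ["web"]})

# Unpack the preset into the search call
results = app.search(query="machine learning", limit=3, **choose_search_params("news"))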
Building an AI Research Agent: LangGraph + Firecrawl Integration
Most search applications follow the same pattern: a user asks a question, the system returns links, and the user manually evaluates the results. This works for simple queries, but breaks down when you need comprehensive research across different content types. Should you search GitHub for code examples? Academic databases for research papers? News sources for recent developments?
Why agents beat simple scripts for search applications
Consider researching "machine learning for healthcare." A simple script might search Google and return general articles. But comprehensive research needs:
- Academic papers from PubMed or arXiv for scientific backing
- GitHub repositories for implementation examples
- Recent news for current developments and regulations
- Web articles for tutorials and practical guides
An intelligent agent can automatically choose the right search strategy based on query intent, combine results from multiple sources, and handle errors gracefully. Instead of writing separate functions for each search type, you build one system that reasons about what information the user actually needs, which makes this pattern a natural fit for search APIs in AI agents and RAG applications.
LangGraph is an AI framework that turns web search APIs into intelligent, reasoning-capable systems. Unlike simple scripts that always follow the same steps, LangGraph agents analyze user queries and automatically choose appropriate search strategies. The ReAct (Reasoning and Acting) pattern enables agents to think through problems, decide which search tools are relevant, execute those searches, and synthesize results into coherent responses.
What weโre building: a multi-tool research agent
Our research agent uses four specialized search tools, each targeting different aspects of Firecrawl's search endpoint. LangGraph provides the reasoning layer that decides which tools to use based on user intent. The agent can:
- Automatically pick between web, academic, code, and news searches
- Combine multiple search types for comprehensive research
- Handle API errors without breaking the user experience
- Remember context across different questions in a conversation
This demonstrates how web search APIs for AI applications can power sophisticated, context-aware systems that adapt to user needs.
When NOT to use this approach
Agents add complexity and cost. Use simple search functions instead when:
- You always need the same search type (like only GitHub repositories)
- Response time is more important than search quality
- You're building a prototype or MVP
- Your use case doesn't require intelligent tool selection
For complex research workflows, competitive intelligence, or AI training data collection, the agent approach provides significant value.
Setting up the development environment
Before building the agent, you need API keys and dependencies. This agent requires both OpenAI (for reasoning) and Firecrawl (for searching).
Get your API keys from the Firecrawl dashboard and the OpenAI platform, then add both to your environment file:
echo "FIRECRAWL_API_KEY='fc-YOUR-FIRECRAWL-KEY'" >> .env
echo "OPENAI_API_KEY='sk-YOUR-OPENAI-KEY'" >> .env
You can install dependencies using uv (recommended for faster, more reliable package management):
# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh
source ~/.bashrc # or ~/.zshrc
# uv expects to work in a Python project directory with a pyproject.toml file
uv init
# Install project dependencies
uv add firecrawl-py python-dotenv langchain-openai langgraph
If you prefer pip:
pip install firecrawl-py python-dotenv langchain-openai langgraph
Creating the specialized search tools
Start by building four tools that target different aspects of Firecrawl's search endpoint. Each tool handles a specific content type but follows the same pattern:
#!/usr/bin/env python3
"""Research Assistant Agent using LangGraph and Firecrawl Search API"""

import os
from dotenv import load_dotenv
from firecrawl import Firecrawl
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.memory import MemorySaver

# Load environment variables
load_dotenv()

firecrawl = Firecrawl()

def general_web_search(query: str, limit: int = 3) -> str:
    """Search the web for general information and extract full content."""
    try:
        results = firecrawl.search(
            query=query,
            limit=limit,
            sources=["web"],
            scrape_options={"formats": ["markdown"]}
        )

        if not results.web:
            return f"No web results found for: {query}"

        formatted_results = f"Web Search Results for '{query}':\n\n"
        for i, result in enumerate(results.web, 1):
            title = result.metadata.title if hasattr(result, 'metadata') else result.title
            url = result.metadata.url if hasattr(result, 'metadata') else result.url
            content_preview = result.markdown[:300] + "..." if len(result.markdown) > 300 else result.markdown

            formatted_results += f"{i}. {title}\n"
            formatted_results += f"   URL: {url}\n"
            formatted_results += f"   Content: {content_preview}\n\n"

        return formatted_results
    except Exception as e:
        return f"Error in web search: {str(e)}"
With Firecrawl as the search API, you get real-time search capabilities combined with content extraction. The scrape_options parameter extracts complete page content, not just titles and descriptions. You get full articles ready for processing.
Add the specialized category searches:
def academic_research_search(topic: str, limit: int = 3) -> str:
    """Search for academic research papers and scholarly content."""
    try:
        results = firecrawl.search(
            query=topic,
            limit=limit,
            categories=["research"],
            scrape_options={"formats": ["markdown"]}
        )

        if not results.web:
            return f"No research papers found for: {topic}"

        formatted_results = f"Academic Research Results for '{topic}':\n\n"
        for i, result in enumerate(results.web, 1):
            title = result.metadata.title if hasattr(result, 'metadata') else result.title
            url = result.metadata.url if hasattr(result, 'metadata') else result.url
            category = getattr(result, 'category', 'research')

            formatted_results += f"{i}. {title}\n"
            formatted_results += f"   Category: {category}\n"
            formatted_results += f"   URL: {url}\n\n"

        return formatted_results
    except Exception as e:
        return f"Error in research search: {str(e)}"

def code_search(technology: str, limit: int = 3) -> str:
    """Search GitHub repositories for code examples and projects."""
    try:
        results = firecrawl.search(
            query=technology,
            limit=limit,
            categories=["github"]
        )

        if not results.web:
            return f"No GitHub repositories found for: {technology}"

        formatted_results = f"GitHub Code Search Results for '{technology}':\n\n"
        for i, result in enumerate(results.web, 1):
            formatted_results += f"{i}. {result.title}\n"
            formatted_results += f"   URL: {result.url}\n"
            formatted_results += f"   Description: {result.description or 'No description available'}\n\n"

        return formatted_results
    except Exception as e:
        return f"Error in code search: {str(e)}"

def news_search(topic: str, time_filter: str = "qdr:w", limit: int = 3) -> str:
    """Search for recent news articles on a topic."""
    try:
        results = firecrawl.search(
            query=topic,
            limit=limit,
            sources=["news"],
            tbs=time_filter
        )

        news_results = []
        if hasattr(results, 'news') and results.news:
            news_results = results.news
        elif hasattr(results, 'web') and results.web:
            news_results = results.web

        if not news_results:
            return f"No recent news found for: {topic}"

        time_desc = {"qdr:d": "past day", "qdr:w": "past week", "qdr:m": "past month"}.get(time_filter, "recent")
        formatted_results = f"News Search Results for '{topic}' ({time_desc}):\n\n"
        for i, result in enumerate(news_results, 1):
            formatted_results += f"{i}. {result.title}\n"
            formatted_results += f"   URL: {result.url}\n"
            formatted_results += f"   Date: {getattr(result, 'date', 'Date not available')}\n\n"

        return formatted_results
    except Exception as e:
        return f"Error in news search: {str(e)}"
These four tools create a complete search ecosystem. Each targets different content types, but the real power comes from intelligent selection. The agent analyzes user intent and picks the right tools automatically.
Building the LangGraph reasoning layer
Now connect these tools with LangGraph's reasoning framework. LangGraph handles the complex decision-making about which tools to use and when. For more details on LangGraph patterns and advanced agent architectures, see our comprehensive LangGraph tutorial.
def create_research_agent():
    """Create the research assistant agent with search tools."""
    model = ChatOpenAI(model="gpt-5")
    tools = [general_web_search, academic_research_search, code_search, news_search]

    agent = create_react_agent(
        model=model,
        tools=tools,
        checkpointer=MemorySaver()
    )
    return agent
Next, add an interactive loop that passes the agent a system prompt explaining when to use each tool:
def main():
    """Main function to run the research agent."""
    print("Research Assistant Agent")
    print("=" * 50)
    print("I can help you search for:")
    print("• General web information with full content")
    print("• Academic research papers")
    print("• GitHub code repositories")
    print("• Recent news articles")
    print("\nType 'quit' to exit")
    print("=" * 50)

    agent = create_research_agent()

    while True:
        user_input = input("\nWhat would you like to research? ")

        if user_input.lower() in ['quit', 'exit', 'q']:
            print("Goodbye!")
            break

        if not user_input.strip():
            continue

        try:
            print("\nSearching...")

            system_message = """You are a Research Assistant Agent that helps users find information on any topic.

You have access to four specialized search tools:
1. general_web_search - For web search with full content extraction
2. academic_research_search - For finding academic papers and scholarly content
3. code_search - For finding GitHub repositories and code examples
4. news_search - For finding recent news articles

Choose the appropriate tool(s) based on the user's request. You can use multiple tools for extensive research."""

            response = agent.invoke(
                {"messages": [("system", system_message), ("user", user_input)]},
                config={"configurable": {"thread_id": "research_session"}}
            )

            final_message = response["messages"][-1]
            print(f"\nResearch Results:\n{final_message.content}")

        except Exception as e:
            print(f"\nError: {str(e)}")

if __name__ == "__main__":
    main()
The agent receives a system prompt that explains when to use each tool. LangGraph's ReAct framework handles the reasoning: the agent analyzes the user's query, decides which tools are relevant, calls them in the right order, and synthesizes the results.
Running and testing the agent
Save this code as research_agent.py and run it:
python research_agent.py
Research Assistant Agent
==================================================
I can help you search for:
• General web information with full content
• Academic research papers
• GitHub code repositories
• Recent news articles

Type 'quit' to exit
==================================================

What would you like to research?
Intelligent tool selection in action
Here's how the agent automatically chooses search strategies based on user intent. Note that these queries may take a moment to return results.
Query: "Find recent machine learning frameworks for computer vision"
The agent recognizes this needs code repositories and chooses the code_search tool:
Searching...

Research Results:
Here are some recent machine learning frameworks for computer vision found on GitHub:
1. OpenCV: An open-source library providing computer vision and machine learning
algorithms for image and video analysis. It also includes Detectron2, a
high-performance object detection framework.
2. 500 AI Machine learning Deep learning Computer vision NLP Projects with code:
This repository contains a list of 500+ AI projects, including machine learning
and computer vision, with code.
3. Lightly: A computer vision framework for self-supervised learning. The repository
also includes Learnergy, energy-based machine learning models built upon PyTorch.
Query: "I need academic research papers about neural networks"
The agent switches to academic_research_search for scholarly sources:
Searching...

Research Results:
Here are some academic research papers on neural networks:
1. Neural Networks Journal | ScienceDirect: The official journal for neural
network research with peer-reviewed articles.
2. Neural Network - an overview | ScienceDirect Topics: Comprehensive overview
of neural networks in the context of social sciences.
3. Overview of artificial neural networks - PubMed: Medical and biological
applications of artificial neural networks.
Query: "Show me Python web scraping libraries on GitHub"
The agent uses code_search again, demonstrating AI scraping resource discovery:
Searching...

Research Results:
Here are some Python web scraping libraries and projects on GitHub:
1. Web Scraping With Python: Tutorial using requests and Beautiful Soup libraries.
2. Python-Web-Scraping-Tutorial by oxylabs: Comprehensive tutorial from simple
examples to complex tasks.
3. Scrapling by D4Vinci: Undetectable, powerful, flexible Python library for
web scraping.
4. Scrapegraph-ai: Web scraping python library that uses LLM and direct graph
logic to create scraping pipelines.
Notice how the agent chooses different tools automatically. You don't need manual configuration because it understands query intent and picks the right search strategy.
Production improvements for scaling
This research agent demonstrates core AI search functionality, but production applications have additional considerations:
Error handling and resilience: Implement retry logic with exponential backoff for API failures. Add fallback strategies when specific search types fail. Log errors for monitoring and debugging. Set up health checks to detect service issues early.
Performance and cost optimization: Cache frequently requested searches to reduce API calls. Implement rate limiting to prevent quota exhaustion. Monitor credit usage and set daily budgets. Use result pagination for large datasets instead of increasing limits.
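As a concrete example of the caching idea, a simple in-memory TTL cache can deduplicate repeated queries before they spend credits. A minimal sketch (the TTL value is arbitrary):

import time

_search_cache = {}

def cached_search(app, query, limit=5, ttl_seconds=3600):
    """Serve repeated queries from memory so they don't spend extra credits."""
    key = (query, limit)
    now = time.time()
    if key in _search_cache:
        cached_at, results = _search_cache[key]
        if now - cached_at < ttl_seconds:
            return results  # cache hit: zero API calls
    results = app.search(query=query, limit=limit)
    _search_cache[key] = (now, results)
    return results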
User experience improvements: Add streaming responses so users see results as they come in:
async def stream_agent_response(agent, query):
    """Stream agent responses for real-time feedback."""
    async for chunk in agent.astream(
        {"messages": [("user", query)]},
        config={"configurable": {"thread_id": "user_session"}}
    ):
        if "messages" in chunk:
            yield chunk["messages"][-1].content
Context and memory management: Store conversation history in persistent storage. Implement session management for multiple users. Add context awareness so agents remember previous searches and can build on them.
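For instance, swapping MemorySaver for a persistent checkpointer keeps conversation state across restarts, and per-user thread IDs keep sessions separate. A sketch assuming the optional langgraph-checkpoint-sqlite package (constructor details may vary by version); model and tools are the ones defined earlier, while user_id and question are placeholders:

import sqlite3
from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.prebuilt import create_react_agent

# Persist checkpoints to disk instead of in-process memory
conn = sqlite3.connect("research_checkpoints.db", check_same_thread=False)
agent = create_react_agent(model=model, tools=tools, checkpointer=SqliteSaver(conn))

# One thread per user keeps conversations isolated but resumable
config = {"configurable": {"thread_id": f"user-{user_id}"}}
response = agent.invoke({"messages": [("user", question)]}, config=config)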
Integration patterns: Connect with vector databases for semantic search capabilities. Add webhooks for real-time data updates. Integrate with workflow systems for automated research pipelines. Build APIs so other applications can use your agent.
Monitoring and analytics: Track which search types get used most. Monitor response times and success rates. Analyze user query patterns to improve tool selection. Set up alerts for system health and performance issues.
Security and compliance: Validate and sanitize user inputs. Implement authentication and authorization. Add audit logs for compliance requirements. Set up data retention policies for search results.
This agent architecture shows how a web search API for AI applications can power sophisticated, context-aware systems. By combining Firecrawl's search capabilities with LangGraph's reasoning framework, you get agents that automatically choose optimal search strategies and deliver thorough research results without manual intervention.
Searching and Beyond
Most web search APIs give you links when you need content. You search, get URLs, then scrape each page separately. That's two API calls, double the complexity, and higher costs. Firecrawl's search endpoint skips this workflow by extracting content during the search itself. One call gets you clean, processed text ready for your application.
This tutorial covered search capabilities from basic queries to advanced filtering with categories, sources, and time-based parameters. We built a research agent that automatically chooses between web, academic, code, and news searches.
With Firecrawl, you can combine search for discovery, scrape for precision, and crawl for complete site coverage to build AI-native web search applications. Get started with a free Firecrawl account and 500 credits.
To learn more about other Firecrawl endpoints, check out our scrape endpoint tutorial and crawl endpoint guide.
Frequently Asked Questions
What's the difference between SERP APIs and Search APIs?
SERP APIs specifically scrape and reformat data from existing search engines like Google or Bing. They act as middleman services that parse search engine results pages and return structured JSON data. Examples include SerpAPI, ScrapingDog, and Serper.
Search APIs are the broader category that includes both SERP scrapers and independent search engines, like Firecrawl's search endpoint. This encompasses SERP APIs plus platforms that built their own search indices and algorithms.
All SERP APIs are Search APIs, but not all Search APIs are SERP APIs.
How do AI-native search APIs work differently from traditional search?
Traditional search APIs match keywords and return results based on text similarity and popularity signals. AI-native search APIs like Firecrawl understand query intent and can extract full content from results, not just metadata. They can also filter by specialized categories like GitHub repositories or academic papers, and format results as clean markdown ready for AI processing rather than HTML meant for human browsing.
How does content extraction work with search results?
When you add scrape_options to a search request, Firecrawl automatically extracts full page content from each result. This includes converting HTML to clean markdown, extracting all links, and handling JavaScript-rendered content. You get complete articles instead of just titles and snippets.
How much does Firecrawl's search endpoint cost?
Each search result costs 1 credit regardless of whether you extract content. Basic scraping is included at no extra charge. Additional features like PDF parsing (+1 credit per page), stealth proxy mode (+4 credits), and structured JSON extraction (+5 credits) incur extra costs. Plans start at $16/month for 1,000 credits.
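A quick back-of-the-envelope calculation helps with budgeting. A sketch with assumed volumes (only the 1-credit-per-result figure comes from the pricing above):

searches_per_day = 50          # assumed workload, not a Firecrawl figure
results_per_search = 5         # 1 credit per result
daily_credits = searches_per_day * results_per_search   # 250 credits/day
monthly_credits = daily_credits * 30                    # 7,500 credits/month
print(f"Estimated usage: {monthly_credits} credits/month")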
What are the rate limits for Firecrawl search API requests?
Search rate limits vary by plan:
- Free: 5 requests/minute
- Hobby: 50 requests/minute
- Standard: 250 requests/minute
- Growth: 2,500 requests/minute
These limits apply per minute and reset automatically. If you exceed your limit, requests will return a rate limit error until the next minute window.
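To stay under your plan's budget, you can throttle requests client-side. A minimal sketch that spaces calls evenly across each minute (set the rate to match your plan):

import time

class SearchThrottle:
    """Space out requests so they stay under a per-minute budget."""
    def __init__(self, requests_per_minute):
        self.interval = 60.0 / requests_per_minute
        self.last_call = 0.0

    def wait(self):
        elapsed = time.time() - self.last_call
        if elapsed < self.interval:
            time.sleep(self.interval - elapsed)
        self.last_call = time.time()

throttle = SearchThrottle(requests_per_minute=5)  # e.g., the free plan's limit
throttle.wait()
results = app.search(query="web scraping", limit=3)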
