Mastering Firecrawl Search Endpoint: Web Search and Data Extraction in One API Call


Bex Tuychiev

Sep 19, 2025


Web Search With Firecrawl

Web search APIs have a fundamental problem. They give you links when you need content. Google’s Custom Search API returns links, titles and snippets. But when you’re building something that processes web data, those snippets aren’t enough.

The typical workflow forces you into extra steps. First you search. Then you take each URL and scrape it separately. Now you’re dealing with different website structures, bot detection systems, and sites that randomly go offline. What started as a simple search becomes a complex data pipeline.

Firecrawl’s search endpoint solves this by combining both steps. Search for “best project management tools 2025” and get back the full content from each result in markdown format. You can filter by location or time, or target specific sources like GitHub or news sites.

This tutorial shows you how to use the search endpoint to get search results and their full page content in one API call. We’ll start with basic search operations and work up to advanced filtering and content extraction. By the end, you’ll know how to build complete search-powered applications.

Search vs. Scrape vs. Crawl: Understanding Firecrawl’s Endpoints

If you’re an existing Firecrawl user, you might be wondering about the differences between search, scrape, and crawl. Each endpoint solves a different problem in web data collection.

Scrape targets individual pages. You provide a specific URL and it extracts clean content from that page. It handles JavaScript rendering, bypasses anti-bot protection, and converts messy HTML into structured data. Use this when you know exactly which page contains the information you need.

Crawl works through entire websites systematically. Give it a starting URL and it discovers every connected page on that site, extracting content as it goes. It follows links, handles pagination, and maps out complete website structures. This works well when you need lots of data from a specific domain.

Search finds content across the entire web. Instead of starting with URLs, you start with queries. It searches the internet for pages matching your topic, then extracts content from the most relevant results. Choose this approach when you need information but don’t know which websites have it.

Feature	Scrape	Crawl	Search
Input	Specific URL	Starting URL	Search query
Scope	Single page	Entire website	Entire web
Discovery	None needed	Site exploration	Web-wide search
Output	One page content	Multiple related pages	Multiple relevant pages from different sites
Best for	Known targets	Complete site data	Research and discovery

The main difference is how you start:

Scrape and crawl require you to know where your content lives
Search helps you find content when you don’t know its location
Scrape gives you precision, crawl gives you completeness, search gives you discovery

Each method has trade-offs. Scrape is fastest but most limited, while crawl takes longer but covers everything on a site. Search can find the most relevant content but may miss some sources or return results you didn’t expect.

In practice, you’ll often use multiple endpoints together. You might search to discover relevant sources, then crawl those sites for complete data, or scrape specific high-value pages you found through search. For detailed guides on the other endpoints, see our scrape endpoint tutorial and crawl endpoint guide.

Now let’s look at how to get started with search operations.

Getting Started: Basic Search Operations

Before you can search the web with Firecrawl, you need to set up your API credentials and understand how search results work.

Setting up authentication

First, sign up at firecrawl.dev and grab your API key from the dashboard. Install the Python SDK and set up your credentials:

pip install firecrawl-py python-dotenv

Save your API key in a .env file (a simple text file that stores environment variables):

echo "FIRECRAWL_API_KEY='fc-YOUR-KEY-HERE'" >> .env

Then load it in your Python code:

from firecrawl import Firecrawl
from dotenv import load_dotenv

load_dotenv()
app = Firecrawl()
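If the key is missing from .env or misspelled, you only find out when the first request fails. A small guard catches that earlier. The require_env helper below is hypothetical (it is not part of the Firecrawl SDK); it reads an environment variable and raises a clear error if it is empty:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error.

    Hypothetical helper: call it after load_dotenv() and before creating
    the Firecrawl client so a missing key fails fast.
    """
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in your shell"
        )
    return value
```

You could then create the client with Firecrawl(api_key=require_env("FIRECRAWL_API_KEY")) so the dependency on the key is explicit.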

Your first search

Here’s the simplest possible search request:

results = app.search(query="best project management tools", limit=5)

This searches for “best project management tools” and returns the top 5 results. When search results are not scraped, the cost is 2 credits per 10 search results, so this request uses 2 credits total.

print(f"Found {len(results.web)} web results")
print("Search completed successfully")

Found 5 web results
Search completed successfully
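The credit arithmetic above can be wrapped in a small helper for budgeting. This is a sketch that assumes unscraped results are billed at 2 credits per started block of 10 results, which is how a 5-result request comes out to 2 credits. Scraped results cost more and are not modeled here, so check Firecrawl’s current pricing before relying on it:

```python
import math


def estimate_unscraped_search_credits(
    num_results: int, credits_per_block: int = 2, block_size: int = 10
) -> int:
    """Estimate credits for a search without scrape_options.

    Assumption: billing rounds up to the next block of 10 results,
    so 1 to 10 results cost 2 credits, 11 to 20 cost 4, and so on.
    """
    if num_results < 0:
        raise ValueError("num_results must be non-negative")
    return math.ceil(num_results / block_size) * credits_per_block


print(estimate_unscraped_search_credits(5))   # the limit=5 request above
print(estimate_unscraped_search_credits(20))
```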

The response comes back with structured data. Each result includes metadata that helps you understand what you found.

Understanding the response structure

Each result exposes a title, URL, description, and category. The category is None here because this search did not filter by category:

# First result details
first_result = results.web[0]
print(f"Title: {first_result.title}")
print(f"URL: {first_result.url}")
print(f"Description: {first_result.description}")
print(f"Category: {first_result.category}")

Title: 25 Best Project Management Software Picked For 2025
URL: https://thedigitalprojectmanager.com/tools/best-project-management-software/
Description: Explore top-rated project management software handpicked by experts to help you manage teams, timelines, and tasks with ease.
Category: None

The response includes web results by default. Here are all five results from this search:

All 5 results:
1. 25 Best Project Management Software Picked For 2025
   https://thedigitalprojectmanager.com/tools/best-project-management-software/

2. What is the best free project management tool, specifically geared ...
   https://www.reddit.com/r/projectmanagement/comments/1b0lfvi/what_is_the_best_free_project_management_tool/

3. Honest Review of 6 Personal Project Management Tools ... - ICAgile
   https://www.icagile.com/resources/honest-review-of-6-personal-project-management-tools-with-kanban-view

4. Manage your team's work, projects, & tasks online • Asana
   https://asana.com/

5. Project Management Software for Teams - Microsoft
   https://www.microsoft.com/en-us/microsoft-365/planner/project-management
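The listing above can be reproduced with a short loop over results.web. The sketch below returns the text instead of printing it so it is easy to test; the SimpleNamespace objects are offline stand-ins for a real response, and only the .title and .url attributes used earlier in this tutorial are read:

```python
from types import SimpleNamespace


def format_result_listing(results) -> str:
    """Format results.web as a numbered list of titles and URLs."""
    web = getattr(results, "web", None) or []
    lines = [f"All {len(web)} results:"]
    for i, item in enumerate(web, 1):
        lines.append(f"{i}. {item.title}")
        lines.append(f"   {item.url}")
        lines.append("")  # blank line between entries
    return "\n".join(lines).rstrip()


# Stand-in response for offline testing; replace with app.search(...)
demo = SimpleNamespace(web=[
    SimpleNamespace(title="Manage your team's work, projects, & tasks online • Asana",
                    url="https://asana.com/"),
])
print(format_result_listing(demo))
```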

Basic error handling

Search requests can fail for various reasons. Here’s how to handle common issues:

try:
    results = app.search(query="python web scraping", limit=10)
    if results.web:
        print(f"Found {len(results.web)} results")
    else:
        print("No results found")
except Exception as e:
    print(f"Search failed: {e}")

Testing with an empty query shows how error handling works:

try:
    results = app.search(query="", limit=5)
except Exception as e:
    print(f"Empty query properly failed: {e}")

Empty query properly failed: Query cannot be empty

Common error codes include rate limiting (429), invalid queries (400), and authentication issues (401). Most errors include descriptive messages to help you troubleshoot.
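Rate limiting (429) is usually worth retrying, while 400 and 401 errors are not, since a bad query or key fails the same way every time. The sketch below wraps any call in exponential backoff with jitter. The message-based looks_rate_limited check is an assumption: the exact exception classes depend on your SDK version, so adapt it to what your client raises.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def looks_rate_limited(exc: Exception) -> bool:
    """Heuristic: retry when the error message mentions HTTP 429 or a rate limit.

    Assumption: this inspects message text only; adapt it to the exception
    types your Firecrawl SDK version actually raises.
    """
    message = str(exc).lower()
    return "429" in message or "rate limit" in message


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    is_retryable: Callable[[Exception], bool] = looks_rate_limited,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(), retrying retryable failures with exponential backoff plus jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            # Give up on the last attempt or on errors retrying will not fix (400, 401)
            if attempt == max_attempts or not is_retryable(exc):
                raise
            delay = base_delay * 2 ** (attempt - 1)  # 1s, 2s, 4s, ...
            sleep(delay + random.uniform(0, 0.1 * delay))  # up to 10% jitter
    raise AssertionError("unreachable")
```

Usage looks like results = with_backoff(lambda: app.search(query="python web scraping", limit=10)). The injectable sleep parameter exists so the retry logic can be tested without waiting.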

Search parameters you can control

The search method accepts several parameters to customize results:

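query - Your search terms (required)
limit - Number of results to return (default: 3, max: 20)
sources - Types of results (“web” for general pages, “news” for recent articles, “images” for visual content)
categories - Specialized platforms to target, such as “github” or “research”
location - Geographic region for localized results
tbs - Time filter using Google’s codes, such as qdr:d, qdr:w, or qdr:m
timeout - Request timeout in seconds
scrape_options - Formats to extract from each result, such as markdown, html, or links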

A search with different sources might look like this:

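results = app.search(
    query="machine learning frameworks 2025",
    limit=3,
    sources=["web", "news"],
    timeout=30
)

# Print the web results returned by this search
print(f"Advanced search found {len(results.web)} results")
for i, result in enumerate(results.web, 1):
    print(f"{i}. {result.title}")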

Advanced search found 3 results
1. 8 Best Machine Learning Software To Use in 2025 | Anaconda
2. AI Frameworks: Top Types To Adopt in 2025 - Splunk
3. Uses for ML frameworks like Pytorch/Tensorflow/etc in 2025 - Reddit
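When you request several sources, each type comes back in its own list on the response (results.web, results.news, and results.images, as used throughout this tutorial). A small helper can flatten them into one sequence. It is a sketch that treats a missing or empty list as no results, and the SimpleNamespace objects are offline stand-ins:

```python
from types import SimpleNamespace


def collect_results(results, sources=("web", "news", "images")):
    """Return (source, item) pairs from each requested source list on a search response."""
    collected = []
    for source in sources:
        # A source with no hits may be absent or None, so fall back to an empty list
        for item in getattr(results, source, None) or []:
            collected.append((source, item))
    return collected


demo = SimpleNamespace(
    web=[SimpleNamespace(title="AI Frameworks: Top Types To Adopt in 2025 - Splunk")],
    news=None,
)
for source, item in collect_results(demo, sources=["web", "news"]):
    print(f"[{source}] {item.title}")
```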

Common use cases for basic search

Use the search endpoint when you need to:

Find recent information on a topic without knowing specific websites
Research competitors or market trends across multiple sources
Discover relevant content for further analysis or scraping
Build datasets by finding pages that match certain criteria

This gives you the foundation for web search operations. In the next section, we’ll look at the real power of Firecrawl’s search: extracting full content from these results instead of just getting titles and descriptions.

Search + Content Extraction: The Core Value Proposition

The real power of Firecrawl’s search endpoint is combining search with content extraction in one API call. When you add scrape_options to a search, Firecrawl automatically runs its scrape endpoint on each search result.

How search + scraping works together

In practice, one call handles both steps: the search finds the most relevant pages, and each result comes back with the formats you requested:

# This search finds pages, then scrapes each one automatically
content_results = app.search(
    query="web scraping best practices", 
    limit=2,
    scrape_options={
        "formats": ["markdown", "links"]
    }
)

print(f"Content search found {len(content_results.web)} results")

for i, result in enumerate(content_results.web, 1):
    print(f"\nResult {i}:")
    print(f"Title: {result.metadata.title}")
    print(f"URL: {result.metadata.url}")
    print(f"Content length: {len(result.markdown)} characters")
    print(f"Found {len(result.links)} links in content")

Content search found 2 results

Result 1:
Title: Web Scraping Best Practices and Tools 2025 - ZenRows
URL: https://www.zenrows.com/blog/web-scraping-best-practices
Content length: 25234 characters
Found 51 links in content

Result 2:
Title: 7 Web Scraping Best Practices You Must Be Aware of
URL: https://research.aimultiple.com/web-scraping-best-practices/
Content length: 22384 characters
Found 111 links in content

Instead of getting 100-character descriptions, you get complete articles with 20,000+ characters of content plus all extracted links. This is where Firecrawl shows its real value.
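The loop above assumes every result has markdown and links. If a page fails to scrape or a format was not requested, those fields may be empty (an assumption worth guarding against rather than a documented guarantee), and len(None) raises a TypeError. A defensive variant, sketched with offline stand-in objects:

```python
from types import SimpleNamespace


def summarize_documents(docs):
    """Build a per-result summary that tolerates missing metadata, markdown, or links."""
    summaries = []
    for doc in docs or []:
        metadata = getattr(doc, "metadata", None)
        summaries.append({
            # Prefer scraped metadata, fall back to the plain search fields
            "title": getattr(metadata, "title", None) or getattr(doc, "title", None) or "Untitled",
            "url": getattr(metadata, "url", None) or getattr(doc, "url", None),
            "chars": len(getattr(doc, "markdown", None) or ""),
            "links": len(getattr(doc, "links", None) or []),
        })
    return summaries


demo_docs = [
    SimpleNamespace(metadata=SimpleNamespace(title="Scraped page", url="https://example.com/a"),
                    markdown="# Heading\nBody text", links=["https://example.com/b"]),
    SimpleNamespace(metadata=None, title="Unscraped result", url="https://example.com/c",
                    markdown=None, links=None),
]
for row in summarize_documents(demo_docs):
    print(row)
```

With a live response you would call summarize_documents(content_results.web).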

Format options for different applications

The scrape_options parameter supports the same formats as the scrape endpoint:

results = app.search(
    query="python tutorials",
    limit=1,
    scrape_options={
        "formats": ["markdown", "html", "links"]
    }
)

# Each result contains full content in multiple formats
result = results.web[0]
print(f"Markdown: {len(result.markdown)} characters")
print(f"HTML: {len(result.html)} characters") 
print(f"Links: {len(result.links)} found")

Markdown: 33640 characters
HTML: 221019 characters
Links: 302 found

Notice how the HTML version is much larger (221k characters) than the clean markdown (33k characters), and how many links (302) are extracted from the page structure.
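Those numbers make the trade-off concrete: 221,019 / 33,640 ≈ 6.6, so the raw HTML is roughly six and a half times the size of the markdown for the same page. If you feed content to an LLM, that difference shows up directly in context usage. A hypothetical helper that reports the size of whichever formats came back:

```python
from types import SimpleNamespace


def format_sizes(doc, formats=("markdown", "html", "links")):
    """Return the length of each requested format on a scraped result (0 if absent)."""
    return {fmt: len(getattr(doc, fmt, None) or "") for fmt in formats}


def html_to_markdown_ratio(sizes):
    """How many times larger the HTML is than the markdown, or None if markdown is empty."""
    return sizes["html"] / sizes["markdown"] if sizes.get("markdown") else None


# Stand-in document sized like the result above (placeholder characters, not real content)
demo = SimpleNamespace(markdown="m" * 33640, html="h" * 221019, links=["u"] * 302)
sizes = format_sizes(demo)
print(sizes)
print(round(html_to_markdown_ratio(sizes), 1))
```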


Working with the extracted content

The content is immediately ready for processing:

results = app.search(
    query="machine learning research 2025",
    limit=2,
    scrape_options={"formats": ["markdown"]}
)

for result in results.web:
    # Content is already extracted and clean
    content = result.markdown
    word_count = len(content.split())
    
    print(f"Found article: {result.metadata.title}")
    print(f"Content: {word_count} words")

Found article: 2025 Conference
Content: 466 words
Found article: Apple Machine Learning Research at ICML 2025 - Apple Machine Learning Research
Content: 2070 words
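Word counts are a cheap quality signal: the first hit above is a 466-word conference page, while the second is a 2,070-word article. A sketch of a filter that keeps only substantial pages before further processing follows; the 500-word threshold is an arbitrary example value, and the documents are offline stand-ins:

```python
from types import SimpleNamespace


def keep_substantial(docs, min_words=500):
    """Return only the results whose markdown contains at least min_words words."""
    kept = []
    for doc in docs or []:
        words = len((getattr(doc, "markdown", None) or "").split())
        if words >= min_words:
            kept.append(doc)
    return kept


demo_docs = [
    SimpleNamespace(metadata=SimpleNamespace(title="Short page"), markdown="word " * 466),
    SimpleNamespace(metadata=SimpleNamespace(title="Long article"), markdown="word " * 2070),
]
print([doc.metadata.title for doc in keep_substantial(demo_docs)])
```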

The search endpoint with scrape options combines the discovery power of web search with the content extraction capabilities of the scrape endpoint. You find relevant pages and get their full content without managing multiple API calls or handling different website structures yourself.

For more control over content extraction, including options like include_tags, exclude_tags, and wait_for, check out our complete scrape endpoint guide which covers all available scraping parameters.

Targeted Content Discovery

Most search APIs return whatever matches your keywords. You search for “machine learning” and get marketing pages mixed with academic papers. Firecrawl lets you target specific content types from the start.

Search categories for specialized content

When you’re building developer tools, generic web results miss the mark. You need code repositories, not blog posts.

The categories parameter solves this by targeting specialized platforms directly.

# Helper function to display results consistently
def display_results(results, result_type="web", title="Results"):
    result_list = getattr(results, result_type, [])
    print(f"{title}:")
    for result in result_list:
        print(f"- {result.title}")
        print(f"  Category: {getattr(result, 'category', 'N/A')}")
        print(f"  URL: {result.url}")

# Search GitHub for code projects
github_results = app.search(
    query="python web scraping",
    limit=3,
    categories=["github"]
)

# Search research databases for academic papers
research_results = app.search(
    query="machine learning transformers",
    limit=3,
    categories=["research"]
)

display_results(github_results, title="GitHub category results")
print()
display_results(research_results, title="Research category results")

GitHub category results:
- Scrapy, a fast high-level web crawling & scraping ...
  Category: github
  URL: https://github.com/scrapy/scrapy
- oxylabs/Python-Web-Scraping-Tutorial
  Category: github
  URL: https://github.com/oxylabs/Python-Web-Scraping-Tutorial
- ScrapeGraphAI/Scrapegraph-ai: Python scraper based on AI
  Category: github
  URL: https://github.com/ScrapeGraphAI/Scrapegraph-ai

Research category results:
- [2304.10557] An Introduction to Transformers
  Category: research
  URL: https://arxiv.org/abs/2304.10557
- A Comprehensive Survey on Applications of Transformers ...
  Category: research
  URL: https://arxiv.org/abs/2306.07303
- Multimodal Learning With Transformers: A Survey
  Category: research
  URL: https://pubmed.ncbi.nlm.nih.gov/37167049/

Categories give your search the right context. The github category returns actual repositories and code projects, and the research category returns academic papers from sources such as arXiv and PubMed.
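Categories steer the search, but if downstream code assumes every URL is a GitHub repository, a post-filter on the host name is a cheap safeguard. This is a hypothetical helper, shown with offline stand-in results:

```python
from types import SimpleNamespace
from urllib.parse import urlparse


def filter_by_domain(items, allowed_domains):
    """Keep results whose URL host is one of allowed_domains or a subdomain of one."""
    allowed = {d.lower() for d in allowed_domains}
    kept = []
    for item in items or []:
        host = (urlparse(getattr(item, "url", "") or "").hostname or "").lower()
        if any(host == d or host.endswith("." + d) for d in allowed):
            kept.append(item)
    return kept


demo = [
    SimpleNamespace(url="https://github.com/scrapy/scrapy"),
    SimpleNamespace(url="https://www.reddit.com/r/webscraping/"),
]
print([item.url for item in filter_by_domain(demo, ["github.com"])])
```

For the research category you might pass domains such as ["arxiv.org", "nih.gov"], which also matches subdomains like pubmed.ncbi.nlm.nih.gov.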

Source type filtering for different content formats

Beyond categories, you often need different content formats for the same topic. News articles give you recent developments, while web pages offer educational guides. The sources parameter controls this.

from urllib.parse import urlparse

# Helper function for source comparison
def compare_sources(query, limit=2):
    web_results = app.search(query=query, limit=limit, sources=["web"])
    news_results = app.search(query=query, limit=limit, sources=["news"])

    print("Web source results:")
    for result in web_results.web or []:
        domain = urlparse(result.url).netloc  # e.g. "hai.stanford.edu"
        print(f"- {result.title}")
        print(f"  Domain: {domain}")

    print("\nNews source results:")
    for result in getattr(news_results, "news", None) or []:
        print(f"- {result.title}")
        print(f"  Date: {getattr(result, 'date', 'N/A')}")

compare_sources("AI developments 2025")

Web source results:
- The 2025 AI Index Report | Stanford HAI
  Domain: hai.stanford.edu
- The Latest AI News and AI Breakthroughs that Matter Most: 2025
  Domain: www.crescendo.ai

News source results:
- Microsoft's Unprecedented 2025: Cloud and AI Fuel Record Earnings, Igniting Shareholder Optimism
  Date: 12 hours ago
- 12 Graphs That Explain the State of AI in 2025
  Date: Apr 7, 2025

Web sources return reports from authoritative domains, while news sources return recent articles with publication dates. This distinction matters when you need current market developments rather than foundational research.
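The news dates above come back as display strings in mixed styles (“12 hours ago” and “Apr 7, 2025”), so sorting by recency needs normalization first. The parser below is a sketch that handles only those two styles as observed in this output; anything else returns None:

```python
import re
from datetime import datetime, timedelta

# Relative dates such as "12 hours ago" or "1 day ago"
_RELATIVE = re.compile(r"^(\d+)\s+(minute|hour|day|week)s?\s+ago$", re.IGNORECASE)


def parse_news_date(text, now=None):
    """Convert a news date string like '12 hours ago' or 'Apr 7, 2025' to a datetime."""
    if not text:
        return None
    text = text.strip()
    now = now or datetime.now()
    match = _RELATIVE.match(text)
    if match:
        amount, unit = int(match.group(1)), match.group(2).lower()
        return now - timedelta(**{unit + "s": amount})
    try:
        return datetime.strptime(text, "%b %d, %Y")  # absolute dates like "Apr 7, 2025"
    except ValueError:
        return None
```

Results whose date parses to None can be sorted last, for example with sorted(items, key=lambda r: parse_news_date(r.date) or datetime.min, reverse=True).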

Visual Content Search

Text search is straightforward. Image search gets complex fast. You need specific resolutions, aspect ratios, or image types. Firecrawl’s image search includes filtering options that traditional search APIs don’t have.

Basic image search

Start with sources=["images"] for visual content:

# Basic image search
basic_images = app.search(
    query="mountain landscape",
    limit=3,
    sources=["images"]
)

print("Basic image search results:")
if hasattr(basic_images, 'images') and basic_images.images:
    print(f"Found {len(basic_images.images)} images")
    for i, img in enumerate(basic_images.images, 1):
        print(f"{i}. {img.title}")
        print(f"   Source page: {img.url}")
        print(f"   Position: {img.position}")

Basic image search results:
Found 3 images
1. Free Sunset Mountain Landscape Image | Download at StockCake
   Source page: https://stockcake.com/i/sunset-mountain-landscape_1219985_159231
   Position: 1

2. Watercolor Mountain Landscape Tutorial: Step-by-Step Guide
   Source page: https://www.esperoart.com/mountain-landscape-watercolor-tutorial/
   Position: 2

3. Mountain Landscape with Forests and a Stream · Free Stock Photo
   Source page: https://www.pexels.com/photo/mountain-landscape-with-forests-and-a-stream-18682254/
   Position: 3

This gives you relevant visual content. But many applications need specific image dimensions.

HD and 4K image search with size operators

Computer vision projects require consistent input dimensions. Machine learning models trained on specific image sizes need datasets with matching resolutions. Firecrawl’s image search supports Google’s established image size operators, letting you filter results by exact dimensions or minimum sizes.

These operators work within your search query string to target specific resolutions. Instead of getting random image sizes that require post-processing, you get images that match your requirements from the start.

Here’s an example:

def search_images_by_resolution():
    # Search for Full HD images
    hd_images = app.search(
        query="sunset landscape imagesize:1920x1080",
        limit=3,
        sources=["images"]
    )

    print("HD image search results:")
    if hasattr(hd_images, 'images') and hd_images.images:
        for i, img in enumerate(hd_images.images, 1):
            print(f"{i}. {img.title}")
            print(f"   Source: {img.url}")

    # Search for 4K images
    uhd_results = app.search(
        query="nature wallpaper imagesize:3840x2160",
        limit=2,
        sources=["images"]
    )

    print("\n4K image search results:")
    if hasattr(uhd_results, 'images') and uhd_results.images:
        for img in uhd_results.images:
            print(f"- {img.title}")
            print(f"  Source: {img.url}")

search_images_by_resolution()

HD image search results:
1. Stunning sunset landscape: Mediterranean rocks with olive trees in the sea.  Photorealistic motion background. 3D rendering.
   Source: https://www.storyblocks.com/video/stock/photorealistic-motion-background-mediteranean-rocks-in-the-sea-with-olive-trees-growing-on-them-beautiful-sunset-landscape-scene-3d-rendering-bymitreh8kaz2db71

2. Sunset in a hilly rural landscape on the edge of a forest, Timelapse video
   Source: https://www.storyblocks.com/video/stock/sunset-in-a-hilly-rural-landscape-on-the-edge-of-a-forest-timelapse-video-351768226

4K image search results:
- 50 Nature Wallpapers (All 4K, No watermarks) - Album on Imgur
  Source: https://in.pinterest.com/pin/1002543567023869780/
- My top 10 of all time - Nature and Art : r/wallpapers
  Source: https://www.reddit.com/r/wallpapers/comments/1g1cymc/my_top_10_of_all_time_nature_and_art/

Common resolution operators for different use cases:

imagesize:1920x1080 - Full HD for standard displays
imagesize:2560x1440 - QHD for high-resolution screens
imagesize:3840x2160 - 4K UHD for professional applications
larger:1920x1080 - HD and above for flexible sizing
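Because these operators are plain text inside the query string, a tiny helper keeps them consistent across a dataset-building script. This is a hypothetical convenience wrapper; pass its output as the query together with sources=["images"]:

```python
def image_size_query(subject: str, width: int, height: int, minimum: bool = False) -> str:
    """Append an image size operator to a search query.

    minimum=False -> imagesize:WxH (exact dimensions)
    minimum=True  -> larger:WxH (those dimensions or bigger)
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    operator = "larger" if minimum else "imagesize"
    return f"{subject.strip()} {operator}:{width}x{height}"


print(image_size_query("sunset landscape", 1920, 1080))
print(image_size_query("nature wallpaper", 3840, 2160, minimum=True))
```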

Advanced Filtering & Strategy Selection

Basic search gets you started, but real-world applications need precise targeting. You might need recent news instead of evergreen content, or results from multiple specialized sources combined.

Time-based filtering for current versus evergreen content

Web search results span decades of content, but applications often need recent information rather than historical data. The tbs parameter filters results by publication date using Google’s time-based search codes: qdr:d (past day), qdr:w (past week), and qdr:m (past month). These codes let you separate trending topics from foundational concepts. Market research needs last month’s data, while educational content remains valuable regardless of age.

Here’s how the tbs parameter filters results by recency:

def compare_content_freshness(base_query):
    recent_results = app.search(
        query=f"{base_query} trends",
        limit=3,
        tbs="qdr:m"  # Past month
    )

    all_time_results = app.search(
        query=f"{base_query} fundamentals",
        limit=3
    )

    print("Recent results (past month):")
    for result in recent_results.web:
        print(f"- {result.title}")

    print("\nAll-time results (fundamentals):")
    for result in all_time_results.web:
        print(f"- {result.title}")

compare_content_freshness("machine learning")

Recent results (past month):
- Top 13 Machine Learning Trends CTOs Need to Know in 2025
- The Evolution of AI and ML: Trends, Impact, and Future Insights
- The Ultimate List of Machine Learning Statistics for 2025 - Itransition

All-time results (fundamentals):
- Introduction to Machine Learning Concepts - Training - Microsoft Learn
- Machine Learning Crash Course - Google for Developers
- Machine Learning Tutorial - GeeksforGeeks

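Time filtering with qdr:m surfaces current trend analysis, while the unfiltered search returns educational resources and established documentation. Note that the two queries also differ (“trends” versus “fundamentals”), so the query wording contributes to this contrast alongside the time filter.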

Combining multiple categories for complete coverage

Complex applications often need content from multiple specialized sources. Research tools might combine academic papers with code implementations. You can search across different specialized sources at once:

multi_category_results = app.search(
    query="neural networks",
    limit=4,
    categories=["github", "research"]
)

print("Combined GitHub + Research results:")
for result in multi_category_results.web:
    print(f"- {result.title}")
    print(f"  Category: {getattr(result, 'category', 'N/A')}")
    print(f"  URL: {result.url}")

Combined GitHub + Research results:
- Neural Networks | Journal | ScienceDirect.com by Elsevier
  Category: research
  URL: https://www.sciencedirect.com/journal/neural-networks
- Neural Network - an overview | ScienceDirect Topics
  Category: research
  URL: https://www.sciencedirect.com/topics/social-sciences/neural-network
- Neural Network - an overview
  Category: research
  URL: https://www.sciencedirect.com/topics/computer-science/neural-network
- Supervised learning in DNA neural networks
  Category: research
  URL: https://www.nature.com/articles/s41586-025-09479-w
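In this run all four results came from the research category, so combining categories does not guarantee an even mix. Counting categories before using the results makes that visible, and if one category is missing you can run a separate single-category search for it. A sketch with offline stand-in results:

```python
from collections import Counter
from types import SimpleNamespace


def category_counts(items):
    """Count results per category; results without one are counted as 'uncategorized'."""
    return Counter(getattr(item, "category", None) or "uncategorized" for item in items or [])


demo = [
    SimpleNamespace(category="research"),
    SimpleNamespace(category="research"),
    SimpleNamespace(category=None),
]
print(category_counts(demo))
```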

Choosing the right search strategy

Match your search parameters to your content requirements:

Categories: Use github when you need code implementations, research for academic backing
Sources: Use news for market trends, web for comprehensive guides, images for visual content
Time filters: Apply qdr:d (day), qdr:w (week), qdr:m (month) when current information matters
Image operators: Use size filters when your models require specific input dimensions
Multiple categories: Combine approaches when you need diverse source types
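The rules of thumb above can be encoded as a function that turns content needs into keyword arguments for app.search. This is a hypothetical helper, not part of the SDK; it only sends sources when you add something beyond the default web results:

```python
TIME_CODES = {"day": "qdr:d", "week": "qdr:w", "month": "qdr:m"}


def choose_search_params(query, *, code=False, papers=False, news=False,
                         images=False, recent=None, limit=5):
    """Translate content needs into keyword arguments for app.search()."""
    params = {"query": query, "limit": limit}

    # Categories: github for code, research for academic backing
    categories = [name for name, wanted in (("github", code), ("research", papers)) if wanted]
    if categories:
        params["categories"] = categories

    # Sources: web is the default, so only send sources when adding news or images
    sources = ["web"] + [name for name, wanted in (("news", news), ("images", images)) if wanted]
    if sources != ["web"]:
        params["sources"] = sources

    # Time filter: only when current information matters
    if recent is not None:
        if recent not in TIME_CODES:
            raise ValueError(f"recent must be one of {sorted(TIME_CODES)}")
        params["tbs"] = TIME_CODES[recent]
    return params


print(choose_search_params("neural networks", code=True, papers=True))
print(choose_search_params("AI regulation", news=True, recent="week"))
```

The result is meant to be unpacked into the call, as in results = app.search(**choose_search_params("neural networks", code=True)).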

Building an AI Research Agent: LangGraph + Firecrawl Integration

Most search applications follow the same pattern: user asks question, system returns links, user manually evaluates results. This works for simple queries, but breaks down when you need comprehensive research across different content types. Should you search GitHub for code examples? Academic databases for research papers? News sources for recent developments?

Why agents beat simple scripts for search applications

Consider researching “machine learning for healthcare.” A simple script might search Google and return general articles. But comprehensive research needs:

Academic papers from PubMed or arXiv for scientific backing
GitHub repositories for implementation examples
Recent news for current developments and regulations
Web articles for tutorials and practical guides

An intelligent agent can choose the right search strategy based on query intent, combine results from multiple sources, and handle errors gracefully. Instead of writing separate functions for each search type and wiring them together by hand, you build one system that reasons about what information the user actually needs. This pattern suits AI agents and RAG applications that rely on web search.

LangGraph is a framework for building stateful, LLM-driven agents as graphs. Unlike simple scripts that always follow the same steps, a LangGraph agent analyzes the user’s query and decides which search tools to call. This tutorial uses LangGraph’s prebuilt ReAct (Reasoning and Acting) agent, which lets the model think through a problem, decide which search tools are relevant, execute those searches, and synthesize the results into a coherent response.

What we’re building: a multi-tool research agent

Our research agent uses four specialized search tools, each targeting different aspects of Firecrawl’s search endpoint. LangGraph provides the reasoning layer that decides which tools to use based on user intent. The agent can:

Automatically pick between web, academic, code, and news searches
Combine multiple search types for comprehensive research
Handle API errors without breaking the user experience
Remember context across different questions in a conversation

This shows how a search API can power context-aware AI applications that adapt to what the user actually needs.

When NOT to use this approach

Agents add complexity and cost. Use simple search functions instead when:

You always need the same search type (like only GitHub repositories)
Response time is more important than search quality
You’re building a prototype or MVP
Your use case doesn’t require intelligent tool selection

For complex research workflows, competitive intelligence, or AI training data collection, the agent approach provides significant value.

Setting up the development environment

Before building the agent, you need API keys and dependencies. This agent requires both OpenAI (for reasoning) and Firecrawl (for searching).

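Get your API keys: a Firecrawl key from your dashboard at firecrawl.dev, and an OpenAI key from the OpenAI platform (platform.openai.com).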

Add both keys to your environment file:

echo "FIRECRAWL_API_KEY='fc-YOUR-FIRECRAWL-KEY'" >> .env
echo "OPENAI_API_KEY='sk-YOUR-OPENAI-KEY'" >> .env

You can install dependencies using uv (recommended for faster, more reliable package management):

# Install uv if you don't have it
curl -LsSf https://astral.sh/uv/install.sh | sh

source ~/.bashrc  # or ~/.zshrc

# uv expects to work in a Python project directory with a pyproject.toml file
uv init

# Install project dependencies
uv add firecrawl-py python-dotenv langchain-openai langgraph

If you prefer pip:

pip install firecrawl-py python-dotenv langchain-openai langgraph

Creating the specialized search tools

Start by building four tools that target different parts of Firecrawl’s search endpoint. Each tool handles a specific content type but follows the same pattern. The first block sets up the imports and the general web search tool:

#!/usr/bin/env python3
"""Research Assistant Agent using LangGraph and Firecrawl Search API"""

import os
from dotenv import load_dotenv
from firecrawl import Firecrawl
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
from langgraph.checkpoint.memory import MemorySaver

# Load environment variables
load_dotenv()
firecrawl = Firecrawl()

def general_web_search(query: str, limit: int = 3) -> str:
    """Search the web for general information and extract full content."""
    try:
        results = firecrawl.search(
            query=query,
            limit=limit,
            sources=["web"],
            scrape_options={"formats": ["markdown"]}
        )
        
        if not results.web:
            return f"No web results found for: {query}"
        
        formatted_results = f"Web Search Results for '{query}':\n\n"
        for i, result in enumerate(results.web, 1):
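            # Scraped results carry metadata; fall back to plain search fields if it is missing
            meta = getattr(result, "metadata", None)
            title = getattr(meta, "title", None) or getattr(result, "title", "Untitled")
            url = getattr(meta, "url", None) or getattr(result, "url", "")
            markdown = getattr(result, "markdown", None) or ""  # empty if the page failed to scrape
            content_preview = markdown[:300] + "..." if len(markdown) > 300 else markdown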
            
            formatted_results += f"{i}. {title}\n"
            formatted_results += f"   URL: {url}\n"
            formatted_results += f"   Content: {content_preview}\n\n"
        
        return formatted_results
        
    except Exception as e:
        return f"Error in web search: {str(e)}"

Because this tool passes scrape_options, each result carries the page’s full markdown rather than just a title and description. The tool trims that content to a 300-character preview so the agent’s context stays manageable.
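Slicing markdown at a fixed character count can cut a word in half. A hypothetical preview helper that backs up to the last space (unless that would drop more than half the text) and treats missing markdown as an empty string could stand in for that slice:

```python
def preview(text, max_chars=300):
    """Shorten text to at most max_chars characters, cutting at a word boundary when possible."""
    text = (text or "").strip()
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    # Back up to the last space so words are not split, unless that would drop too much
    space = cut.rfind(" ")
    if space > max_chars // 2:
        cut = cut[:space]
    return cut.rstrip() + "..."


print(preview("Firecrawl search returns full page content " * 20, 60))
```

Inside general_web_search, content_preview = preview(result.markdown) would then replace the slicing expression.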

Add the specialized category searches:

def academic_research_search(topic: str, limit: int = 3) -> str:
    """Search for academic research papers and scholarly content."""
    try:
        results = firecrawl.search(
            query=topic,
            limit=limit,
            categories=["research"],
            scrape_options={"formats": ["markdown"]}
        )
        
        if not results.web:
            return f"No research papers found for: {topic}"
        
        formatted_results = f"Academic Research Results for '{topic}':\n\n"
        for i, result in enumerate(results.web, 1):
            title = result.metadata.title if hasattr(result, 'metadata') else result.title
            url = result.metadata.url if hasattr(result, 'metadata') else result.url
            category = getattr(result, 'category', 'research')
            
            formatted_results += f"{i}. {title}\n"
            formatted_results += f"   Category: {category}\n"
            formatted_results += f"   URL: {url}\n\n"
        
        return formatted_results
        
    except Exception as e:
        return f"Error in research search: {str(e)}"

def code_search(technology: str, limit: int = 3) -> str:
    """Search GitHub repositories for code examples and projects."""
    try:
        results = firecrawl.search(
            query=technology,
            limit=limit,
            categories=["github"]
        )
        
        if not results.web:
            return f"No GitHub repositories found for: {technology}"
        
        formatted_results = f"GitHub Code Search Results for '{technology}':\n\n"
        for i, result in enumerate(results.web, 1):
            formatted_results += f"{i}. {result.title}\n"
            formatted_results += f"   URL: {result.url}\n"
            formatted_results += f"   Description: {result.description or 'No description available'}\n\n"
        
        return formatted_results
        
    except Exception as e:
        return f"Error in code search: {str(e)}"

def news_search(topic: str, time_filter: str = "qdr:w", limit: int = 3) -> str:
    """Search for recent news articles on a topic."""
    try:
        results = firecrawl.search(
            query=topic,
            limit=limit,
            sources=["news"],
            tbs=time_filter
        )
        
        news_results = []
        if hasattr(results, 'news') and results.news:
            news_results = results.news
        elif hasattr(results, 'web') and results.web:
            news_results = results.web
        
        if not news_results:
            return f"No recent news found for: {topic}"
        
        time_desc = {"qdr:d": "past day", "qdr:w": "past week", "qdr:m": "past month"}.get(time_filter, "recent")
        formatted_results = f"News Search Results for '{topic}' ({time_desc}):\n\n"
        
        for i, result in enumerate(news_results, 1):
            formatted_results += f"{i}. {result.title}\n"
            formatted_results += f"   URL: {result.url}\n"
            formatted_results += f"   Date: {getattr(result, 'date', 'Date not available')}\n\n"
        
        return formatted_results
        
    except Exception as e:
        return f"Error in news search: {str(e)}"

These four tools cover general web pages, academic papers, GitHub repositories, and recent news. Each one targets a different content type. The real power comes from letting the agent read the user’s intent and pick the right tool automatically.

Building the LangGraph reasoning layer

Now connect these tools to LangGraph’s prebuilt ReAct agent. The language model decides which tools to call and when. LangGraph runs the loop that executes each call and feeds the result back to the model. For more details on LangGraph patterns and advanced agent architectures, see our comprehensive LangGraph tutorial.

def create_research_agent():
    """Create the research assistant agent with search tools."""
    
    model = ChatOpenAI(model="gpt-5", temperature=0)
    tools = [general_web_search, academic_research_search, code_search, news_search]
    
    agent = create_react_agent(
        model=model,
        tools=tools,
        checkpointer=MemorySaver()
    )
    
    return agent

Next, add a main() function that runs an interactive prompt loop and sends each question to the agent.

def main():
    """Main function to run the research agent."""
    print("🔍 Research Assistant Agent")
    print("=" * 50)
    print("I can help you search for:")
    print("• General web information with full content")
    print("• Academic research papers")
    print("• GitHub code repositories") 
    print("• Recent news articles")
    print("\nType 'quit' to exit")
    print("=" * 50)
    
    agent = create_research_agent()
    
    while True:
        user_input = input("\n💬 What would you like to research? ")
        
        if user_input.lower() in ['quit', 'exit', 'q']:
            print("👋 Goodbye!")
            break
            
        if not user_input.strip():
            continue
            
        try:
            print("\n🤖 Searching...")
            
            system_message = """You are a Research Assistant Agent that helps users find information on any topic.

You have access to four specialized search tools:
1. general_web_search - For web search with full content extraction
2. academic_research_search - For finding academic papers and scholarly content  
3. code_search - For finding GitHub repositories and code examples
4. news_search - For finding recent news articles

Choose the appropriate tool(s) based on the user's request. You can use multiple tools for extensive research."""
          
            response = agent.invoke(
                {"messages": [("system", system_message), ("user", user_input)]},
                config={"configurable": {"thread_id": "research_session"}}
            )
            
            final_message = response["messages"][-1]
            print(f"\n📋 Research Results:\n{final_message.content}")
            
        except Exception as e:
            print(f"\n❌ Error: {str(e)}")

if __name__ == "__main__":
    main()

The agent receives a system prompt that explains when to use each tool. LangGraph’s ReAct agent handles the reasoning. The model analyzes the user’s query, decides which tools are relevant, calls them in the right order, and synthesizes the results into a final answer.

Running and testing the agent

Save this code as research_agent.py and run it:

python research_agent.py

🔍 Research Assistant Agent
==================================================
I can help you search for:
• General web information with full content
• Academic research papers
• GitHub code repositories
• Recent news articles

Type 'quit' to exit
==================================================

💬 What would you like to research?

Intelligent tool selection in action

Here’s how the agent automatically chooses a search strategy based on user intent. Note that these queries may take a moment to return results.

Query: “Find recent machine learning frameworks for computer vision”

The agent recognizes this needs code repositories and chooses the code_search tool:

🤖 Searching...

📋 Research Results:
Here are some recent machine learning frameworks for computer vision found on GitHub:

1. OpenCV: An open-source library providing computer vision and machine learning 
   algorithms for image and video analysis. It also includes Detectron2, a 
   high-performance object detection framework.

2. 500 AI Machine learning Deep learning Computer vision NLP Projects with code: 
   This repository contains a list of 500+ AI projects, including machine learning 
   and computer vision, with code.

3. Lightly: A computer vision framework for self-supervised learning. The repository 
   also includes Learnergy, energy-based machine learning models built upon PyTorch.

Query: “I need academic research papers about neural networks”

The agent switches to academic_research_search for scholarly sources:

🤖 Searching...

📋 Research Results:
Here are some academic research papers on neural networks:

1. Neural Networks Journal | ScienceDirect: The official journal for neural 
   network research with peer-reviewed articles.

2. Neural Network - an overview | ScienceDirect Topics: Comprehensive overview 
   of neural networks in the context of social sciences.

3. Overview of artificial neural networks - PubMed: Medical and biological 
   applications of artificial neural networks.

Query: “Show me Python web scraping libraries on GitHub”

The agent uses code_search again, since the query explicitly asks for projects on GitHub:

🤖 Searching...

📋 Research Results:
Here are some Python web scraping libraries and projects on GitHub:

1. Web Scraping With Python: Tutorial using requests and Beautiful Soup libraries.

2. Python-Web-Scraping-Tutorial by oxylabs: Comprehensive tutorial from simple 
   examples to complex tasks.

3. Scrapling by D4Vinci: Undetectable, powerful, flexible Python library for 
   web scraping.

4. Scrapegraph-ai: Web scraping python library that uses LLM and direct graph 
   logic to create scraping pipelines.

Notice that the agent picks a different tool for each query without any routing code on your part. The model infers the intent from the query and matches it against the tool descriptions in the system prompt and the functions’ docstrings.

Production improvements for scaling

This research agent demonstrates core AI search functionality, but production applications have additional considerations:

Error handling and resilience: Implement retry logic with exponential backoff for API failures. Add fallback strategies when specific search types fail. Log errors for monitoring and debugging. Set up health checks to detect service issues early.
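The retry advice can be sketched with only the standard library. The decorator name `with_retries`, its parameters, and the use of "full jitter" are illustrative assumptions; none of them is part of Firecrawl’s SDK. In practice you would narrow `retry_on` to the exceptions your client actually raises for transient failures.

```python
import functools
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Retry a function with exponential backoff and full jitter (hypothetical helper)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args, **kwargs) -> T:
            attempt = 1
            while True:
                try:
                    return func(*args, **kwargs)
                except retry_on:
                    if attempt >= max_attempts:
                        raise  # out of attempts: re-raise the last error
                    # Backoff ceiling doubles per attempt: 1s, 2s, 4s, ... capped at max_delay
                    delay = min(max_delay, base_delay * 2 ** (attempt - 1))
                    # Full jitter: wait a random time in [0, delay] so clients don't retry in lockstep
                    sleep(random.uniform(0, delay))
                    attempt += 1

        return wrapper

    return decorator
```

The tools in this tutorial catch their own exceptions and return an error string. For that reason, the decorator belongs on an inner function that makes the raw `firecrawl.search(...)` call, not on the tool function itself.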

Performance and cost optimization: Cache frequently requested searches to reduce API calls. Implement rate limiting to prevent quota exhaustion. Monitor credit usage and set daily budgets. Use result pagination for large datasets instead of increasing limits.
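For the caching point, here is a small in-memory cache with a time-to-live, keyed on the query and the search parameters. The `TTLCache` class, its `get_or_fetch` method, and the 15-minute default are hypothetical. A deployment with several workers would more likely use a shared store such as Redis, so that every worker sees the same cache.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds (illustrative sketch)."""

    def __init__(self, ttl_seconds: float = 900.0, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    @staticmethod
    def make_key(query: str, **params: Any) -> Hashable:
        # Normalize whitespace and case (assumes case doesn't change results) and sort
        # params so equivalent requests share one entry; lists become tuples to stay hashable.
        frozen = tuple(
            sorted((k, tuple(v) if isinstance(v, list) else v) for k, v in params.items())
        )
        return (" ".join(query.split()).lower(), frozen)

    def get_or_fetch(self, key: Hashable, fetch: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl_seconds:
            return entry[1]  # fresh hit: no API call, no credits spent
        value = fetch()  # miss or expired entry: call the API once and store the result
        self._entries[key] = (now, value)
        return value
```

A tool could then call `cache.get_or_fetch(TTLCache.make_key(topic, limit=limit), lambda: firecrawl.search(query=topic, limit=limit))`. Expired entries are overwritten on the next lookup but never purged, so a long-running process would also need periodic cleanup.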

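User experience improvements: Add streaming so users see each step of the agent’s work as it happens, instead of waiting for the complete answer. With stream_mode="values", astream emits the full graph state after every step, so the newest message is always the last entry in chunk["messages"]:

async def stream_agent_response(agent, query):
    """Stream agent responses for real-time feedback."""
    # stream_mode="values" yields the whole state after each step. In the
    # per-node "updates" mode, messages are nested under node names such as
    # "agent" or "tools", so a top-level "messages" key would never appear.
    async for chunk in agent.astream(
        {"messages": [("user", query)]},
        config={"configurable": {"thread_id": "user_session"}},
        stream_mode="values",
    ):
        if "messages" in chunk:
            yield chunk["messages"][-1].content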

Context and memory management: Store conversation history in persistent storage. Implement session management for multiple users. Add context awareness so agents remember previous searches and can build on them.
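The interactive loop in main() reuses a single `thread_id`, so every user would share one conversation history. Below is a minimal sketch of per-user sessions. It maps each user to their own thread ID and builds the same `{"configurable": {"thread_id": ...}}` config that main() passes to `agent.invoke`. The `SessionRegistry` class is hypothetical and keeps its mapping in memory. Keeping conversations across restarts would also require replacing `MemorySaver` with a database-backed checkpointer; LangGraph publishes SQLite and Postgres checkpointer packages for this.

```python
import uuid
from typing import Dict


class SessionRegistry:
    """Assigns each user a stable LangGraph thread ID (hypothetical helper)."""

    def __init__(self) -> None:
        self._threads: Dict[str, str] = {}

    def thread_for(self, user_id: str) -> str:
        # Create a random thread ID the first time a user appears, then reuse it
        if user_id not in self._threads:
            self._threads[user_id] = f"research-{uuid.uuid4().hex}"
        return self._threads[user_id]

    def reset(self, user_id: str) -> None:
        # Forget the mapping so the user's next message starts a fresh thread
        self._threads.pop(user_id, None)

    def config_for(self, user_id: str) -> dict:
        # Same config shape that main() passes to agent.invoke
        return {"configurable": {"thread_id": self.thread_for(user_id)}}
```

With this in place, the invoke call becomes `agent.invoke({"messages": [("user", text)]}, config=registry.config_for(user_id))`.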

Integration patterns: Connect with vector databases for semantic search capabilities. Add webhooks for real-time data updates. Integrate with workflow systems for automated research pipelines. Build APIs so other applications can use your agent.

Monitoring and analytics: Track which search types get used most. Monitor response times and success rates. Analyze user query patterns to improve tool selection. Set up alerts for system health and performance issues.
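To see which search tools the agent actually uses, one option is to wrap each tool function in a decorator that records call counts, failures, and latency. The `ToolMetrics` class and its `track` decorator are illustrative assumptions. The tools above return strings such as `Error in code search: ...` instead of raising, so this sketch also counts a call as failed when its result starts with `"Error"`. `functools.wraps` copies the name, docstring, and annotations, which should let the agent still build a tool schema from the wrapped function. Verify this against your LangChain version.

```python
import functools
import time
from collections import defaultdict
from typing import Callable, Dict


class ToolMetrics:
    """Collects per-tool call counts, error counts, and total latency (sketch)."""

    def __init__(self, clock: Callable[[], float] = time.perf_counter) -> None:
        self._clock = clock
        self.calls: Dict[str, int] = defaultdict(int)
        self.errors: Dict[str, int] = defaultdict(int)
        self.total_seconds: Dict[str, float] = defaultdict(float)

    def track(self, func: Callable[..., str]) -> Callable[..., str]:
        @functools.wraps(func)  # keep name and docstring for the agent's tool description
        def wrapper(*args, **kwargs) -> str:
            start = self._clock()
            try:
                result = func(*args, **kwargs)
            except Exception:
                self.errors[func.__name__] += 1
                raise
            finally:
                self.calls[func.__name__] += 1
                self.total_seconds[func.__name__] += self._clock() - start
            # The tools above report failures as strings like "Error in code search: ..."
            if isinstance(result, str) and result.startswith("Error"):
                self.errors[func.__name__] += 1
            return result

        return wrapper

    def summary(self) -> Dict[str, dict]:
        return {
            name: {
                "calls": count,
                "success_rate": 1 - self.errors[name] / count,
                "avg_seconds": self.total_seconds[name] / count,
            }
            for name, count in self.calls.items()
        }
```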

Security and compliance: Validate and sanitize user inputs. Implement authentication and authorization. Add audit logs for compliance requirements. Set up data retention policies for search results.
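As a starting point for input validation, the function below runs before anything reaches the model or the search API. It replaces control characters with spaces, collapses whitespace, and rejects empty or overly long queries. The function name `validate_query` and the 500-character limit are arbitrary assumptions to tune for your application. This is a first line of defense, not a complete security layer.

```python
import unicodedata

MAX_QUERY_LENGTH = 500  # assumed limit; tune for your application


def validate_query(raw: str, max_length: int = MAX_QUERY_LENGTH) -> str:
    """Return a cleaned query or raise ValueError if it is unusable."""
    if not isinstance(raw, str):
        raise ValueError("Query must be a string")
    # Replace control characters (Unicode category "Cc", e.g. \x00, \x1b, \n) with spaces
    cleaned = "".join(" " if unicodedata.category(ch) == "Cc" else ch for ch in raw)
    # Collapse runs of whitespace into single spaces and trim the ends
    cleaned = " ".join(cleaned.split())
    if not cleaned:
        raise ValueError("Query is empty")
    if len(cleaned) > max_length:
        raise ValueError(f"Query exceeds {max_length} characters")
    return cleaned
```

Calling `validate_query(user_input)` at the top of the existing `try` block in main() would let the current `except` clause report invalid input to the user.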

This agent architecture shows how a web search API built for AI applications can power context-aware systems. Combining Firecrawl’s search endpoint with LangGraph’s reasoning loop gives you agents that choose a search strategy on their own. They return thorough research results without manual intervention.

Searching and Beyond

Most web search APIs give you links when you need content. You search, get URLs, then scrape each page separately. That’s one search call plus one scrape call per result, which means more moving parts and higher costs. Firecrawl’s search endpoint skips this workflow by extracting content during the search itself. One call gets you clean, processed text ready for your application.

This tutorial covered search capabilities from basic queries to advanced filtering with categories, sources, and time-based parameters. We built a research agent that automatically chooses between web, academic, code, and news searches.

With Firecrawl, you can combine search for discovery, scrape for precision, and crawl for complete site coverage to build AI-native web search applications. Get started with a free Firecrawl account and 500 credits.

To learn more about other Firecrawl endpoints, check out our scrape endpoint tutorial and crawl endpoint guide.

Frequently Asked Questions

What’s the difference between SERP APIs and Search APIs?

SERP APIs specifically scrape and reformat data from existing search engines like Google or Bing. They act as middleman services that parse search engine results pages and return structured JSON data. Examples include SerpAPI, ScrapingDog, and Serper.

Search APIs are the broader category. It covers SERP scrapers as well as independent search engines that built their own search indices and ranking algorithms. Firecrawl’s search endpoint is one example of a search API.

All SERP APIs are Search APIs, but not all Search APIs are SERP APIs.

How do AI-native search APIs work differently from traditional search?

Traditional search APIs match keywords and return results based on text similarity and popularity signals. AI-native search APIs like Firecrawl understand query intent and can extract full content from results, not just metadata. They can also filter by specialized categories like GitHub repositories or academic papers, and format results as clean markdown ready for AI processing rather than HTML meant for human browsing.

How does content extraction work with search results?

When you add scrape_options to a search request, Firecrawl also scrapes each result and returns its full page content in the formats you request, such as clean markdown or the page’s links. It also handles JavaScript-rendered content. You get complete articles instead of just titles and snippets.

How much does Firecrawl’s search endpoint cost?

When search results are not scraped, the cost is 2 credits per 10 search results. When scraping is enabled, each scraped result is billed at the standard scraping cost, with no extra charge for basic scrapes on top of that. Additional features like PDF parsing (+1 credit per page), stealth proxy mode (+4 credits), and structured JSON extraction (+5 credits) incur extra costs. Plans start at $16/month for 1,000 credits.

What are the rate limits for Firecrawl search API requests?

Search rate limits vary by plan:

Free: 5 requests/minute
Hobby: 50 requests/minute
Standard: 250 requests/minute
Growth: 2,500 requests/minute

These limits apply per minute and reset automatically. If you exceed your limit, requests will return a rate limit error until the next minute window.
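To avoid hitting these limits at all, a client can throttle itself. Below is a minimal sliding-window limiter that assumes a single process. The class name and the `max_requests` value you choose (for example, 5 for the Free plan) are your own; this is not a Firecrawl SDK feature. Multiple workers sharing one API key would need a shared counter instead.

```python
import threading
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Blocks until a request fits within max_requests per window_seconds (sketch)."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._sleep = sleep
        self._timestamps: Deque[float] = deque()
        self._lock = threading.Lock()

    def acquire(self) -> None:
        while True:
            with self._lock:
                now = self._clock()
                # Forget requests that have aged out of the window
                while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
                    self._timestamps.popleft()
                if len(self._timestamps) < self.max_requests:
                    self._timestamps.append(now)
                    return
                # Otherwise wait until the oldest request leaves the window
                wait = self.window_seconds - (now - self._timestamps[0])
            self._sleep(wait)  # sleep outside the lock so other threads aren't blocked on it
```

Calling `limiter.acquire()` right before each `firecrawl.search(...)` keeps requests under the chosen rate. A sliding window never allows more than `max_requests` in any 60-second span. That makes it at least as strict as a fixed per-minute window, so in principle it also keeps you under the plan limit.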

Bex Tuychiev @bextuychiev

Technical Writer at Firecrawl

About the Author

Bex Tuychiev is a Technical Writer at Firecrawl and a Kaggle Master with over 15k followers. He loves writing detailed guides, tutorials, and notebooks on complex data science and machine learning topics.