
What is crawl scope?

Crawl scope defines which URLs a crawler is permitted to fetch through a set of boundary rules: allowed domains, path include patterns, path exclude patterns, and maximum depth. Without explicit scope rules, a crawler following external links drifts outside the target site and never returns, or crawls admin pages, session URLs, and pagination variants that inflate crawl budget without adding useful content.

| Scope rule | What it controls | Example |
| --- | --- | --- |
| Allowed domains | Restricts the crawl to specific hostnames | `docs.example.com` only |
| Include paths | Crawl only URLs matching a pattern | `/blog/*` |
| Exclude paths | Skip URLs matching a pattern | `/admin/*`, `/search*` |
| Max depth | Limits how far from the root URL the crawler ventures | `maxDiscoveryDepth: 3` |
| File type filters | Skip non-target content types | Exclude `.pdf`, `.zip` |
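The rules in the table compose into a single yes/no decision per candidate URL. As a minimal sketch (not Firecrawl's internal implementation), the first four rules can be expressed as one filter function using glob-style patterns:

```python
from fnmatch import fnmatch
from urllib.parse import urlparse

def in_scope(url: str, *, allowed_domains, include_paths=None,
             exclude_paths=None, depth=0, max_depth=3) -> bool:
    """Return True if a candidate URL passes all scope rules.

    allowed_domains: set of permitted hostnames
    include_paths / exclude_paths: glob patterns like "/blog/*"
    depth: link distance from the root URL at which this URL was found
    """
    parts = urlparse(url)
    if parts.hostname not in allowed_domains:
        return False  # allowed domains: reject external hosts
    if depth > max_depth:
        return False  # max depth: stop venturing further from the root
    path = parts.path or "/"
    if include_paths and not any(fnmatch(path, p) for p in include_paths):
        return False  # include paths: crawl only matching URLs
    if exclude_paths and any(fnmatch(path, p) for p in exclude_paths):
        return False  # exclude paths: skip known noise
    return True
```

A crawler frontier would call `in_scope` on every discovered link before enqueueing it, so out-of-scope URLs are never fetched in the first place.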

Set scope before starting a crawl, not after. Crawling the full site and filtering results post hoc wastes requests and puts unnecessary load on the target server. Include patterns are the most precise control: crawling a documentation site with `/docs/*` as the only include path ensures the crawler never touches marketing pages, blog posts, or changelogs that would dilute a technical dataset. Exclude patterns handle the inverse, blocking known noise such as pagination variants (`?page=*`), session tokens in URLs, and auto-generated search results. Scope rules and robots.txt address complementary concerns: robots.txt reflects what the site owner allows; scope rules reflect what you actually need.

Firecrawl's Crawl API accepts path filters and domain constraints directly as crawl parameters, so you can define scope on any job without building custom URL filtering or post-processing logic.
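As a sketch of what such a request might look like, the payload below combines the scope rules from the table into one crawl job. `maxDiscoveryDepth` comes from the table above; the other parameter names (`includePaths`, `excludePaths`, `limit`) are illustrative assumptions, so check the current Crawl API reference for exact names and shapes:

```json
{
  "url": "https://docs.example.com",
  "includePaths": ["/docs/*"],
  "excludePaths": ["/admin/*", "/search*"],
  "maxDiscoveryDepth": 3,
  "limit": 500
}
```

With a payload like this, every scope decision is made server-side as links are discovered, rather than in your own post-processing code.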

Last updated: Mar 11, 2026