What is crawl depth limit?
Crawl depth limit caps how far a crawler will venture from its starting URL before it stops following new links. Most developers think of depth as the number of link hops from the seed URL (the homepage is depth 0, pages linked from it are depth 1, and so on). Some crawlers, however, measure depth by discovery order rather than link distance: a page discovered directly from a sitemap counts as depth 0 even if it is five clicks deep in the site's navigation.
| Depth model | How depth is counted | Implication |
|---|---|---|
| Link-hop depth | Clicks from seed URL | Deeply nested pages need a high limit |
| Discovery depth | Order in which URLs are found | Sitemapped pages are depth 0 regardless of nesting |
| Path depth | Segments in the URL path | Does not reflect link structure |
| No limit | Follows all reachable links | Can run indefinitely on large sites |
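The link-hop model in the table above is easy to see in code: a breadth-first traversal records each page's hop count from the seed and stops following links once a page sits at the limit. This is a minimal sketch on a toy in-memory link graph (the `SITE` dict and `crawl` helper are illustrative, not any real crawler's API):

```python
from collections import deque

# Toy site: each URL maps to the links found on that page.
SITE = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/install", "/docs/api"],
    "/blog": ["/blog/post-1"],
    "/docs/install": ["/docs/install/linux"],
}

def crawl(seed, max_depth):
    """Breadth-first crawl that records each page's link-hop depth
    and stops following links from pages at max_depth."""
    depths = {seed: 0}
    queue = deque([seed])
    while queue:
        url = queue.popleft()
        if depths[url] >= max_depth:
            continue  # page at the boundary: keep it, but don't follow its links
        for link in SITE.get(url, []):
            if link not in depths:
                depths[link] = depths[url] + 1
                queue.append(link)
    return depths

# With max_depth=2, /docs/install/linux (3 hops from the seed) is never reached.
print(crawl("/", 2))
```

Under this model, raising `max_depth` to 3 is the only way to reach `/docs/install/linux`, which is why deeply nested pages need a high limit in the link-hop row of the table.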
Setting a depth limit prevents a crawl from running indefinitely on large or deeply nested sites, at the cost of missing content beyond that boundary. A limit of 2 using link-hop depth captures the homepage and two levels of navigation, which covers most documentation sites; the same limit using discovery depth would still include sitemapped pages regardless of where they sit in the URL hierarchy. The right value depends on where the content you need actually lives. Too shallow and you miss deep but valuable pages; too deep and the crawl balloons in size and cost.
In Firecrawl's Crawl API, the `maxDiscoveryDepth` parameter uses discovery order: the starting URL and any pages found in the site's sitemap have a discovery depth of 0, and each subsequent layer of links increments the depth by 1. This means `maxDiscoveryDepth: 1` will still return sitemapped pages even if they are nested several directories deep, because they were discovered directly rather than through link traversal.
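Discovery-order counting can be modeled as a crawl whose frontier is seeded with both the start URL and every sitemapped URL at depth 0, with each layer of followed links adding 1. The sketch below shows only the counting model, not Firecrawl's implementation; the `SITE` graph and `SITEMAP` list are made up for illustration:

```python
from collections import deque

# Toy site: each URL maps to the links found on that page.
SITE = {
    "/": ["/docs"],
    "/docs": ["/docs/guides"],
    "/docs/guides": ["/docs/guides/advanced/tuning"],
}
# The sitemap lists a deeply nested page directly.
SITEMAP = ["/", "/docs/guides/advanced/tuning"]

def crawl(max_discovery_depth):
    """Seed the frontier with the start URL and all sitemapped URLs at
    discovery depth 0; each layer of followed links increments depth by 1."""
    depths = {url: 0 for url in SITEMAP}
    queue = deque(SITEMAP)
    while queue:
        url = queue.popleft()
        if depths[url] >= max_discovery_depth:
            continue  # don't follow links from pages at the limit
        for link in SITE.get(url, []):
            if link not in depths:
                depths[link] = depths[url] + 1
                queue.append(link)
    return depths

# With a limit of 1, the sitemapped page three directories deep is still
# included (depth 0), while /docs/guides (link depth 2) is not discovered.
print(crawl(1))
```

Note how `/docs/guides/advanced/tuning` is returned at depth 0 despite its nesting, mirroring the behavior described above, while `/docs/guides`, which is only reachable through two link hops, falls outside the limit.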