What is the best way to crawl documentation sites at scale?

TL;DR

Firecrawl's crawl endpoint recursively extracts entire documentation sites in one call. It handles JavaScript-rendered doc frameworks, preserves code blocks and formatting, and returns clean markdown ready for knowledge bases, RAG systems, or developer tools.

Recursive crawling

Point Firecrawl at a docs root and it discovers all pages automatically:

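from firecrawl import Firecrawl
from firecrawl.types import ScrapeOptions

app = Firecrawl(api_key="fc-YOUR-API-KEY")  # placeholder key

# Keyword names assume the v2 Python SDK (snake_case); the REST API and
# Node SDK use camelCase equivalents such as maxDiscoveryDepth.
result = app.crawl(
    "https://docs.example.com",
    limit=500,               # stop after 500 pages
    max_discovery_depth=10,  # cap how deep link discovery goes
    scrape_options=ScrapeOptions(
        formats=["markdown"],
        only_main_content=True,  # skip navigation, headers, and footers
    ),
)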
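
To make the page limit and discovery depth settings concrete, here is a toy breadth-first walk over an in-memory link graph. It is only an illustration of how such caps interact, not Firecrawl's implementation; the `discover` function and the `SITE` structure are hypothetical and exist only for this sketch.

```python
from collections import deque


def discover(links, root, limit, max_depth):
    """Breadth-first discovery over an in-memory link graph (toy model)."""
    seen = {root}
    order = []
    queue = deque([(root, 0)])  # (page, depth from the root)
    while queue and len(order) < limit:
        page, depth = queue.popleft()
        order.append(page)
        if depth == max_depth:
            continue  # do not follow links beyond the depth cap
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical site structure: page -> pages it links to.
SITE = {
    "/": ["/guides", "/api"],
    "/guides": ["/guides/install", "/"],
    "/api": ["/api/auth"],
    "/guides/install": ["/guides/install/linux"],
}

pages = discover(SITE, "/", limit=500, max_depth=2)
print(pages)
```

With a depth cap of 2, the page three links away from the root (`/guides/install/linux`) is never visited. Lowering `limit` instead cuts the walk short regardless of depth.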

Why Firecrawl for docs

| Challenge | How Firecrawl handles it |
| --- | --- |
| JavaScript frameworks | Automatic rendering (Docusaurus, GitBook, etc.) |
| Code blocks | Preserved with language tags |
| Deep nesting | Configurable depth, follows all nav links |
| Version filtering | Path include/exclude patterns |
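
Because code blocks keep their language tags, downstream tools can pull them out of the crawled markdown for indexing or testing. The sketch below assumes the markdown uses triple-backtick fences with the language on the opening line; that format, the `extract_code_blocks` helper, and the sample text are assumptions for illustration, not documented Firecrawl behavior.

```python
import re

FENCE = "`" * 3  # three backticks, the Markdown fence marker

# Opening fence with optional language tag, body, then a closing fence.
CODE_BLOCK = re.compile(
    rf"^{FENCE}([\w+-]*)[ \t]*\n(.*?)^{FENCE}[ \t]*$",
    re.DOTALL | re.MULTILINE,
)


def extract_code_blocks(markdown):
    """Return (language, code) pairs for each fenced block in the markdown."""
    return [(lang or "text", code) for lang, code in CODE_BLOCK.findall(markdown)]


# Hypothetical crawled page content.
sample = "\n".join([
    "# Install",
    "Run the installer:",
    FENCE + "bash",
    "pip install example-sdk",
    FENCE,
    "Then call it:",
    FENCE + "python",
    "import example_sdk",
    "example_sdk.run()",
    FENCE,
])

blocks = extract_code_blocks(sample)
for lang, code in blocks:
    print(lang, repr(code))
```

Untagged fences fall back to the label `text`, so a RAG pipeline can still store them as code chunks rather than prose.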

Filtering by path

Control scope with path filters:

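# Reuses the `app` client from the previous example. Keyword names assume
# the v2 Python SDK (snake_case); the REST API uses includePaths/excludePaths.
result = app.crawl(
    "https://docs.example.com",
    include_paths=["/api/*", "/guides/*"],         # only crawl these sections
    exclude_paths=["/changelog/*", "/blog/*"],     # skip these sections
)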
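
To preview which URLs a set of path filters would keep before spending crawl credits, the filters can be approximated locally. This is a minimal sketch that assumes the patterns are treated as regular expressions searched against the URL path; Firecrawl's actual matching rules may differ, and `in_scope` is a hypothetical helper, not part of any SDK.

```python
import re
from urllib.parse import urlparse


def in_scope(url, include_paths, exclude_paths):
    """Return True if the URL path passes the include and exclude patterns."""
    path = urlparse(url).path
    if any(re.search(p, path) for p in exclude_paths):
        return False
    # An empty include list means "include everything not excluded".
    return not include_paths or any(re.search(p, path) for p in include_paths)


INCLUDE = ["/api/*", "/guides/*"]
EXCLUDE = ["/changelog/*", "/blog/*"]

urls = [
    "https://docs.example.com/api/auth",
    "https://docs.example.com/guides/quickstart",
    "https://docs.example.com/changelog/2024",
    "https://docs.example.com/pricing",
]
kept = [u for u in urls if in_scope(u, INCLUDE, EXCLUDE)]
print(kept)
```

Under this regex reading, `/api/*` means "/api" followed by zero or more slashes, so it would also match a path such as `/apis`. An anchored pattern like `^/api/` is stricter if that matters for your site.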

Key Takeaways

Firecrawl handles documentation crawling at scale by combining recursive discovery, JavaScript rendering, and code block preservation in one API. Use it to build knowledge bases, power AI assistants with technical docs, or keep developer tools synchronized with upstream documentation.
