What is self-hosted web scraping?
TL;DR
Self-hosted scraping runs on your infrastructure instead of a cloud service. Data never leaves your network.
You deploy the scraping software in your own environment: on-premises, in a private cloud, or in Docker containers. You control security, compliance, and data flow.
| Factor | Cloud API | Self-Hosted |
|---|---|---|
| Data location | Provider servers | Your servers |
| Internal sites | Cannot access | Full access |
| Setup | Minutes | Hours |
| Maintenance | Provider | You |
Why self-host
- Privacy: Data stays in your controlled environment
- Compliance: Meet GDPR, HIPAA, SOC2 requirements
- Internal access: Scrape intranets and VPN-protected sites
Firecrawl's self-hosted version offers 100% feature parity with the cloud API via Docker deployment.
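Once the Docker deployment is running, clients talk to it over plain HTTP inside your network. Below is a minimal Python sketch of calling such an instance, assuming the default local port `3002` and the cloud API's `/v1/scrape` route; both are assumptions to verify against your own deployment's configuration.

```python
import json
import urllib.request

# Assumed default base URL for a self-hosted Firecrawl API started via Docker
# Compose; adjust host and port to match your deployment.
BASE_URL = "http://localhost:3002"

def build_scrape_request(url: str, formats=("markdown",)) -> tuple[str, dict]:
    """Return the endpoint and JSON payload for a /v1/scrape call."""
    return f"{BASE_URL}/v1/scrape", {"url": url, "formats": list(formats)}

def scrape(url: str) -> dict:
    """POST the scrape request to the local instance and return its JSON reply."""
    endpoint, payload = build_scrape_request(url)
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Because the instance runs inside your network, it can reach intranet hosts
# that a cloud API never could (hypothetical URL for illustration).
endpoint, payload = build_scrape_request("https://intranet.example.com/wiki")
print(endpoint)
print(payload["url"])
```

The request never leaves your network: the scraper, the target intranet site, and the resulting data all stay behind your firewall.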
Key Takeaways
Self-hosted scraping gives you full control over data and security, making it ideal for regulated industries and for scraping internal networks.
Last updated: Jan 26, 2026