A Complete Guide to Scraping Authenticated Websites with cURL and Firecrawl
Scraping authenticated websites is often a key requirement for developers and data analysts. While many graphical tools exist, using cURL, a powerful command-line utility, gives you granular control over HTTP requests. Coupled with Firecrawl, a scraping API that can handle dynamic browser interactions and complex authentication flows, you can seamlessly extract data from behind login forms, protected dashboards, and other restricted content. Before we get started, a caveat: we only recommend scraping behind authentication if you have permission from the resource's owner.
In this guide, we’ll first introduce cURL and common authentication methods. Then, we’ll show how to combine these approaches with Firecrawl’s API, enabling you to scrape authenticated pages that would otherwise be challenging to access. You’ll learn everything from basic authentication to custom headers, bearer tokens, cookies, and even multi-step logins using Firecrawl’s action sequences.
What is cURL?
cURL (Client URL) is a command-line tool for transferring data using various network protocols, commonly HTTP and HTTPS. It’s usually pre-installed on Unix-like systems (macOS, Linux) and easily available for Windows. With cURL, you can quickly test APIs, debug endpoints, and automate repetitive tasks.
Check if cURL is installed by running:
curl --version
If installed, you’ll see version details. If not, follow your operating system’s instructions to install it.
cURL is lightweight and script-friendly, which makes it an excellent choice for integrating with tools like Firecrawl. By combining cURL's request capabilities with Firecrawl's browser-powered scraping, you can orchestrate authenticated scraping sessions entirely from the command line.
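Because every option is explicit on the command line, cURL is also handy for inspecting what a server sends back. For example, this prints only the response headers for a page (the URL is a placeholder), which is useful when debugging authentication:
curl -sS -D - -o /dev/null https://example.com
Here -D - dumps the received headers to standard output, -o /dev/null discards the body, and -sS hides the progress meter while still showing errors.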
Why Use Firecrawl for Authenticated Scraping?
Firecrawl is an API designed for scraping websites that might be hard to handle with a simple HTTP client. While cURL can handle direct requests, Firecrawl provides the ability to:
- Interact with websites that require JavaScript execution.
- Navigate multiple steps of login forms.
- Manage cookies, headers, and tokens easily.
- Extract content in structured formats like Markdown or JSON.
By pairing cURL’s command-line power with Firecrawl’s scraping engine, you can handle complex authentication scenarios—like logging into a site with a username/password form, or including custom headers and tokens—that would be difficult to script using cURL alone.
Authentication Methods
Authenticated scraping means you must prove your identity or authorization to the target server before accessing protected content. Common methods include:
- Basic Authentication
- Bearer Token (OAuth 2.0)
- Custom Header Authentication
- Cookie-Based (Session) Authentication
We’ll look at each method in the context of cURL, and then integrate them with Firecrawl for real-world scraping scenarios.
1. Basic Authentication
Basic Auth sends a username and password encoded in Base64 with each request. It’s simple but should always be used over HTTPS to protect credentials.
cURL Syntax:
curl -u username:password https://api.example.com/securedata
For APIs that require only an API key, pass it as the username and leave the password empty (note the trailing colon):
curl -u my_api_key: https://api.example.com/data
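Under the hood, -u simply Base64-encodes username:password into an Authorization header. You can build that header yourself, which comes in handy below when passing Basic Auth through Firecrawl:
# Equivalent to -u; echo -n avoids encoding a trailing newline
curl -H "Authorization: Basic $(echo -n 'username:password' | base64)" \
  https://api.example.com/securedata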
With Firecrawl:
Firecrawl's own API authenticates with a Bearer token (shown in the next section), so you won't use -u against api.firecrawl.dev. If the site you're scraping is protected by Basic Auth, pass the encoded credentials to Firecrawl through its headers option instead, as sketched below.
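A minimal sketch, where https://protected.example.com is a placeholder and dXNlcm5hbWU6cGFzc3dvcmQ= is simply username:password Base64-encoded:
curl -X POST https://api.firecrawl.dev/v1/scrape \
  -H "Authorization: Bearer YOUR_FIRECRAWL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://protected.example.com",
    "headers": {
      "Authorization": "Basic dXNlcm5hbWU6cGFzc3dvcmQ="
    }
  }'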
2. Bearer Token Authentication (OAuth 2.0)
Bearer Tokens (often from OAuth 2.0 flows) are secure, time-limited keys that you include in the Authorization header.
cURL Syntax:
curl -H "Authorization: Bearer YOUR_TOKEN" https://api.example.com/profile
With Firecrawl:
To scrape a site requiring a bearer token, you can instruct Firecrawl to use it:
curl -X POST https://api.firecrawl.dev/v1/scrape \
  -H "Authorization: Bearer fc_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com",
    "formats": ["markdown"]
  }'
Here, fc_your_api_key_here is your Firecrawl API token. Firecrawl will handle the scraping behind the scenes, and you can also add target-specific headers or actions if needed.
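The example above authenticates you to Firecrawl only. If the target site itself requires a bearer token, forward it through the headers field; TARGET_SITE_TOKEN here is a placeholder for whatever token the site issued you:
curl -X POST https://api.firecrawl.dev/v1/scrape \
  -H "Authorization: Bearer fc_your_api_key_here" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://api.example.com/profile",
    "headers": {
      "Authorization": "Bearer TARGET_SITE_TOKEN"
    }
  }'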
3. Custom Header Authentication
Some APIs require custom headers for authentication (e.g., X-API-Key: value). These headers are sent alongside requests to prove authorization.
cURL Syntax:
curl -H "X-API-Key: your_api_key_here" https://api.example.com/data
With Firecrawl:
To scrape a page requiring a custom header, just include it in the POST data:
curl -X POST https://api.firecrawl.dev/v1/scrape \
  -H "Authorization: Bearer YOUR_FIRECRAWL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://protected.example.com",
    "headers": {
      "X-Custom-Auth": "token123"
    }
  }'
Firecrawl will use the custom header X-Custom-Auth when loading the page.
4. Cookie-Based Authentication
Websites often rely on sessions and cookies for authentication. After logging in via a form, a cookie is set, allowing subsequent authenticated requests.
cURL for Cookie Handling:
Save cookies after login:
curl -c cookies.txt -X POST https://example.com/login \
-d "username=yourusername&password=yourpassword"
Use these cookies for subsequent requests:
curl -b cookies.txt https://example.com/protected
With Firecrawl:
If you need to scrape a protected page that uses cookies for authentication, you can first obtain the cookies using cURL, then pass the relevant values to Firecrawl as a Cookie header:
- Obtain Cookies:
curl -c cookies.txt -X POST https://example.com/login \
  -d "username=yourusername&password=yourpassword"
- Use Cookies with Firecrawl:
curl -X POST https://api.firecrawl.dev/v1/scrape \
  -H "Authorization: Bearer YOUR_FIRECRAWL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/protected",
    "formats": ["markdown"],
    "headers": {
      "Cookie": "session=YOUR_SESSION_COOKIE"
    }
  }'
The cookie must go in the JSON headers field (session and YOUR_SESSION_COOKIE stand in for the name and value saved in cookies.txt); passing -b cookies.txt on this request would send the cookies to the Firecrawl API host, not to the target site. Firecrawl will then request the protected URL using the cookies you've supplied.
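If you'd rather not copy values by hand, cookies.txt is in the Netscape cookie-jar format: tab-separated, with the cookie name in field 6 and its value in field 7. A rough sketch, assuming the site's session cookie is literally named session:
# Pull the "session" cookie value out of the jar saved by -c cookies.txt
SESSION=$(awk '$6 == "session" {print $7}' cookies.txt)
curl -X POST https://api.firecrawl.dev/v1/scrape \
  -H "Authorization: Bearer YOUR_FIRECRAWL_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"url\": \"https://example.com/protected\", \"headers\": {\"Cookie\": \"session=$SESSION\"}}"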
Real-World Examples
GitHub API
GitHub’s API supports token-based auth:
curl -H "Authorization: token ghp_YOUR_TOKEN" https://api.github.com/user/repos
Scraping authenticated GitHub pages (like your account settings) with Firecrawl:
curl -X POST https://api.firecrawl.dev/v1/scrape \
  -H "Authorization: Bearer YOUR_FIRECRAWL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://github.com/settings/profile",
    "headers": {
      "Cookie": "user_session=YOUR_SESSION_COOKIE; tz=UTC",
      "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"
    }
  }'
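To find the user_session value, log in to GitHub in your browser and copy the cookie from the developer tools (in Chrome, Application → Cookies). Treat it like a password: it grants access to your account.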
Dev.to Authentication
Dev.to's API authenticates with an api-key header:
curl -H "api-key: YOUR_DEV_TO_API_KEY" https://dev.to/api/articles/me
To scrape behind login forms, leverage Firecrawl actions:
curl -X POST https://api.firecrawl.dev/v1/scrape \
  -H "Authorization: Bearer YOUR_FIRECRAWL_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://dev.to/enter",
    "actions": [
      {"type": "wait", "milliseconds": 2000},
      {"type": "click", "selector": "input[type=email]"},
      {"type": "write", "text": "your@email.com"},
      {"type": "click", "selector": "input[type=password]"},
      {"type": "write", "text": "your_password"},
      {"type": "click", "selector": "button[type=submit]"},
      {"type": "wait", "milliseconds": 3000},
      {"type": "navigate", "url": "https://dev.to/dashboard"},
      {"type": "scrape"}
    ]
  }'
Firecrawl can interact with the page dynamically, just like a browser, to submit forms and then scrape the resulting authenticated content.
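Whichever method you use, /v1/scrape returns JSON. To extract just the scraped Markdown from any of the calls above, you can pipe the response through jq; a sketch, assuming the v1 response nests results under a data key with the Markdown at data.markdown:
curl -s -X POST https://api.firecrawl.dev/v1/scrape \
  -H "Authorization: Bearer YOUR_FIRECRAWL_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com", "formats": ["markdown"]}' \
  | jq -r '.data.markdown'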
Conclusion
When combined, cURL and Firecrawl provide a powerful toolkit for scraping authenticated websites. cURL’s flexibility in handling HTTP requests pairs perfectly with Firecrawl’s ability to navigate, interact, and extract data from pages that require authentication. Whether you need to pass API keys in headers, handle OAuth tokens, emulate sessions, or fill out login forms, these tools make the process efficient and repeatable.
Try the examples provided, check out Firecrawl’s documentation for more advanced use cases, and start confidently scraping authenticated websites today!
Happy cURLing and Firecrawling!