
January 20, 2025


Eric Ciarla

Introducing /extract: Get structured web data with just a prompt

The era of writing web scrapers is over

Today we’re releasing /extract - write a prompt, get structured data from any website. No scrapers. No pipelines. Just results.

Getting web data is hard

If you’ve ever needed structured data from websites, whether to enrich your CRM, monitor competitors, or power your applications, you’re probably familiar with the frustrating options available today:

  • Manually researching and copy-pasting data from multiple sources, consuming countless hours
  • Writing and maintaining fragile web scrapers that break at the slightest site change
  • Combining scraping services with complex LLM pipelines whose limited context windows force you to split the data up manually

Fortunately, with our /extract endpoint, you can leave these cumbersome approaches in the past and focus on what matters: getting the data you need.

What You Can Build With /extract

Companies are already using /extract to:

  • Enrich thousands of CRM leads with company data
  • Automate KYB processes with structured business information
  • Track competitor prices and feature changes in real time
  • Build targeted prospecting lists at scale

Here’s how it works:

  1. Write a prompt describing the data you need
  2. Point us at any website (use wildcards like example.com/*)
  3. Get back clean, structured JSON
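To make the three steps concrete, here is a minimal sketch using only the Python standard library. It assumes the endpoint is `https://api.firecrawl.dev/v1/extract`, that the request body takes `urls`, `prompt`, and an optional JSON Schema under `schema`, and that the API key is sent as a Bearer token; treat all of these as assumptions to check against the current Firecrawl API reference. The helper names `build_extract_request` and `send_extract_request` are hypothetical, and depending on the API version the response may be a job ID to poll rather than the extracted data itself.

```python
import json
import urllib.request

# Assumed endpoint; verify against the current Firecrawl API reference.
EXTRACT_URL = "https://api.firecrawl.dev/v1/extract"


def build_extract_request(urls, prompt, schema=None):
    """Build the JSON body for an /extract call (field names are assumptions)."""
    body = {"urls": list(urls), "prompt": prompt}
    if schema is not None:
        # Optional JSON Schema describing the shape of the data you want back.
        body["schema"] = schema
    return body


def send_extract_request(body, api_key):
    """POST the body and return the decoded JSON response (hypothetical helper)."""
    req = urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Step 1: describe the data. Step 2: point at a site, wildcard included.
body = build_extract_request(
    urls=["https://example.com/*"],
    prompt="Extract the company name, a one-sentence description, and pricing tiers.",
    schema={
        "type": "object",
        "properties": {
            "company_name": {"type": "string"},
            "description": {"type": "string"},
            "pricing_tiers": {"type": "array", "items": {"type": "string"}},
        },
        "required": ["company_name"],
    },
)
print(json.dumps(body, indent=2))
# Step 3: with a real key, send_extract_request(body, "fc-YOUR-KEY") returns JSON.
```

The request is built separately from the network call so the payload can be inspected (or logged) before anything is sent.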

No more broken scrapers. No more complex pipelines. Just the data you need to build.

Current Limitations

While /extract handles most web data needs effectively, there are some edge cases we’re actively improving:

  1. Scale Limitations: Very large sites (think Amazon’s entire catalog) require breaking requests into smaller chunks
  2. Advanced Filtering: Complex queries like time-based filtering are still in development
  3. Consistency: Multiple runs may return slightly different results as we refine our extraction model
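One way to work within the first limitation is to split a large job into several smaller requests. The sketch below assumes you already know narrower URL patterns for the site; the `shop.example.com` category URLs and the batch size of 3 are hypothetical, not a documented Firecrawl limit. It simply groups the patterns into batches, with each batch intended to become its own /extract request.

```python
from typing import Iterator, List, Sequence


def chunk_urls(urls: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most batch_size URLs."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(urls), batch_size):
        yield list(urls[start:start + batch_size])


# Hypothetical: category-level wildcards instead of one site-wide pattern.
category_urls = [f"https://shop.example.com/category/{i}/*" for i in range(1, 8)]

batches = list(chunk_urls(category_urls, batch_size=3))
for batch in batches:
    # Each batch would become the "urls" field of a separate /extract request.
    print(len(batch), batch[0])
```

Seven patterns with a batch size of 3 produce batches of 3, 3, and 1 URLs.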

We’re actively working on these areas. Our goal is to make web data as accessible as an API - and we’re getting closer every day.
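If small run-to-run differences matter for your use case, one client-side workaround (a general technique, not a Firecrawl feature) is to run the same extraction a few times and keep the most common value for each field. The sketch assumes each run's result is a flat dict of fields; the sample runs are made up for illustration.

```python
from collections import Counter
from typing import Any, Dict, List


def reconcile_runs(runs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Keep, for each field, the value returned most often across runs."""
    fields = {key for run in runs for key in run}
    merged: Dict[str, Any] = {}
    for field in sorted(fields):
        values = [run[field] for run in runs if field in run]
        # Lists and dicts are unhashable, so count their repr() instead.
        counts = Counter(repr(v) for v in values)
        winner_repr, _ = counts.most_common(1)[0]
        merged[field] = next(v for v in values if repr(v) == winner_repr)
    return merged


# Made-up results from three runs of the same prompt.
runs = [
    {"company_name": "Acme", "employees": 120},
    {"company_name": "Acme", "employees": 125},
    {"company_name": "Acme Inc.", "employees": 120},
]
merged = reconcile_runs(runs)
print(merged)
```

Majority voting trades extra requests for stability; with only two runs, or when every run disagrees, it cannot pick a clear winner.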

Get Started

  1. Try it Now

    • Get 500,000 free tokens in our playground
    • See examples and experiment with different prompts
    • No credit card required
  2. Build Something Real

Ready to turn web data into your competitive advantage? Get started in less than 5 minutes.

— Caleb, Eric, Nick and the Firecrawl team 🔥


About the Author

Eric Ciarla (@ericciarla)

Eric Ciarla is the Chief Operating Officer (COO) of Firecrawl and leads marketing. He also worked on Mendable.ai, selling it to companies such as Snapchat, Coinbase, and MongoDB. He previously worked as a data scientist at Ford and Fracta. Eric also co-founded SideGuide, a tool with 50,000 users for learning to code within VS Code.
