Introducing Parallel Agents - Run multiple /agent queries simultaneously. Read more →

Get started

Ready to build?

Start getting Web Data for free and scale seamlessly as your project expands. No credit card needed.

All Questions

Glossary/Web Extraction APIs/Questions

How do web extraction APIs handle structured output formats (JSON, CSV, XML)?

How to build an agent that summarizes a website quickly?

How do you extract structured data from unstructured HTML?

TL;DR

Extracting structured data from HTML involves parsing the DOM, locating elements, and mapping values to fields. Traditional CSS selectors work but break on site changes. Firecrawl Agent uses AI to find data semantically—define a schema or use natural language prompts.

How do you extract structured data from unstructured HTML?

HTML has tags but no consistent schema across sites. A price might be in <span class="price"> or plain text. Converting to {"price": 29.99} requires identifying and extracting data systematically.

Selector-based extraction is brittle—site redesigns break it. Schema-based AI extraction defines what you want, not where to find it:

result = app.scrape_url("https://example.com/product", {
    "formats": ["extract"],
    "extract": {
        "schema": {
            "type": "object",
            "properties": {
                "productName": {"type": "string"},
                "price": {"type": "number"}
            }
        }
    }
})

The AI understands "price" conceptually, not by CSS class names. Sites change HTML; extraction keeps working.

Key Takeaways

Selector-based extraction requires per-site maintenance. Firecrawl's AI-powered extraction understands content semantically, delivering structured JSON from any page without brittle selectors.

Last updated: Feb 09, 2026

FOOTER

The easiest way to extract
data from the web

                                                                                                                                                 
                                                                                                                                                 
                                                                                                                                                 
                                                                                                                                                 
                                                                                                                                                 
                                                                .     .                                                                          
                                                               ..     ..+                                                                        
                                                                      .:.                                                                        
                                                               ..     ..         .::                                                             
                                                               +..   ..:          :.                                                             
                                                             .:..::.  ..          ..                                                             
                                                             .--:::.  ..     ...  .:.           ..                                               
                                            ..               .:+=-::.:.     . ...-.::.         ..                                                
                                            ::....           .:--+::..: ......:+....:.     :.. ..                                                
                                            .......            ::-=::::     ..:-:-...:     .--..::          .........                            
                            ..  .             . .              ..::-:-..      .-+-:::..    ...::::.        .: ...::.:..                          
                       .  -... ....:           .   .            .--=+-::.      :-=-:....  .  .:..::      .:---:::::-::....                       
                       ..::........::=.....    ...:-..        .:-=--+=-:.       ..--:..=::.... . .:..  ..:---::::---=:::..:...                   
              ..........::::.:::::::-::.-..  ...::--==:.      ..-::-+==-:...      .-::.......   ..--:. ..:=+==.---=-+-:::::::-..                 
          . .....::......:: ::::-::.---=+-:..::-+==++X=-:.   ..:-::-=-== ---..   .:.--::..       .:-==::=--X==-----====--::+:::+...              
          ..-....-:..::-::=-=-:-::--===++=-==-----== X+=-:.::-==----+==+XX+=-::.:+--==--::.      .:-+X=----+X=-=------===--::-:...:. ....        
          ....::::...:-:-==+++=++==+++XX++==++--+-+==++++=-===+=---:-==+X:XXX+=-:-=-==++=-:.     .:-=+=- -=X+X+===+---==--==--:..::...+....+     
         ..:::---.::.---=+==XXXXXXXX+XX++==++===--+===:+X+====+=--::--=+XXXXXXX+==++==+XX+=: ::::--=+++X++X+XXXX+=----==++.+=--::+::::+. ::.=... 
         .:::-==-------=X+++XXXXXXXXXXX++==++.==-==-:-==+X++==+=-=--=++++X++:X:X+++X+-+X X+=---=-==+=+++XXXXX+XX=+=--=X++XXX==---::-+-::::.:..-..

Backed by

Y Combinator

Linkedin Github YouTube

SOC II · Type 2

AICPA

SOC 2

X (Twitter)

Discord

Products

Playground Agent Pricing Templates Changelog

Use Cases

AI Platforms Lead Enrichment SEO Teams Deep Research Competitive Intelligence

Documentation

Getting started API Reference Integrations Examples SDKs

Company

Blog Careers Firestarters Ambassadors Affiliates Compare Firecrawl Student program