How can I extract data from tables, lists, and nested HTML structures?

TL;DR

Firecrawl’s AI automatically extracts data from tables, lists, and nested HTML structures. Define your schema with arrays and nested objects—Firecrawl identifies the structure and extracts clean JSON. No manual table parsing or HTML navigation needed.

Firecrawl handles complex HTML structures automatically using AI extraction. For tables, define fields matching column data—Firecrawl extracts rows as JSON arrays. For lists, specify array fields—it identifies list items regardless of HTML markup. For nested structures, use nested objects in your schema—Firecrawl preserves relationships and hierarchy without manual parsing.

Extracting from HTML tables

Tables contain structured data in rows and columns, but parsing them traditionally requires complex logic—identifying headers, handling colspan/rowspan, dealing with nested tables. Firecrawl’s AI understands table semantics.

Define a schema with an array of objects matching your table structure, one object per row and one field per column. Firecrawl extracts all rows automatically, preserving column relationships. This works with standard tables, JavaScript-rendered tables, and even poorly formatted HTML tables.
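As a minimal sketch, a pricing table with columns Plan, Price, and Seats could be described by a JSON Schema like the one below. The column names and the `row_matches` / `rows_match` helpers are assumptions made up for illustration; they are not part of Firecrawl. How the schema is handed to Firecrawl depends on the SDK version, so that call is deliberately left out.

```python
# Hypothetical table with columns "Plan", "Price", "Seats".
# These names are assumptions for illustration, not taken from a real page.
TABLE_SCHEMA = {
    "type": "object",
    "properties": {
        "rows": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "plan": {"type": "string"},
                    "price": {"type": "number"},
                    "seats": {"type": "integer"},
                },
                "required": ["plan", "price"],
            },
        }
    },
    "required": ["rows"],
}

# Map the JSON Schema primitive names used above to Python types.
_PY_TYPES = {"string": str, "number": (int, float), "integer": int}


def row_matches(row, row_schema):
    """Return True if a row has the required keys and the declared primitive types."""
    props = row_schema["properties"]
    for key in row_schema.get("required", []):
        if key not in row:
            return False
    for key, value in row.items():
        if key not in props:
            continue  # extra fields are ignored in this sketch
        expected = _PY_TYPES[props[key]["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            return False
    return True


def rows_match(result, schema=TABLE_SCHEMA):
    """Check every extracted row against the row shape declared in the schema."""
    row_schema = schema["properties"]["rows"]["items"]
    return all(row_matches(row, row_schema) for row in result.get("rows", []))
```

A check like this is optional, but it is a cheap way to confirm that each extracted row really carries one value per declared column before the data is used downstream.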

Extracting from lists

Lists appear as <ul>, <ol>, or even <div> elements styled as lists. Traditional scrapers need custom logic for each format. Firecrawl recognizes list patterns semantically.

Specify array fields in your schema and describe what each should hold, for example “extract product features” or “list team members.” Firecrawl identifies list items regardless of HTML markup and returns clean arrays. It handles bullet lists, numbered lists, and custom list implementations.

Handling nested structures

Real-world data is hierarchical—products with variants, companies with departments and employees, articles with sections and subsections. Traditional parsing requires recursive logic and careful HTML navigation.

Firecrawl’s AI handles nested structures naturally. Define nested objects in your schema—product.variants[].sizes[] or company.departments[].employees[]. The AI preserves hierarchy and relationships automatically, extracting complex nested data as properly structured JSON.
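To make the path notation concrete, here is a small sketch of how a path such as company.departments[].employees[] maps onto nested JSON. The `collect` helper and the sample data are hypothetical and exist only for illustration; they are not a Firecrawl feature, and the notation is read here as "a trailing [] means iterate over a list".

```python
def collect(data, path):
    """Collect every value reachable via a dotted path.

    A segment ending in [] fans out over each element of a list.
    """
    current = [data]
    for segment in path.split("."):
        fan_out = segment.endswith("[]")
        key = segment[:-2] if fan_out else segment
        next_level = []
        for node in current:
            if not isinstance(node, dict) or key not in node:
                continue  # skip branches that lack this key
            value = node[key]
            if fan_out:
                if isinstance(value, list):
                    next_level.extend(value)
            else:
                next_level.append(value)
        current = next_level
    return current


# Made-up sample shaped like an extraction result for a nested schema.
SAMPLE = {
    "company": {
        "name": "Example Co",
        "departments": [
            {"name": "Engineering", "employees": [{"name": "Ada"}, {"name": "Lin"}]},
            {"name": "Sales", "employees": [{"name": "Sam"}]},
        ],
    }
}

employee_names = collect(SAMPLE, "company.departments[].employees[].name")
department_names = collect(SAMPLE, "company.departments[].name")
```

Because each employee stays inside its department object, the hierarchy survives extraction; flattening with a helper like this is a choice made afterwards, not something forced by the output format.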

Example: E-commerce product with variants

A product page might have a table of specifications, a list of features, and nested size/color variants. With Firecrawl, define one schema:

{
  "name": "string",
  "specifications": [{ "key": "string", "value": "string" }],
  "features": ["string"],
  "variants": [{ "color": "string", "sizes": ["string"], "price": "number" }]
}

Firecrawl extracts everything: table rows become spec objects, the feature list becomes an array, and variants keep their nested structure. One API call, complete structured data.
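The schema above is written in a compact shorthand. Assuming the extraction endpoint expects standard JSON Schema, the shorthand can be expanded mechanically. The `to_json_schema` function below is a hypothetical helper sketched for this page, not part of any Firecrawl SDK.

```python
# The shorthand schema from the example above, as a Python literal.
PRODUCT_SHORTHAND = {
    "name": "string",
    "specifications": [{"key": "string", "value": "string"}],
    "features": ["string"],
    "variants": [{"color": "string", "sizes": ["string"], "price": "number"}],
}


def to_json_schema(shorthand):
    """Expand the shorthand into JSON Schema.

    - "string" / "number"  -> {"type": ...}
    - [item_shape]         -> array whose items follow item_shape
    - {field: shape, ...}  -> object with one property per field
    """
    if isinstance(shorthand, str):
        return {"type": shorthand}
    if isinstance(shorthand, list):
        if len(shorthand) != 1:
            raise ValueError("a shorthand list must contain exactly one item shape")
        return {"type": "array", "items": to_json_schema(shorthand[0])}
    if isinstance(shorthand, dict):
        return {
            "type": "object",
            "properties": {
                field: to_json_schema(shape) for field, shape in shorthand.items()
            },
        }
    raise TypeError(f"unsupported shorthand value: {shorthand!r}")


PRODUCT_SCHEMA = to_json_schema(PRODUCT_SHORTHAND)
```

The expanded schema can be serialized with `json.dumps(PRODUCT_SCHEMA, indent=2)` and passed wherever a JSON Schema is accepted. Marking fields as required is omitted in this sketch.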

Why AI extraction beats manual parsing

Manual parsing requires identifying table headers, iterating rows, handling malformed HTML, dealing with dynamic content, and maintaining code for each site structure. Firecrawl does this automatically.
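For comparison, the manual approach can be sketched with Python's standard-library `html.parser`. This version only copes with one flat, well-formed table whose first row is the header; colspan/rowspan, nested tables, and JavaScript-rendered content would each need more code. The class, function, and sample markup are illustrative assumptions.

```python
from html.parser import HTMLParser


class SimpleTableParser(HTMLParser):
    """Collect cell text from a flat table. No colspan/rowspan or nested-table handling."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_records(html):
    """Treat the first row as headers and return one dict per remaining row."""
    parser = SimpleTableParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    keys = [cell.lower() for cell in header]
    return [dict(zip(keys, row)) for row in body]


# Made-up sample markup for a two-column specification table.
SAMPLE_HTML = """
<table>
  <tr><th>Key</th><th>Value</th></tr>
  <tr><td>Weight</td><td>1.2 kg</td></tr>
  <tr><td>Color</td><td>Black</td></tr>
</table>
"""

records = table_to_records(SAMPLE_HTML)
```

Even this small baseline hard-codes assumptions about the markup (header in the first row, one table per page), which is the maintenance burden the paragraph above describes.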

When sites change their table layouts, reorganize lists, or restructure nested data, the same schema generally keeps working. The AI adapts to structural variations without code changes.

Key Takeaways

Firecrawl extracts data from tables, lists, and nested HTML structures using AI that understands semantic patterns. Define schemas with arrays and nested objects—Firecrawl handles the parsing automatically. No manual table iteration, no list traversal code, no recursive HTML navigation. Works across different HTML implementations and survives structural changes. One schema extracts complex hierarchical data from any website layout.
