What are common parsing formats (JSON, CSV, XML, Markdown)?
TL;DR
Web extraction outputs data in formats suited to downstream use. JSON handles structured data for APIs and databases. CSV delivers tabular data for spreadsheets. XML serves legacy systems that expect hierarchical markup. Markdown provides clean text optimized for LLMs. Firecrawl returns structured JSON via schemas or LLM-ready markdown.
What are common parsing formats?
JSON: Nested key-value pairs for APIs, databases, and applications. Firecrawl Agent returns JSON matching your schema.
CSV: Tabular rows and columns for spreadsheets and analytics. Best for flat data like product catalogs.
XML: Hierarchical markup for legacy systems and complex structures. Less common in modern scraping.
Markdown: Formatted text that's token-efficient and LLM-optimized. Firecrawl returns markdown by default, stripping boilerplate for AI applications.
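To make the differences concrete, here is a minimal sketch that serializes the same two records into each of the four formats using only the Python standard library. The `products` records, their field names, and the helper functions are hypothetical and chosen for illustration; they are not Firecrawl output or part of any Firecrawl API.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical scraped records; field names are illustrative only.
products = [
    {"name": "Widget", "price": 9.99},
    {"name": "Gadget", "price": 24.5},
]

def to_json(records):
    """Nested key-value pairs, suitable for APIs and databases."""
    return json.dumps(records, indent=2)

def to_csv(records):
    """Flat rows and columns, suitable for spreadsheets."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

def to_xml(records):
    """Hierarchical markup, one <product> element per record."""
    root = ET.Element("products")
    for rec in records:
        item = ET.SubElement(root, "product")
        for key, value in rec.items():
            ET.SubElement(item, key).text = str(value)
    return ET.tostring(root, encoding="unicode")

def to_markdown(records):
    """A markdown table, compact enough to paste into an LLM prompt."""
    headers = list(records[0].keys())
    lines = [
        "| " + " | ".join(headers) + " |",
        "|" + "|".join("---" for _ in headers) + "|",
    ]
    for rec in records:
        lines.append("| " + " | ".join(str(rec[h]) for h in headers) + " |")
    return "\n".join(lines)

if __name__ == "__main__":
    for render in (to_json, to_csv, to_xml, to_markdown):
        print(render(products), end="\n\n")
```

The data is identical in every case; only the shape changes. JSON and XML can carry nesting, while CSV and a markdown table assume every record has the same flat set of fields.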
| Use Case | Format |
|---|---|
| API/Database | JSON |
| Spreadsheets | CSV |
| AI/LLM | Markdown |
| Legacy systems | XML |
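
Moving data from one row of this table to another usually means reshaping it. As a rough sketch, assuming nested JSON records that need to land in a spreadsheet, the nesting can be flattened into dotted column names before writing CSV. The `flatten` and `records_to_csv` helpers and the sample records are hypothetical, and the dotted-key convention is just one reasonable choice.

```python
import csv
import io

def flatten(record, parent_key="", sep="."):
    """Collapse nested dicts into one level, e.g. price.amount."""
    flat = {}
    for key, value in record.items():
        full_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat

def records_to_csv(records):
    """Write flattened records as CSV, leaving missing fields empty."""
    rows = [flatten(r) for r in records]
    # Collect the union of column names in first-seen order.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical nested records, as a schema-based JSON extraction might yield.
nested = [
    {"name": "Widget", "price": {"amount": 9.99, "currency": "USD"}},
    {"name": "Gadget", "price": {"amount": 24.5, "currency": "EUR"}, "stock": 3},
]

if __name__ == "__main__":
    print(records_to_csv(nested))
```

This sketch does not handle lists inside records; those typically need a separate decision, such as one row per list item or a joined string.
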
Key Takeaways
Match format to purpose: JSON for structured storage, CSV for tabular analysis, XML for legacy systems, and Markdown for AI. Firecrawl returns structured JSON via schemas and clean markdown for LLMs.