Is selector-based web scraping dead in the era of LLM-based scraping?
Not dead, but increasingly impractical. CSS selectors and XPath bind your extraction logic to a page's exact HTML structure, which is only reliable when that structure stays fixed. The problem is that almost no website stays fixed anymore. Most modern sites use React, Vue, or Angular to render content dynamically, meaning the HTML structure you wrote your selector against can change with any deployment.
The result: selectors break constantly, requiring ongoing maintenance for every site you scrape. LLM-based extraction reads content by meaning, not by HTML position, so it adapts automatically when layouts change.
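To make the brittleness concrete, here is a minimal sketch using Python's standard library. The HTML snippets and class names are invented for illustration: the selector works until a front-end redeploy renames a class, at which point extraction silently returns nothing.

```python
import xml.etree.ElementTree as ET

OLD_HTML = """<html><body>
  <div class="product-price"><span>19.99</span></div>
</body></html>"""

# After a redeploy, the same data ships under a different class name.
NEW_HTML = """<html><body>
  <div class="price-v2"><span>19.99</span></div>
</body></html>"""

def extract_price(html: str):
    """XPath-style lookup pinned to one specific class name."""
    root = ET.fromstring(html)
    node = root.find(".//div[@class='product-price']/span")
    return node.text if node is not None else None

print(extract_price(OLD_HTML))  # 19.99
print(extract_price(NEW_HTML))  # None -- the selector silently breaks
```

Nothing errored in the second call; the scraper just stopped returning data, which is exactly the failure mode that forces ongoing maintenance.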
| Factor | Selector-Based | LLM-Based |
|---|---|---|
| Dynamic sites | Unreliable, often fails | Handles JavaScript-rendered content |
| Site changes | Breaks, needs manual fix | Adapts automatically |
| Multi-site scraping | Custom selectors per site | One schema across all sites |
| Speed | Very fast | Slightly slower (LLM inference) |
| Cost | Minimal | Token costs per page |
Selectors still make sense for one narrow case: scraping the same stable, internally built page at very high volume, where you control the HTML and selector breakage is a non-issue. Outside that, LLM-based extraction is the better default.
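The "one schema across all sites" point can be sketched in a few lines. The `Product` schema and the two response dicts below are hypothetical, and the LLM outputs are mocked rather than fetched: the idea is that the same schema drives extraction everywhere, while selector scraping would need custom selectors per site.

```python
from dataclasses import dataclass

# Hypothetical target schema: the same fields are requested from every
# site, regardless of how each site's HTML happens to be structured.
@dataclass
class Product:
    name: str
    price: float
    in_stock: bool

# Mocked LLM extraction results from two differently structured sites.
# A selector-based scraper would need separate code paths for each.
llm_output_site_a = {"name": "Widget", "price": 19.99, "in_stock": True}
llm_output_site_b = {"name": "Gadget", "price": 4.50, "in_stock": False}

products = [Product(**o) for o in (llm_output_site_a, llm_output_site_b)]
print(products[0].price)   # 19.99
print(products[1].name)    # Gadget
```

Adding a third site means adding nothing but its URL; the schema stays the same.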
Firecrawl's agent endpoint uses LLMs to extract structured data from any page without writing a single selector. It handles dynamic content, layout changes, and multi-site schemas out of the box.