What are CSS selectors and XPath in web extraction?
TL;DR
CSS selectors and XPath are traditional methods for targeting specific HTML elements during scraping. CSS selectors use syntax like .price or #product-name, while XPath uses path expressions like //div[@class='price']. Both break when sites change their HTML—modern APIs like Firecrawl use AI to understand content semantically instead.
What are CSS selectors and XPath in web extraction?
CSS selectors and XPath are query languages for locating elements in HTML. CSS selectors use patterns like class names (.product-price) or IDs (#title) to find elements. XPath uses path-based notation to navigate HTML structure (//div[@class='product']/span[@class='price']). Traditional scrapers rely on these to extract data, but they’re brittle—sites change class names and your scraper breaks.
CSS selectors
CSS selectors target elements by class (.price), ID (#product-name), tag (h1), or attributes ([data-id="123"]). They’re simple for basic extraction: match every element with .product-price and pull out its text.
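As a minimal sketch, this is what CSS-selector extraction looks like with BeautifulSoup. The markup and the .product-price class are hypothetical stand-ins for a real page:

```python
# pip install beautifulsoup4
from bs4 import BeautifulSoup

# Hypothetical product markup; a real page would be far messier.
html = """
<div class="product">
  <h1 id="product-name">Widget</h1>
  <span class="product-price">$19.99</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Class selector: every element carrying the product-price class.
prices = [el.get_text(strip=True) for el in soup.select(".product-price")]

# ID selector: the single element with id="product-name".
name = soup.select_one("#product-name").get_text(strip=True)

print(name, prices)  # Widget ['$19.99']
```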
Problem: websites change class names constantly. Today it’s .price, tomorrow it’s .product-cost. Your scraper breaks with every redesign. You’re maintaining selectors instead of using data.
XPath expressions
XPath is more powerful but also more complex. It navigates HTML structure using path expressions: //div[@class='product']/descendant::span[contains(@class, 'price')] finds every price span nested anywhere inside a product div.
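The equivalent with lxml, which supports full XPath, looks like this; the markup is again a hypothetical stand-in:

```python
# pip install lxml
from lxml import html

# Same hypothetical markup, wrapped in a full document.
doc = html.fromstring("""
<html><body>
  <div class="product">
    <h1 id="product-name">Widget</h1>
    <span class="price sale">$19.99</span>
  </div>
</body></html>
""")

# Walk the structure: span descendants of the product div whose
# class attribute contains "price", then take their text nodes.
prices = doc.xpath(
    "//div[@class='product']"
    "/descendant::span[contains(@class, 'price')]/text()"
)
print(prices)  # ['$19.99']
```

Note that the contains() predicate tolerates a class like price sale but not a rename to cost, and the expression still encodes the div-to-span structure, so structural changes break it anyway.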
Problem: same brittleness. HTML structure changes—new divs added, classes renamed—and XPath expressions fail. Plus, XPath is harder to write and debug than CSS selectors.
Why traditional selectors break
Websites redesign frequently. Marketing teams change class names, developers refactor HTML, A/B tests alter structure. Every change breaks your selectors. You spend more time fixing scrapers than using data.
E-commerce sites are the worst offenders: they change layouts constantly. Your price scraper works today and fails tomorrow. Multiply that across dozens of competitor sites and maintenance becomes impossible.
The modern alternative
Firecrawl uses AI extraction that understands content semantically. Instead of targeting .product-price, you specify “extract price” in your schema. The AI identifies prices regardless of class names or HTML structure.
Sites redesign their HTML—your extraction keeps working. No selector maintenance, no broken scrapers. The AI adapts automatically to structural changes.
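As a rough sketch, a schema-based extraction request against Firecrawl’s v1 scrape endpoint might look like the following. The payload and response shapes follow the v1 docs at the time of writing, but field names vary between API versions, so treat them as assumptions and verify against the current API reference:

```python
# pip install requests
import requests

resp = requests.post(
    "https://api.firecrawl.dev/v1/scrape",
    headers={"Authorization": "Bearer fc-YOUR_API_KEY"},
    json={
        "url": "https://example.com/product",  # hypothetical target
        "formats": ["extract"],
        "extract": {
            "schema": {
                "type": "object",
                "properties": {
                    # Describe *what* you want, not *where* it lives:
                    # no class names, no path expressions.
                    "name": {"type": "string"},
                    "price": {"type": "string"},
                },
                "required": ["name", "price"],
            }
        },
    },
    timeout=60,
)

# Response shape assumed from the v1 docs.
print(resp.json()["data"]["extract"])
```

Nothing in the request references the page’s DOM, which is why a redesign doesn’t invalidate it.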
When selectors still make sense
For scraping sites you control, where the HTML changes only when you change it, simple CSS selectors work fine. Internal tools, personal projects, static sites: basic extraction is adequate.
But for scraping external sites, competitor monitoring, or production applications, use AI extraction. The maintenance savings alone justify it.
Key Takeaways
CSS selectors and XPath are traditional methods for targeting HTML elements—they locate elements by class names, IDs, and structure. Both break when websites change their HTML, requiring constant maintenance. Modern APIs like Firecrawl use AI to understand content semantically instead of relying on brittle selectors. This makes extraction resilient to site redesigns. Use selectors only for static sites you control—everything else should use AI-powered extraction.