DevDocs to LLM

Overview

DevDocs to LLM is a tool that allows you to crawl developer documentation, extract content, and process it into a format suitable for use with large language models (LLMs) like ChatGPT. This enables you to create specialized assistants tailored to specific documentation sets.

Features

  • Web crawling with customizable options
  • Content extraction in Markdown format
  • Rate limiting to respect server constraints
  • Retry mechanism for failed scrapes (rate limiting and retries are sketched after this list)
  • Export options:
    • Rentry.co for quick sharing
    • Google Docs for larger documents
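
The rate-limiting and retry behavior might look like the following. This is a minimal sketch: `scrape_page` is a hypothetical stand-in for the notebook's actual scraping call, and the parameter names mirror the Configuration section below.

```python
import time

def scrape_with_retries(url, scrape_page, retry_attempts=3, pages_per_minute=10):
    """Scrape one URL with pacing and retries.

    scrape_page is a hypothetical callable that fetches a single page and
    returns its Markdown, raising an exception on failure.
    """
    delay = 60.0 / pages_per_minute  # spacing that respects the rate limit
    for attempt in range(1, retry_attempts + 1):
        try:
            markdown = scrape_page(url)
            time.sleep(delay)  # pause before the next request
            return markdown
        except Exception as exc:
            print(f"Attempt {attempt}/{retry_attempts} failed for {url}: {exc}")
            time.sleep(delay)  # back off before retrying
    return None  # give up after the configured number of attempts
```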

Usage

  1. Set up the Firecrawl environment
  2. Crawl a website and generate a sitemap
  3. Extract content from crawled pages
  4. Export the processed content (steps 1–3 are sketched below)
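
Condensed, the first three steps might look like this. It is a sketch assuming the firecrawl-py SDK; method names and response shapes vary between SDK versions, so treat it as illustrative rather than the notebook's exact code.

```python
from firecrawl import FirecrawlApp

# Step 1: set up the Firecrawl environment
app = FirecrawlApp(api_key="fc-YOUR_API_KEY")

# Step 2: crawl the site; the crawler discovers pages up to the configured limit
crawl = app.crawl_url("https://docs.example.com", params={"limit": 50})

# Step 3: collect the Markdown extracted from each crawled page
pages = [page.get("markdown", "") for page in crawl.get("data", [])]

combined = "\n\n---\n\n".join(pages)
print(f"Extracted {len(pages)} pages ({len(combined)} characters)")
```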

Requirements

  • Firecrawl API key (one way to provide it is sketched below)
  • Google Docs API credentials (optional, for Google Docs export)
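
One common way to supply the Firecrawl key in a notebook is via an environment variable. This is an assumption for illustration; the notebook may read the key differently.

```python
import os

# Assumption: the notebook reads the key from an environment variable.
os.environ["FIRECRAWL_API_KEY"] = "fc-YOUR_API_KEY"
```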

Installation

This project is designed to run in a Jupyter notebook environment, particularly Google Colab. No local installation is required.
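
In Colab, any dependencies are installed from within the notebook itself. A first cell along these lines would pull in the Firecrawl SDK (assuming the firecrawl-py package is what the notebook uses):

```python
# Run inside the notebook: installs the Firecrawl Python SDK into the runtime.
%pip install firecrawl-py
```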

Configuration

Before running the notebook, you’ll need to set a few parameters (an example cell follows the list):

  • sub_url: The URL of the documentation you want to crawl
  • limit: Maximum number of pages to crawl
  • scrape_option: Choose to scrape all pages or a specific number
  • num_pages: Number of pages to scrape if not scraping all
  • pages_per_minute: Maximum number of pages to scrape per minute (rate limiting)
  • wait_time_between_chunks: Delay between scraping chunks
  • retry_attempts: Number of retries for failed scrapes
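
A configuration cell using these parameters might look like the following; the values are illustrative placeholders, not recommendations.

```python
sub_url = "https://docs.example.com"  # documentation root to crawl
limit = 100                           # maximum number of pages to crawl
scrape_option = "all"                 # scrape all pages or a specific number
num_pages = 25                        # used only when not scraping all pages
pages_per_minute = 10                 # rate-limiting parameter
wait_time_between_chunks = 30         # seconds to wait between scraping chunks
retry_attempts = 3                    # retries for failed scrapes
```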
