DevDocs to LLM

Overview

DevDocs to LLM is a tool that allows you to crawl developer documentation, extract content, and process it into a format suitable for use with large language models (LLMs) like ChatGPT. This enables you to create specialized assistants tailored to specific documentation sets.

Features

  • Web crawling with customizable options
  • Content extraction in Markdown format
  • Rate limiting to respect server constraints
  • Retry mechanism for failed scrapes (rate limiting and retries are sketched after this list)
  • Export options:
    • Rentry.co for quick sharing
    • Google Docs for larger documents
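
The rate-limiting and retry behavior might look like the following. This is a minimal sketch: `scrape_page` is a hypothetical stand-in for the notebook's actual scraping call, and the parameter names mirror the Configuration section below.

```python
import time

def scrape_with_retries(url, scrape_page, retry_attempts=3, pages_per_minute=10):
    """Scrape one URL with pacing and retries.

    scrape_page is a hypothetical callable that fetches a single page and
    returns its Markdown, raising an exception on failure.
    """
    delay = 60.0 / pages_per_minute  # spacing that respects the rate limit
    for attempt in range(1, retry_attempts + 1):
        try:
            markdown = scrape_page(url)
            time.sleep(delay)  # pause before the next request
            return markdown
        except Exception as exc:
            print(f"Attempt {attempt}/{retry_attempts} failed for {url}: {exc}")
            time.sleep(delay)  # back off before retrying
    return None  # give up after the configured number of attempts
```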

Usage

  1. Set up the Firecrawl environment
  2. Crawl a website and generate a sitemap
  3. Extract content from crawled pages
  4. Export the processed content (steps 1–3 are sketched below)
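
Condensed, the first three steps might look like this. It is a sketch assuming the firecrawl-py SDK; method names and response shapes vary between SDK versions, so treat it as illustrative rather than the notebook's exact code.

```python
from firecrawl import FirecrawlApp

# Step 1: set up the Firecrawl environment
app = FirecrawlApp(api_key="fc-YOUR_API_KEY")

# Step 2: crawl the site; the crawler discovers pages up to the configured limit
crawl = app.crawl_url("https://docs.example.com", params={"limit": 50})

# Step 3: collect the Markdown extracted from each crawled page
pages = [page.get("markdown", "") for page in crawl.get("data", [])]

combined = "\n\n---\n\n".join(pages)
print(f"Extracted {len(pages)} pages ({len(combined)} characters)")
```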

Requirements

  • Firecrawl API key (one way to provide it is sketched below)
  • Google Docs API credentials (optional, for Google Docs export)
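
One common way to supply the Firecrawl key in a notebook is via an environment variable. This is an assumption for illustration; the notebook may read the key differently.

```python
import os

# Assumption: the notebook reads the key from an environment variable.
os.environ["FIRECRAWL_API_KEY"] = "fc-YOUR_API_KEY"
```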

Installation

This project is designed to run in a Jupyter notebook environment, particularly Google Colab. No local installation is required.
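
In Colab, any dependencies are installed from within the notebook itself. A first cell along these lines would pull in the Firecrawl SDK (assuming the firecrawl-py package is what the notebook uses):

```python
# Run inside the notebook: installs the Firecrawl Python SDK into the runtime.
%pip install firecrawl-py
```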

Configuration

Before running the notebook, you’ll need to set a few parameters (an example cell follows the list):

  • sub_url: The URL of the documentation you want to crawl
  • limit: Maximum number of pages to crawl
  • scrape_option: Choose to scrape all pages or a specific number
  • num_pages: Number of pages to scrape if not scraping all
  • pages_per_minute: Maximum number of pages to scrape per minute (rate limiting)
  • wait_time_between_chunks: Delay between scraping chunks
  • retry_attempts: Number of retries for failed scrapes
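
A configuration cell using these parameters might look like the following; the values are illustrative placeholders, not recommendations.

```python
sub_url = "https://docs.example.com"  # documentation root to crawl
limit = 100                           # maximum number of pages to crawl
scrape_option = "all"                 # scrape all pages or a specific number
num_pages = 25                        # used only when not scraping all pages
pages_per_minute = 10                 # rate-limiting parameter
wait_time_between_chunks = 30         # seconds to wait between scraping chunks
retry_attempts = 3                    # retries for failed scrapes
```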
