Local Chat With Websites

Chat with any website on your local machine

Description

This is a direct fork of Jacob Lee's fully local PDF chatbot, replacing the chat-with-PDF functionality with chat-with-website support powered by Firecrawl. It is a simple chatbot that lets you ask questions about a website by embedding its content and running queries against the vector store using a local LLM and local embeddings.

🦙 Ollama

You can run more powerful, general models outside the browser using Ollama's desktop app. Download and set up Ollama, then run the following commands to give the site access to a locally running Gemma 2 instance:

Mac/Linux

$ OLLAMA_ORIGINS=https://webml-demo.vercel.app OLLAMA_HOST=127.0.0.1:11435 ollama serve

Then, in another terminal window:

$ OLLAMA_HOST=127.0.0.1:11435 ollama pull gemma2

Windows

set OLLAMA_ORIGINS=https://webml-demo.vercel.app
set OLLAMA_HOST=127.0.0.1:11435
ollama serve

Then, in another terminal window:

set OLLAMA_HOST=127.0.0.1:11435
ollama pull gemma2
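
Once both commands have run, it can help to confirm that the server is reachable and the model is present before opening the site. The sketch below is one way to do that, assuming Ollama's GET /api/tags endpoint (which lists locally pulled models) and the host and port used above. The names hasModel and listOllamaModels are hypothetical helpers, not part of this repo.

```typescript
// Hypothetical helpers, not part of this repo.
// Assumes the Ollama server started above listens on 127.0.0.1:11435.
const OLLAMA_BASE_URL = "http://127.0.0.1:11435";

interface OllamaTagsResponse {
  models: { name: string }[];
}

// True if any pulled model matches `model`, with or without a tag
// (for example, "gemma2" matches "gemma2:latest").
function hasModel(tags: OllamaTagsResponse, model: string): boolean {
  return tags.models.some(
    (m) => m.name === model || m.name.startsWith(`${model}:`)
  );
}

// Fetches the list of locally available models from /api/tags.
async function listOllamaModels(
  baseUrl: string = OLLAMA_BASE_URL
): Promise<OllamaTagsResponse> {
  const res = await fetch(`${baseUrl}/api/tags`);
  if (!res.ok) {
    throw new Error(`Ollama responded with HTTP ${res.status}`);
  }
  return (await res.json()) as OllamaTagsResponse;
}
```

Calling hasModel(await listOllamaModels(), "gemma2") from a Node script should return true after the pull has finished. In a browser, the same request only succeeds from an origin allowed by OLLAMA_ORIGINS.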

🔥 Firecrawl

You will also need a Firecrawl API key to embed websites. Signing up for Firecrawl is easy, and you get 500 free credits. Enter your API key in the box below the URL in the embedding form.
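
For readers curious what the API key is used for, the sketch below shows roughly how a page can be requested as markdown from Firecrawl's v1 scrape endpoint. It is an illustration only: the app's own Firecrawl integration may go through an SDK or a LangChain.js loader instead, and buildScrapeRequest and scrapeToMarkdown are hypothetical names.

```typescript
// Hypothetical sketch; the app's own Firecrawl integration may differ.
const FIRECRAWL_SCRAPE_URL = "https://api.firecrawl.dev/v1/scrape";

interface ScrapeRequest {
  url: string;
  init: {
    method: "POST";
    headers: Record<string, string>;
    body: string;
  };
}

// Builds the HTTP request for scraping one page as markdown.
function buildScrapeRequest(apiKey: string, targetUrl: string): ScrapeRequest {
  if (!apiKey.trim()) {
    throw new Error("A Firecrawl API key is required");
  }
  return {
    url: FIRECRAWL_SCRAPE_URL,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ url: targetUrl, formats: ["markdown"] }),
    },
  };
}

// Sends the request and returns the page's markdown ("" if none came back).
async function scrapeToMarkdown(
  apiKey: string,
  targetUrl: string
): Promise<string> {
  const { url, init } = buildScrapeRequest(apiKey, targetUrl);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Firecrawl responded with HTTP ${res.status}`);
  }
  const json = (await res.json()) as { data?: { markdown?: string } };
  return json.data?.markdown ?? "";
}
```

Keeping the request construction separate from the network call makes it easy to inspect what would be sent without spending credits.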

⚡ Stack

It uses the following:

  • Voy as the vector store, running fully in the browser via WASM.
  • Ollama to run the LLM locally, outside the browser.
  • LangChain.js to call the models, perform retrieval, and generally orchestrate all the pieces.
  • Transformers.js to run open source Nomic embeddings in the browser.
    • For higher-quality embeddings, switch to "nomic-ai/nomic-embed-text-v1" in app/worker.ts.
  • Firecrawl to scrape the webpages and deliver them in markdown format.
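
To make the retrieval step concrete, the sketch below shows the underlying idea in plain TypeScript: chunk vectors are ranked by cosine similarity to the query vector, and the best matches are handed to the LLM as context. It deliberately does not use Voy, LangChain.js, or Transformers.js. The Chunk type, cosineSimilarity, topK, and the tiny hand-written vectors are illustrative assumptions standing in for real embeddings.

```typescript
// Illustrative only: a brute-force in-memory search, not the Voy index the app uses.
interface Chunk {
  text: string;
  embedding: number[];
}

// Cosine similarity of two equal-length vectors; 0 if either is all zeros.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Returns the k chunks most similar to the query, best match first.
function topK(query: number[], chunks: Chunk[], k: number): Chunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}

// Toy two-dimensional "embeddings"; real ones have hundreds of dimensions.
const exampleChunks: Chunk[] = [
  { text: "Pricing starts with a free tier.", embedding: [0.9, 0.1] },
  { text: "The API returns markdown.", embedding: [0.1, 0.9] },
  { text: "Paid plans add more credits.", embedding: [0.8, 0.3] },
];
const bestMatches = topK([1, 0], exampleChunks, 2);
```

A vector store such as Voy serves the same purpose with an index rather than a full scan, which matters once a site produces many chunks.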

🔱 Forking

To run or deploy this yourself, fork this repo and install the required dependencies with yarn.

There are no required environment variables, but you can optionally set up LangSmith tracing while developing locally to help debug the prompts and the chain. Copy the .env.example file into a .env.local file:

# No environment variables required!

# LangSmith tracing from the web worker.
# WARNING: FOR DEVELOPMENT ONLY. DO NOT DEPLOY A LIVE VERSION WITH THESE
# VARIABLES SET AS YOU WILL LEAK YOUR LANGCHAIN API KEY.
NEXT_PUBLIC_LANGCHAIN_TRACING_V2="true"
NEXT_PUBLIC_LANGCHAIN_API_KEY=
NEXT_PUBLIC_LANGCHAIN_PROJECT=

Just make sure you don't set these variables in production, as your LangChain API key will be public on the frontend!
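
One way to catch this mistake early is a small check that runs before a production build. The sketch below is a hypothetical guard, not part of this repo: findLeakedTracingVars is an illustrative name, and it assumes NODE_ENV is set to "production" for production builds. It takes the environment as a plain object so it can be tried out without touching real settings.

```typescript
// Hypothetical guard, not part of this repo.
type Env = Record<string, string | undefined>;

// The variables from .env.example that are exposed to the frontend.
const TRACING_VARS = [
  "NEXT_PUBLIC_LANGCHAIN_TRACING_V2",
  "NEXT_PUBLIC_LANGCHAIN_API_KEY",
  "NEXT_PUBLIC_LANGCHAIN_PROJECT",
];

// Returns the tracing variables that are non-empty in a production environment.
// Outside production it always returns an empty list.
function findLeakedTracingVars(env: Env): string[] {
  if (env.NODE_ENV !== "production") return [];
  return TRACING_VARS.filter((name) => (env[name] ?? "").trim() !== "");
}
```

A build script could call findLeakedTracingVars(process.env) and exit with an error if the returned list is not empty.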

🙏 Thank you!

Huge thanks to Jacob Lee and the other contributors of the repo for making this happen! Be sure to give him a follow on Twitter @Hacubu!
