Firecrawl x Dify.AI: Add web data to your RAG applications
We are excited to announce that Firecrawl now integrates with Dify, so you can bring web data directly into your RAG (Retrieval-Augmented Generation) applications.
Introducing Dify x Firecrawl
Dify is an open-source platform for developing LLM (Large Language Model) applications. It allows you to orchestrate everything from simple agents to complex AI workflows, all powered by a robust RAG engine. With Firecrawl, you can now add web data to your RAG applications - all inside Dify’s platform.
Ingesting Web Data
Firecrawl converts websites into clean, LLM-ready data. To use it in Dify, configure your settings in the Knowledge dashboard. There you can set your Firecrawl API key and start crawling right away.
Easy Setup and Customization
To start using it, pass the URL you want to ingest by clicking the Sync from website button in the Knowledge section. Tailor your crawling process with options to set sub-page limits and depths. Use the Exclude and Include path options to ensure you capture exactly the data you need.
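To make these options concrete, here is a small Python sketch of how a sub-page limit, a depth setting, and Include/Exclude paths could narrow a list of discovered URLs. It is an illustration only, not Firecrawl's or Dify's implementation: the function name `select_urls`, its parameters, and the use of regular expressions for path patterns are assumptions made for this example.

```python
import re
from urllib.parse import urlparse


def url_depth(url: str) -> int:
    """Count non-empty path segments, e.g. /blog/intro has depth 2."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])


def select_urls(urls, include=(), exclude=(), max_depth=2, limit=10):
    """Hypothetical filter mirroring the crawl options described above.

    include / exclude: regex patterns matched against the URL path
    max_depth: skip pages nested deeper than this many path segments
    limit: stop once this many pages have been selected
    """
    selected = []
    for url in urls:
        path = urlparse(url).path.lstrip("/")
        if url_depth(url) > max_depth:
            continue
        if any(re.search(pattern, path) for pattern in exclude):
            continue
        # An empty include list means "accept every path not excluded".
        if include and not any(re.search(pattern, path) for pattern in include):
            continue
        selected.append(url)
        if len(selected) >= limit:
            break
    return selected


candidates = [
    "https://example.com/",
    "https://example.com/blog/intro",
    "https://example.com/blog/2023/old-post",
    "https://example.com/admin/login",
    "https://example.com/docs/setup",
]

picked = select_urls(
    candidates,
    include=[r"^blog/", r"^docs/"],
    exclude=[r"^admin/"],
    max_depth=2,
    limit=10,
)
print(picked)
```

With these example settings, only the shallow blog and docs pages are kept; the admin page is excluded by path and the dated blog post is dropped for being too deep.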
Data Embedding
Firecrawl rapidly crawls web pages in parallel. After crawling, you can select the desired web data on Dify for preprocessing and cleaning. The processed data is then embedded and stored in Dify’s vector DB as a new knowledge base.
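For readers new to RAG, the chunk, embed, and store steps can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Dify's pipeline: the `KnowledgeBase` class, the word-count "embedding", and the chunk sizes are all hypothetical stand-ins for a real embedding model and vector database.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=40, overlap=10):
    """Split text into chunks of `size` words that share `overlap` words."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    chunks = []
    for start in range(0, len(words), size - overlap):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break
    return chunks


def embed(text):
    """Toy embedding: lowercase term counts (a real system uses a model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class KnowledgeBase:
    """Hypothetical in-memory stand-in for a vector database."""

    def __init__(self):
        self.rows = []  # (source URL, chunk text, embedding)

    def add_page(self, url, text, size=40, overlap=10):
        for chunk in chunk_text(text, size, overlap):
            self.rows.append((url, chunk, embed(chunk)))

    def search(self, query, top_k=1):
        q = embed(query)
        ranked = sorted(self.rows, key=lambda r: cosine(q, r[2]), reverse=True)
        return [(url, chunk) for url, chunk, _ in ranked[:top_k]]


kb = KnowledgeBase()
kb.add_page(
    "https://example.com/pricing",
    "The starter plan includes five hundred pages per month.",
)
kb.add_page(
    "https://example.com/docs",
    "Install the SDK and set your API key before crawling.",
)

best = kb.search("How do I set my API key?")
print(best[0][0])
```

The query shares terms only with the docs page, so that chunk ranks first. In a real knowledge base the same flow applies, with learned embeddings replacing the word counts.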
Ready for production
Now, you can create a RAG app that uses web data as contextual knowledge on Dify with the power of Firecrawl. We value your feedback! Connect with us on X/Twitter @firecrawl_dev.
About the Author
Nicolas Camara is the Chief Technology Officer (CTO) at Firecrawl. He previously built and scaled Mendable, one of the pioneering "chat with your documents" apps, which had major Fortune 500 customers like Snapchat, Coinbase, and MongoDB. Prior to that, Nicolas built SideGuide, the first code-learning tool inside VS Code, and grew a community of 50,000 users. Nicolas studied Computer Science and has over 10 years of experience in building software.