Building Knowledge Graphs from Web Data using CAMEL-AI and Firecrawl
This post explores techniques for building knowledge graphs by extracting data from web pages using CAMEL-AI and Firecrawl.
We'll cover:
- Multi-agent role-playing task setup
- Web scraping implementation
- Knowledge graph construction
- Agent monitoring techniques
To demonstrate these concepts, we'll build a knowledge graph to analyze Yusuf Dikec's performance in the 2024 Paris Olympics. The notebook version is here.
Setting Up CAMEL and Firecrawl
To get started, install the CAMEL package with all its dependencies:
pip install "camel-ai[all]==0.1.6.3"
Next, set up your API keys for Firecrawl and OpenAI to enable interaction with external services.
API Keys
The snippet below prompts for each key with getpass instead of hard-coding it, then stores it in an environment variable where the tools can read it.
You can get a free Firecrawl API key here.
import os
from getpass import getpass

# Prompt for the Firecrawl API key securely
firecrawl_api_key = getpass('Enter your Firecrawl API key: ')
os.environ["FIRECRAWL_API_KEY"] = firecrawl_api_key

# Prompt for the OpenAI API key securely
openai_api_key = getpass('Enter your OpenAI API key: ')
os.environ["OPENAI_API_KEY"] = openai_api_key
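If a prompt is skipped or left empty, the problem typically only surfaces later as an authentication error inside a tool call. A small guard can fail fast instead. This is a minimal sketch; `require_env` is a hypothetical helper, not part of CAMEL or Firecrawl:

```python
import os


def require_env(names: list[str]) -> None:
    """Raise a clear error if any required environment variable is unset or empty.

    Hypothetical helper (not part of CAMEL or Firecrawl): it only checks that
    the variables exist and are non-empty, it does not validate the keys.
    """
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError(f"Missing API keys: {', '.join(missing)}")


# Usage: call once after the getpass prompts.
# require_env(["FIRECRAWL_API_KEY", "OPENAI_API_KEY"])
```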
Effortless Web Scraping with Firecrawl
Firecrawl simplifies scraping web pages and cleaning their content. Here's an example of scraping a specific post on the CAMEL-AI website:
from camel.loaders import Firecrawl
firecrawl = Firecrawl()
response = firecrawl.tidy_scrape(
url="https://www.camel-ai.org/post/crab"
)
print(response)
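`tidy_scrape` is meant to return a page's main content rather than raw HTML. Firecrawl's actual cleaning pipeline is not covered here, so as a rough illustration of what "cleaning" means, here is a stdlib-only sketch that keeps visible text and drops `<script>` and `<style>` contents. It is a toy stand-in, not how Firecrawl works internally, and it does not attempt the formatting a real scraper may produce:

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text nodes, skipping anything inside <script> or <style>."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0          # > 0 while inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```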
Web Information Retrieval using CAMEL's RAG and Firecrawl
Let's retrieve relevant information from a list of URLs using CAMEL's RAG pipeline. We'll define a function that uses Firecrawl to scrape each page and CAMEL's AutoRetriever to return the content most relevant to a query:
from camel.configs import ChatGPTConfig
from camel.loaders import Firecrawl
from camel.models import ModelFactory
from camel.retrievers import AutoRetriever
from camel.toolkits import OpenAIFunction, SearchToolkit
from camel.types import ModelPlatformType, ModelType, StorageType
def retrieve_information_from_urls(urls: list[str], query: str) -> str:
    r"""Retrieves relevant information from a list of URLs based on a given
    query.

    This function uses the `Firecrawl` tool to scrape content from the
    provided URLs and then uses the `AutoRetriever` from CAMEL to retrieve the
    most relevant information based on the query from the scraped content.

    Args:
        urls (list[str]): A list of URLs to scrape content from.
        query (str): The query string to search for relevant information.

    Returns:
        str: The most relevant information retrieved based on the query.

    Example:
        >>> urls = ["https://example.com/article1",
        ...         "https://example.com/article2"]
        >>> query = "latest advancements in AI"
        >>> result = retrieve_information_from_urls(urls, query)
    """
    aggregated_content = ''

    # Scrape and aggregate content from each URL
    for url in urls:
        scraped_content = Firecrawl().tidy_scrape(url)
        # Separate pages so their text does not run together
        aggregated_content += scraped_content + '\n'

    # Initialize the AutoRetriever for retrieving relevant content
    auto_retriever = AutoRetriever(
        vector_storage_local_path="local_data", storage_type=StorageType.QDRANT
    )

    # Retrieve the most relevant information based on the query
    # You can adjust the top_k and similarity_threshold values to your needs
    retrieved_info = auto_retriever.run_vector_retriever(
        query=query,
        contents=aggregated_content,
        top_k=3,
        similarity_threshold=0.5,
    )

    return retrieved_info
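The `top_k` and `similarity_threshold` arguments control how much content comes back. AutoRetriever works with an embedding model and a Qdrant vector store; the sketch below swaps in bag-of-words cosine similarity over fixed-size word chunks purely to show how the two knobs interact: chunks scoring below `similarity_threshold` are dropped, then at most `top_k` of the rest are returned, best first. `toy_retrieve` and `chunk_size` are hypothetical names, not CAMEL's API:

```python
import math
import re
from collections import Counter


def _vectorize(text: str) -> Counter:
    # Bag-of-words term counts: a toy stand-in for a real embedding model
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def toy_retrieve(
    content: str,
    query: str,
    top_k: int = 3,
    similarity_threshold: float = 0.5,
    chunk_size: int = 20,
) -> list[tuple[float, str]]:
    """Split content into word chunks, score each against the query,
    keep those at or above the threshold, and return the best top_k."""
    words = content.split()
    chunks = [" ".join(words[i:i + chunk_size])
              for i in range(0, len(words), chunk_size)]
    query_vec = _vectorize(query)
    scored = [(_cosine(query_vec, _vectorize(chunk)), chunk) for chunk in chunks]
    kept = [pair for pair in scored if pair[0] >= similarity_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]
```

Raising the threshold trades recall for precision; lowering `top_k` simply caps how much context is handed to the agent.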
Let's put the retrieval function to the test by gathering some information about the 2024 Olympics. The first run may take about 50 seconds because it needs to build a local vector database.
retrieved_info = retrieve_information_from_urls(
    query="Which country won the most gold medals in the 2024 Olympics?",
urls=[
"https://en.wikipedia.org/wiki/2024_Summer_Olympics",
"https://olympics.com/en/paris-2024",
],
)
print(retrieved_info)
Thanks to CAMEL's RAG pipeline and Firecrawl's tidy scraping capabilities, this function retrieves relevant information from the specified URLs. You can now integrate it into CAMEL's agents to automate the retrieval process further.
Monitoring AI Agents with AgentOps
AgentOps is a powerful tool for tracking and analyzing the execution of CAMEL agents. To set up AgentOps, obtain an API key and configure it in your environment:
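import os
from getpass import getpass

import agentops

# Prompt for the AgentOps API key securely
agentops_api_key = getpass('Enter your AgentOps API key: ')
os.environ["AGENTOPS_API_KEY"] = agentops_api_key

# Start AgentOps with a default tag so these runs are labeled as CAMEL runs
agentops.init(default_tags=["CAMEL"])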
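AgentOps is meant to collect execution data for you once `agentops.init()` has run. To make "tracking execution" concrete, here is a hypothetical, dependency-free decorator that records each tool call's name, arguments, outcome, and duration. It is not the AgentOps API and is not needed when AgentOps is active; `track_tool` and `CALL_LOG` are illustrative names:

```python
import functools
import time
from typing import Any, Callable

# In-memory log of tool calls (hypothetical; AgentOps stores its data remotely)
CALL_LOG: list[dict[str, Any]] = []


def track_tool(func: Callable) -> Callable:
    """Record name, arguments, status, and wall-clock duration of each call."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        record: dict[str, Any] = {"tool": func.__name__, "args": args, "kwargs": kwargs}
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
            record["status"] = "ok"
            return result
        except Exception as exc:
            record["status"] = f"error: {exc}"
            raise
        finally:
            # Runs on success and failure, so every call is logged once
            record["seconds"] = time.perf_counter() - start
            CALL_LOG.append(record)

    return wrapper
```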
With AgentOps set up, you can monitor and analyze the execution of your CAMEL agents, gaining valuable insights into their performance and behavior.
Constructing Knowledge Graphs
CAMEL can build and store knowledge graphs from text data, enabling advanced analysis and visualization of relationships. Here's how to connect to a Neo4j instance and define a function that creates a knowledge graph:
from camel.storages import Neo4jGraph
from camel.loaders import UnstructuredIO
from camel.agents import KnowledgeGraphAgent
from camel.storages.graph_storages.graph_element import GraphElement

def knowledge_graph_builder(text_input: str) -> GraphElement:
    r"""Build and store a knowledge graph from the provided text.

    This function processes the input text to create and extract nodes and
    relationships, which are then added to a Neo4j database as a knowledge
    graph.

    Args:
        text_input (str): The input text from which the knowledge graph is to
            be constructed.

    Returns:
        GraphElement: The graph elements generated by the knowledge graph
            agent.
    """
    # Connect to your Neo4j instance (replace the placeholders)
    n4j = Neo4jGraph(
        url="Your_URI",
        username="Your_Username",
        password="Your_Password",
    )

    # Initialize instances
    uio = UnstructuredIO()
    kg_agent = KnowledgeGraphAgent()

    # Create an element from the provided text
    element_example = uio.create_element_from_text(text_input, element_id="001")

    # Extract nodes and relationships using the Knowledge Graph Agent
    graph_elements = kg_agent.run(element_example, parse_graph_elements=True)

    # Add the extracted graph elements to the Neo4j database
    n4j.add_graph_elements(graph_elements=[graph_elements])

    return graph_elements
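`KnowledgeGraphAgent` does the extraction and `Neo4jGraph` does the writing, so you never write Cypher yourself here. As a rough picture of what ends up in the database, the sketch below turns (subject, relation, object) triples into parameterized Cypher `MERGE` statements. The `Triple` type, the `Entity` label, and the `RELATED` relationship are hypothetical choices for illustration, not what `Neo4jGraph.add_graph_elements` actually emits:

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One extracted fact: subject --relation--> obj (hypothetical type)."""
    subject: str
    relation: str
    obj: str


# MERGE creates a node or edge only if an identical one does not exist yet.
# Cypher cannot take a relationship *type* as a parameter, so the relation
# name is stored as a property to keep every value parameterized.
MERGE_QUERY = (
    "MERGE (s:Entity {id: $src}) "
    "MERGE (o:Entity {id: $dst}) "
    "MERGE (s)-[:RELATED {type: $rel}]->(o)"
)


def triples_to_cypher(triples: list[Triple]) -> list[tuple[str, dict]]:
    """Return one (query, parameters) pair per triple."""
    return [
        (MERGE_QUERY, {"src": t.subject, "rel": t.relation, "dst": t.obj})
        for t in triples
    ]
```

With the official `neo4j` Python driver, each pair could be passed to `session.run(query, params)`; using parameters rather than string formatting keeps entity names from being interpreted as Cypher.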
Multi-Agent Role-Playing with CAMEL
CAMEL enables role-playing sessions where AI agents interact to accomplish tasks using various tools. Let's guide an assistant agent to perform a comprehensive study of the Turkish shooter in the 2024 Paris Olympics:
- Define the task prompt.
- Configure the assistant agent with tools for web information retrieval and knowledge graph building.
- Initialize the role-playing session.
- Start the interaction between agents.
from typing import List
from colorama import Fore
from camel.agents.chat_agent import FunctionCallingRecord
from camel.societies import RolePlaying
from camel.utils import print_text_animated
task_prompt = """Do a comprehensive study of the Turkish shooter in 2024 paris
olympics, write a report for me, then create a knowledge graph for the report.
You should use search tool to get related urls first, then use retrieval tool
to get the retrieved content back, finally use tool to create the
knowledge graph to finish the task."""
retrieval_tool = OpenAIFunction(retrieve_information_from_urls)
search_tool = OpenAIFunction(SearchToolkit().search_duckduckgo)
knowledge_graph_tool = OpenAIFunction(knowledge_graph_builder)
tool_list = [
retrieval_tool,
search_tool,
knowledge_graph_tool,
]
assistant_model_config = ChatGPTConfig(
tools=tool_list,
temperature=0.0,
)
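The three tools above are plain Python functions wrapped with `OpenAIFunction`, which derives a tool definition the model can call from the function itself; that is one reason they carry type hints and docstrings. The exact schema CAMEL generates is not reproduced here. The sketch below is a simplified, hypothetical approximation (`sketch_tool_schema` is an illustrative name) that reads a signature and docstring with `inspect` and builds an OpenAI-style function schema:

```python
import inspect
from typing import get_type_hints

# Minimal Python-to-JSON-Schema type mapping; unknown types fall back to "string"
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_tool_schema(func) -> dict:
    """Build a simplified OpenAI-style function schema from a Python function."""
    hints = get_type_hints(func)
    properties: dict = {}
    required: list[str] = []
    for name, param in inspect.signature(func).parameters.items():
        annotation = hints.get(name, str)
        if getattr(annotation, "__origin__", None) is list:
            item_type = _JSON_TYPES.get(annotation.__args__[0], "string")
            properties[name] = {"type": "array", "items": {"type": item_type}}
        else:
            properties[name] = {"type": _JSON_TYPES.get(annotation, "string")}
        # Parameters without defaults must be supplied by the model
        if param.default is inspect.Parameter.empty:
            required.append(name)
    doc = inspect.getdoc(func) or ""
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": doc.split("\n\n")[0],  # first docstring paragraph
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }
```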
Now we can set up the role-playing session:
# Initialize the role-playing session
role_play_session = RolePlaying(
assistant_role_name="CAMEL Assistant",
user_role_name="CAMEL User",
assistant_agent_kwargs=dict(
model=ModelFactory.create(
model_platform=ModelPlatformType.OPENAI,
model_type=ModelType.GPT_4O_MINI,
model_config_dict=assistant_model_config.as_dict(),
),
tools=tool_list,
),
user_agent_kwargs=dict(),
task_prompt=task_prompt,
with_task_specify=False,
)
Print the system message and task prompt like this:
print(
Fore.GREEN
+ f"AI Assistant sys message:\n{role_play_session.assistant_sys_msg}\n"
)
print(Fore.BLUE + f"AI User sys message:\n{role_play_session.user_sys_msg}\n")
print(Fore.YELLOW + f"Original task prompt:\n{task_prompt}\n")
print(
Fore.CYAN
+ "Specified task prompt:"
+ f"\n{role_play_session.specified_task_prompt}\n"
)
print(Fore.RED + f"Final task prompt:\n{role_play_session.task_prompt}\n")
Set the termination rule and start the interaction between agents:
NOTE: This session takes approximately 5 minutes and costs around $0.02 in tokens when using GPT-4o-mini.
n = 0
input_msg = role_play_session.init_chat()
while n < 10:  # Limit the chat to 10 turns
    n += 1
    assistant_response, user_response = role_play_session.step(input_msg)

    if assistant_response.terminated:
        print(
            Fore.GREEN
            + (
                "AI Assistant terminated. Reason: "
                f"{assistant_response.info['termination_reasons']}."
            )
        )
        break
    if user_response.terminated:
        print(
            Fore.GREEN
            + (
                "AI User terminated. "
                f"Reason: {user_response.info['termination_reasons']}."
            )
        )
        break

    # Print output from the user
    print_text_animated(
        Fore.BLUE + f"AI User:\n\n{user_response.msg.content}\n",
        0.01
    )

    # Print output from the assistant, including any function
    # execution information
    print_text_animated(Fore.GREEN + "AI Assistant:", 0.01)
    tool_calls: List[FunctionCallingRecord] = [
        FunctionCallingRecord(**call.as_dict())
        for call in assistant_response.info['tool_calls']
    ]
    for func_record in tool_calls:
        print_text_animated(f"{func_record}", 0.01)
    print_text_animated(f"{assistant_response.msg.content}\n", 0.01)

    if "CAMEL_TASK_DONE" in user_response.msg.content:
        break

    input_msg = assistant_response.msg
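The loop stops when either agent is terminated, when the user agent says `CAMEL_TASK_DONE`, or after ten turns, whichever comes first. The turn-cap and done-token logic can be exercised offline with stub agents. In this sketch `run_dialogue` and the stubs are hypothetical stand-ins for `RolePlaying.step`; the ordering inside each turn (user replies to the assistant's last message, then the assistant replies to the user) is an assumption chosen to mirror how the loop consumes `user_response` and `assistant_response`, and the `terminated` checks are omitted:

```python
from typing import Callable


def run_dialogue(
    assistant: Callable[[str], str],
    user: Callable[[str], str],
    first_msg: str,
    max_turns: int = 10,
    done_token: str = "CAMEL_TASK_DONE",
) -> int:
    """Alternate user and assistant turns; return how many turns were taken."""
    msg = first_msg
    for turn in range(1, max_turns + 1):
        user_reply = user(msg)                   # user agent reacts to the assistant
        assistant_reply = assistant(user_reply)  # assistant acts on the instruction
        if done_token in user_reply:
            return turn                          # user declared the task finished
        msg = assistant_reply                    # feed the answer into the next turn
    return max_turns                             # hit the turn cap
```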
End the AgentOps Session like so:
# End the AgentOps session
agentops.end_session("Success")
Highlights
This post demonstrates how CAMEL and Firecrawl can be combined for advanced RAG with knowledge graphs. Key tools used include:
- CAMEL: A multi-agent framework for Retrieval-Augmented Generation and role-playing scenarios.
- Firecrawl: A web scraping tool for extracting and cleaning content from web pages.
- AgentOps: A monitoring and analysis tool for tracking CAMEL agent execution.
- Qdrant: A vector storage system used with CAMEL's AutoRetriever.
- Neo4j: A graph database for constructing and storing knowledge graphs.
- DuckDuckGo Search: Utilized within the SearchToolkit to gather relevant URLs.
- OpenAI: Provides state-of-the-art language models for tool-calling and embeddings.
We hope this blog post has inspired you to harness the power of CAMEL and Firecrawl for your own projects. Happy researching and building! If you want to run this blog post as a notebook, click here!