
Grok-2, the latest language model from x.ai, brings advanced language understanding capabilities to developers, enabling the creation of intelligent applications with ease. In this tutorial, we’ll walk you through setting up Grok-2, obtaining an API key, and then building a web crawler using Firecrawl to extract structured data from any website.
Part 1: Setting Up Grok-2
Before diving into coding, we need to set up Grok-2 and get an API key.
Step 1: Sign Up for an x.ai Account
To access the Grok-2 API, you’ll need an x.ai account.
- Visit the Sign-Up Page: Go to x.ai Sign-Up.
- Register: Fill out the registration form with your email and create a password.
- Verify Your Email: Check your inbox for a verification email from x.ai and click the link to verify your account.
Step 2: Fund Your Account
To use the Grok-2 API, your account must have funds.
- Access the Cloud Console: After logging in, you’ll be directed to the x.ai Cloud Console.
- Navigate to Billing: Click on the Billing tab in the sidebar.
- Add Payment Method: Provide your payment details to add credits to your account.
Step 3: Obtain Your API Key
With your account funded, you can now generate an API key.
- Go to API Keys: Click on the API Keys tab in the Cloud Console.
- Create a New API Key: Click on Create New API Key and give it a descriptive name.
- Copy Your API Key: Make sure to copy your API key now, as it won’t be displayed again for security reasons.
Note: Keep your API key secure and do not share it publicly.
Part 2: Building a Web Crawler with Grok-2 and Firecrawl
Now that Grok-2 is set up, let’s build a web crawler to extract structured data from websites.
Prerequisites
- Python 3.6+
- Firecrawl Python Library
- Requests Library
- dotenv Library
Install the required packages:
pip install firecrawl-py requests python-dotenv
Step 1: Set Up Environment Variables
Create a .env file in your project directory to store your API keys securely.
GROK_API_KEY=your_grok_api_key
FIRECRAWL_API_KEY=your_firecrawl_api_key
Replace your_grok_api_key and your_firecrawl_api_key with your actual API keys.
Step 2: Initialize Your Script
Create a new Python script (e.g., web_crawler.py) and start by importing the necessary libraries and loading your environment variables.
import os
import json
import requests
from dotenv import load_dotenv
from firecrawl import FirecrawlApp
# Load environment variables from .env file
load_dotenv()
# Retrieve API keys
grok_api_key = os.getenv("GROK_API_KEY")
firecrawl_api_key = os.getenv("FIRECRAWL_API_KEY")
# Initialize FirecrawlApp
app = FirecrawlApp(api_key=firecrawl_api_key)
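Because os.getenv silently returns None when a variable is missing, it helps to fail fast with a clear message before making any API calls. Here is a minimal sketch; the require_env helper is our own addition, not part of either SDK:

```python
import os

def require_env(name):
    """Return the value of an environment variable, or fail with a clear error."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value

# Example: validate both keys up front
# grok_api_key = require_env("GROK_API_KEY")
# firecrawl_api_key = require_env("FIRECRAWL_API_KEY")
```

Failing at startup with a named variable is much easier to debug than a 401 response deep inside the crawl.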
Step 3: Define the Grok-2 API Interaction Function
We need a function to interact with the Grok-2 API.
def grok_completion(prompt):
    url = "https://api.x.ai/v1/chat/completions"
    headers = {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {grok_api_key}"
    }
    data = {
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt}
        ],
        "model": "grok-2",
        "stream": False,
        "temperature": 0
    }
    response = requests.post(url, headers=headers, json=data)
    response.raise_for_status()  # Surface HTTP errors instead of failing on a missing key
    response_data = response.json()
    return response_data['choices'][0]['message']['content']
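Network calls to the API can fail transiently (timeouts, rate limits), so it is worth wrapping grok_completion in a retry. The sketch below is a generic helper of our own; the with_retries name and backoff values are illustrative, not part of the x.ai SDK:

```python
import time

def with_retries(fn, attempts=3, backoff=1.0):
    """Call fn(), retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of retries; surface the error
            time.sleep(backoff * (2 ** attempt))

# Example usage:
# result = with_retries(lambda: grok_completion("Suggest a search term"))
```

In production you would likely catch only network-related exceptions rather than a bare Exception, but the shape stays the same.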
Step 4: Identify Relevant Pages on the Website
Define a function to find pages related to our objective.
def find_relevant_pages(objective, url):
    prompt = f"Based on the objective '{objective}', suggest a 1-2 word search term to locate relevant information on the website."
    search_term = grok_completion(prompt).strip()
    map_result = app.map_url(url, params={"search": search_term})
    return map_result.get("links", [])
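Depending on the site, the mapped links may mix absolute and relative URLs, and may contain duplicates. Before scraping, it can help to normalize them; the helper below is our own addition and uses only the standard library:

```python
from urllib.parse import urljoin

def normalize_links(base_url, links):
    """Resolve relative links against the base URL and de-duplicate, preserving order."""
    seen = set()
    result = []
    for link in links:
        absolute = urljoin(base_url, link)
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result

# Example usage:
# links = normalize_links(url, find_relevant_pages(objective, url))
```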
Step 5: Extract Data from the Pages
Create a function to scrape the pages and extract the required data.
def extract_data_from_pages(links, objective):
    for link in links[:3]:  # Limit to the top 3 links
        scrape_result = app.scrape_url(link, params={'formats': ['markdown']})
        content = scrape_result.get('markdown', '')
        prompt = f"""Given the following content, extract the information related to the objective '{objective}' in JSON format. If not found, reply 'Objective not met'.
Content: {content}
Remember:
- Only return JSON if the objective is met.
- Do not include any extra text.
"""
        result = grok_completion(prompt).strip()
        if result != "Objective not met":
            try:
                data = json.loads(result)
                return data
            except json.JSONDecodeError:
                continue  # Try the next link if JSON parsing fails
    return None
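Models sometimes wrap JSON answers in markdown code fences even when told not to, which makes a plain json.loads fail. A small defensive parser (our own helper, not part of the tutorial's API) can be used in place of the json.loads call above:

```python
import json
import re

def parse_model_json(text):
    """Parse JSON from a model reply, stripping optional ```json fences first."""
    cleaned = re.sub(r"^```(?:json)?\s*|\s*```$", "", text.strip())
    return json.loads(cleaned)
```

This keeps the extraction loop from discarding an otherwise valid answer just because of formatting.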
Step 6: Implement the Main Function
Combine everything into a main function.
def main():
    url = input("Enter the website URL to crawl: ")
    objective = input("Enter your data extraction objective: ")
    print("\nFinding relevant pages...")
    links = find_relevant_pages(objective, url)
    if not links:
        print("No relevant pages found.")
        return
    print("Extracting data from pages...")
    data = extract_data_from_pages(links, objective)
    if data:
        print("\nData extracted successfully:")
        print(json.dumps(data, indent=2))
    else:
        print("Could not find data matching the objective.")

if __name__ == "__main__":
    main()
Step 7: Run the Script
Save your script and run it from the command line.
python web_crawler.py
Example Interaction:
Enter the website URL to crawl: https://example.com
Enter your data extraction objective: Retrieve the list of services offered.
Finding relevant pages...
Extracting data from pages...
Data extracted successfully:
{
  "services": [
    "Web Development",
    "SEO Optimization",
    "Digital Marketing"
  ]
}
Conclusion
In this tutorial, we’ve successfully set up Grok-2, obtained an API key, and built a web crawler using Firecrawl. This powerful combination allows you to automate the process of extracting structured data from websites, making it a valuable tool for various applications.
Next Steps
- Explore More Features: Check out the Grok-2 and Firecrawl documentation to learn about additional functionalities.
- Enhance Error Handling: Improve the script with better error handling and logging.
- Customize Data Extraction: Modify the extraction logic to suit different objectives or data types.
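As a starting point for the error-handling suggestion above, here is a minimal sketch using the standard logging module; the logger name and safe_extract wrapper are our own illustrative choices:

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",
)
logger = logging.getLogger("web_crawler")

def safe_extract(fn, *args):
    """Run one crawl step, logging failures with a traceback instead of crashing."""
    try:
        return fn(*args)
    except Exception:
        logger.exception("Step %s failed", fn.__name__)
        return None

# Example usage:
# data = safe_extract(extract_data_from_pages, links, objective)
```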