
Getting structured data from LLMs is valuable for developers integrating AI into their applications because it makes model outputs far more reliable to parse and process.
OpenAI just released new versions of gpt-4o and gpt-4o-mini that bring big improvements for developers who need structured data from LLMs. With the introduction of Structured Outputs and JSON Strict Mode, setting strict to true constrains the model’s output to match the JSON schema you supply, so you no longer have to guard against malformed or incomplete JSON.
Figure 1: Structured Output Evaluation Scores from OpenAI’s latest models
Without further ado, let’s dig into how to use these latest models and get reliable structured data from them.
How to use Structured Outputs with JSON Strict Mode
To demonstrate the power of these models, we can use JSON Strict Mode to extract structured data from a web page. See the code on GitHub.
Prerequisites
Install the required libraries:
!pip install firecrawl-py openai
Step 1: Initialize the FirecrawlApp and OpenAI Client
from firecrawl import FirecrawlApp
from openai import OpenAI
firecrawl_app = FirecrawlApp(api_key='FIRECRAWL_API_KEY')
client = OpenAI(api_key='OPENAI_API_KEY')
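The strings 'FIRECRAWL_API_KEY' and 'OPENAI_API_KEY' above are placeholders for your real keys. One common alternative is to read the keys from environment variables so they never end up in the notebook itself. The sketch below assumes the keys are stored under those same variable names; `require_env` is a hypothetical helper written for this example, not part of either library.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return value


# Possible usage with the clients from Step 1 (assumes both variables are set):
# firecrawl_app = FirecrawlApp(api_key=require_env("FIRECRAWL_API_KEY"))
# client = OpenAI(api_key=require_env("OPENAI_API_KEY"))
```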
Step 2: Scrape Data from a Web Page
url = 'https://mendable.ai'
scraped_data = firecrawl_app.scrape_url(url)
Step 3: Define the OpenAI API Request
messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant that extracts structured data from web pages."
    },
    {
        "role": "user",
        "content": f"Extract the headline and description from the following HTML content: {scraped_data['content']}"
    }
]
response_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "extracted_data",
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {
                "headline": {
                    "type": "string"
                },
                "description": {
                    "type": "string"
                }
            },
            "required": ["headline", "description"],
            "additionalProperties": False
        }
    }
}
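The schema above lists every property under "required" and sets "additionalProperties" to False, which is the shape strict mode expects. If you extract several sets of flat string fields, a small builder can keep those two rules from being forgotten. This is only a sketch: `build_response_format` is a hypothetical helper for this article, and it handles nothing beyond top-level string fields.

```python
def build_response_format(name, fields):
    """Build a strict json_schema response_format for flat string fields."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": name,
            "strict": True,
            "schema": {
                "type": "object",
                # Every field is a plain string in this simplified sketch.
                "properties": {field: {"type": "string"} for field in fields},
                # Strict mode expects all properties to be listed as required...
                "required": list(fields),
                # ...and extra properties to be disallowed.
                "additionalProperties": False,
            },
        },
    }


# Produces the same dictionary as the hand-written response_format above.
response_format = build_response_format("extracted_data", ["headline", "description"])
```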
Step 4: Call the OpenAI API and Extract Structured Data
If you are wondering which models support OpenAI’s Structured Outputs and JSON Strict Mode, both gpt-4o-2024-08-06 and gpt-4o-mini-2024-07-18 do.
chat_completion = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=messages,
    response_format=response_format
)
extracted_data = chat_completion.choices[0].message.content
print(extracted_data)
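Note that the message content printed above is a JSON string, not a Python dictionary, so it still needs to be parsed before you can read individual fields. The sketch below shows one way to do that, and it also checks two cases where schema-conforming JSON may not come back: a refusal (reported in the message's `refusal` field) and a response cut off by the token limit (a `finish_reason` of "length"). `parse_structured_choice` is a hypothetical helper, and the `fake_choice` object is a stand-in built with `SimpleNamespace` so the example runs without calling the API.

```python
import json
from types import SimpleNamespace


def parse_structured_choice(choice):
    """Turn one chat completion choice into a dict, or raise if it is unusable."""
    message = choice.message
    # A refusal means the model declined, so there is no schema-shaped content.
    if getattr(message, "refusal", None):
        raise ValueError(f"Model refused the request: {message.refusal}")
    # A "length" finish reason means the JSON may be cut off mid-object.
    if choice.finish_reason == "length":
        raise ValueError("Output was truncated before the JSON was complete.")
    return json.loads(message.content)


# Stand-in for chat_completion.choices[0], with made-up example values.
fake_choice = SimpleNamespace(
    finish_reason="stop",
    message=SimpleNamespace(
        content='{"headline": "Example headline", "description": "Example description"}',
        refusal=None,
    ),
)

data = parse_structured_choice(fake_choice)
print(data["headline"])  # prints: Example headline
```

With a real response, you would pass chat_completion.choices[0] instead of fake_choice.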
By following these steps, you can reliably extract structured data from web pages using OpenAI’s latest models with JSON Strict Mode.
That’s about it! In this article, we showed you how to use Structured Output with scraped web data, but the sky’s the limit when it comes to what you can build with reliable structured output from LLMs!