, we’ve talked a lot about popular techniques for optimizing the performance and cost of AI applications, like response streaming or prompt caching. Today, I want to talk about something a bit different but equally important for building real AI apps. That is, structured, machine-readable outputs.
So far in most of the examples I’ve shared, we’ve been dealing with free-text responses from an AI model. The user asks a question, the model responds in natural language, and we just display that response to the user in some way. Fairly simple and straightforward. But what happens when we need the model to return data in a specific format (e.g., a JSON object) so that we can further process it programmatically later on? What if we need the model to extract specific fields from a text or image, populate a database entry, or trigger a subsequent action based on its response? In those cases, getting back a wall of text won’t be very convenient. 🤔
Happily, there are multiple solutions for this issue. There are two main approaches for obtaining structured, machine-readable outputs from an LLM: JSON Mode and Function Calling (also called tool use). These two are often confused with one another (which is to be expected since they both deal with structured outputs, duh), but they serve quite different purposes. On top of this, OpenAI has introduced a stricter variant of Function Calling called Structured Outputs, which takes schema enforcement one step further, as we’ll see. In this post, we’ll take a closer look at all three, understand how each one works under the hood, and figure out when to use each.
So, let’s take a look!
1. What is JSON Mode?
JSON Mode is the simpler approach for achieving machine-readable outputs from an LLM. It is essentially a parameter you can set in an API request to instruct the model to always return a valid JSON object. And that’s really all there is to it! Nonetheless, this simplicity comes at a cost, since there are no guarantees on the structure or schema of the JSON (remember we didn’t define any schema, field names, or types, or anything like this), just that it will be valid, parseable JSON.
For example, using OpenAI’s API in Python, we can enable JSON Mode by adding the parameter response_format={“type”: “json_object”} to our call to the model. More specifically, it would look something like this:
from openai import OpenAI
client = OpenAI(api_key=”your_api_key”)
response = client.chat.completions.create(
model=”gpt-4o-mini”,
response_format={“type”: “json_object”},
messages=[
{
“role”: “system”,
“content”: “You are a helpful assistant. Always respond in JSON format.”
},
{
“role”: “user”,
“content”: “Extract the name, age, and city from this text: ‘Maria is 32 years old and lives in Athens.'”
}
]
)
print(response.choices[0].message.content)
And the response would look something like this:
{
“name”: “Maria”,
“age”: 32,
“city”: “Athens”
}
And voilà! ✨ With just one simple parameter change, we get a valid JSON back every time. No need for string parsing or strange regex hacks.
There’s a catch, though. JSON Mode does guarantee that the output is valid JSON, but it does not guarantee a specific structure. If we run the same example multiple times, we may get slightly different field names or a slightly different structure each time. For example, one run might return “name” , and another “full_name”. That’s a problem if we’re trying to reliably extract specific fields programmatically.
Another thing is that beyond setting response_format={“type”: “json_object”}, it is a good practice to also always explicitly instruct the model to respond in JSON in the system prompt. In the example above, notice how we also added “Always respond in JSON format” in the system prompt. Without this, the model may return a valid JSON sometimes, but not always, since its behaviour may become unpredictable.
2. What is Function Calling?
Function Calling (or tool use) is a more advanced approach for getting structured, machine-readable outputs from an LLM. Instead of just asking the model to format its response as JSON, we define a specific schema. That is, we explicitly define a formal description of the structure we want the output to follow, and in this way, the model is more constrained to return data that matches that schema exactly. In other words, with Function Calling we define upfront what fields we expect, what types those fields should be, which are required, and which are not, and so on.
Here’s how the same extraction example would look using Function Calling:
from openai import OpenAI
import json
client = OpenAI(api_key=”your_api_key”)
# define the schema of the output we expect
tools = [
{
“type”: “function”,
“function”: {
“name”: “extract_person_info”,
“description”: “Extract personal information from a text”,
“parameters”: {
“type”: “object”,
“properties”: {
“name”: {
“type”: “string”,
“description”: “The full name of the person”
},
“age”: {
“type”: “integer”,
“description”: “The age of the person”
},
“city”: {
“type”: “string”,
“description”: “The city the person lives in”
}
},
“required”: [“name”, “age”, “city”]
}
}
}
]
response = client.chat.completions.create(
model=”gpt-4o-mini”,
tools=tools,
tool_choice={“type”: “function”, “function”: {“name”: “extract_person_info”}},
messages=[
{
“role”: “user”,
“content”: “Extract the name, age, and city from this text: ‘Maria is 32 years old and lives in Athens.'”
}
]
)
# parse the structured output
tool_call = response.choices[0].message.tool_calls[0]
result = json.loads(tool_call.function.arguments)
print(result)
And the output would look like this:
{
“name”: “Maria”,
“age”: 32,
“city”: “Athens”
}
The output for this example with Function Calling is identical to the one we got using JSON Mode. Nevertheless, the key difference is that, unlike JSON Mode, with Function Calling, the output is going to be consistent; it is going to always follow the exact defined schema, with consistent field names, types, and any other attributes we define on it.
🍨 DataCream is a newsletter offering stories and tutorials on AI, data, and tech. If you are interested in these topics, subscribe here!
Bonus: A little more on Function Calling
Before moving on to Structured Outputs, it’s worth pausing and elaborating some more on the original motivation and use behind Function Calling, which goes well beyond just getting structured outputs. Essentially, the concept of Function Calling is the foundation of agentic AI workflows. More specifically, in an agentic setup, the LLM is not just responding to a user’s question, but rather it is deciding which action to take next based on the user’s input.
For example, let’s imagine a customer support assistant that can either look up an order, issue a refund, or escalate to a human agent, depending on what the user is asking. With Function Calling, we can define all three of these candidate actions as “tools” (functions), and the model’s output will define which one to call and with what arguments based on its input.
tools = [
{
“type”: “function”,
“function”: {
“name”: “lookup_order”,
“description”: “Look up the status of a customer order”,
“parameters”: {
“type”: “object”,
“properties”: {
“order_id”: {“type”: “string”, “description”: “The order ID”}
},
“required”: [“order_id”]
}
}
},
{
“type”: “function”,
“function”: {
“name”: “issue_refund”,
“description”: “Issue a refund for a customer order”,
“parameters”: {
“type”: “object”,
“properties”: {
“order_id”: {“type”: “string”},
“reason”: {“type”: “string”}
},
“required”: [“order_id”, “reason”]
}
}
}
]
response = client.chat.completions.create(
model=”gpt-4o-mini”,
tools=tools,
messages=[
{“role”: “user”, “content”: “I want a refund for order #12345, it arrived broken.”}
]
)
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name) # “issue_refund”
print(tool_call.function.arguments) # ‘{“order_id”: “12345”, “reason”: “arrived broken”}’
So, the API response object looks something like this:
ChatCompletionMessage(
content=None,
role=’assistant’,
tool_calls=[
ChatCompletionMessageToolCall(
id=’call_abc123′,
type=’function’,
function=Function(
name=’issue_refund’,
arguments='{“order_id”: “12345”, “reason”: “arrived broken”}’
)
)
]
)
And the print statements would hypothetically output:
issue_refund
{“order_id”: “12345”, “reason”: “arrived broken”}
So, what is happening here? The model returns a tool_calls object instead of a regular text response (check out howcontent is None). Inside the tool_calls object, we can see that the model decided to call issue_refund (not lookup_order), and filled in the arguments on its own based on what the user said. We then parse those arguments and execute the actual refund logic in our system.
Notice how the model didn’t just return the requested data, but rather decided which of the candidate actions is the most appropriate to perform, then filled in the appropriate arguments in its response. In this way, we can then take those arguments and actually execute the corresponding action in our system. This is the real power of Function Calling, and it is why it is such a foundational component in agentic AI applications.
But let’s get back to machine-readable outputs now, and we’ll talk more about agentic AI workflows and Function Calling in some other post.
3. What about Structured Outputs?
A stricter variation of Function Calling is Structured Outputs. Even if Function Calling guides the model to provide an output following a defined schema, this is not really hard-constrained. In practice, this means that some deviations from this defined schema may still occur. Such deviations may be:
- A field marked as required that is, in fact, omitted if the model struggles to figure out its value
- Extra fields not defined in our schema are added
- A field defined as integer comes back as a string “32” instead of 32
…and so on.
This happens because, in Function Calling, the model is trying to follow the schema, but this is still a best-effort generation. Like any LLM output, the output here is still fundamentally tokens being predicted one by one, with the schema being just a strong hint. There’s still a good chance for that token-by-token generation to be derailed somewhere along the route and produce outputs that deviate from the defined schema.
Structured Outputs, on the other hand, takes Function Calling one step further by guaranteeing that every field in the defined schema will always appear in the output exactly as defined, with no surprises, no missing or extra fields. The key differentiator is that OpenAI uses constrained decoding behind the scenes. This means that at each token step, the model is only allowed to generate tokens that keep the output valid according to the schema. In other words, the schema is enforced at the generation level, instead of just being requested through the system prompt.
OpenAI’s Structured Outputs can be activated by simply setting strict: true in the function definition:
tools = [
{
“type”: “function”,
“function”: {
“name”: “extract_person_info”,
“strict”: True, # enables Structured Outputs
“parameters”: {
“type”: “object”,
“properties”: {
“name”: {“type”: “string”},
“age”: {“type”: “integer”},
“city”: {“type”: “string”}
},
“required”: [“name”, “age”, “city”],
“additionalProperties”: False
}
}
}
]
But again, this comes at a cost. Structured Outputs is available on GPT-4o and later models, with older models falling back to JSON mode. Not every JSON structure is supported, and it may be a bit slower since OpenAI preprocesses the results.
Nevertheless, it is the strictest and safest way to enforce a specific schema for the model’s outputs with no room for deviation. For production systems where reliability and consistency really matter, this is generally the safest option.
But aren’t all these the same thing?
JSON Mode, Function Calling, and Structured Outputs might seem to do the same thing, since they all essentially get you JSON back from the model. Nonetheless, as we’ve already seen, they are meaningfully different in what they guarantee and what they are designed for. In particular:
- Schema enforcement: JSON Mode returns a valid JSON, but with no structural guarantees. Function Calling returns a valid JSON that matches a defined schema, following specific field names, types, and required fields, but deviations are still possible. Structured Outputs goes one step further, enforcing that schema at the generation level, rendering deviations impossible.
- Use case: JSON Mode is for cases where we need a machine-readable response but can live with a variable format. Function Calling was primarily designed for cases where the model needs to trigger an action or pass arguments to an external tool, thus is essentially the general case of machine-readable outputs. Structured Outputs is Function Calling with a reliability guarantee, making it ideal for production pipelines where we need consistency in outputs.
- Ease of setup: JSON Mode is the lightest option to set up; just a single parameter change with no schema definition. On the flip side, for Function Calling and Structured Outputs, we also need to think about and set up the JSON schema.
Having said that, OpenAI itself recommends always using Structured Outputs instead of JSON Mode whenever possible, as a general rule of thumb.
On my mind
Obtaining machine-readable outputs from LLMs and choosing the appropriate approach for doing so can make a huge difference in the reliability and maintainability of any AI application. Freetext responses are great for conversational interfaces, but the moment our LLM is a component in a larger system (like feeding data downstream, triggering actions, populating databases, etc.), structured responses are essential. JSON Mode, Function Calling, and Structured Outputs can provide such outputs, each at a different level of strictness. Like many decisions in AI engineering, the right choice depends on what you’re building and how much variability you can tolerate.
If you made it this far, you might find pialgorithms useful — a platform we’ve been building that helps teams securely manage organizational knowledge in one place.
Loved this post? Join me on 💌Substack and 💼LinkedIn
All images by the author, except mentioned otherwise.
