Strategy

Guaranteed JSON from GPT-4o: How Structured Outputs Actually Work

GPT-4o's Structured Outputs feature enforces schema compliance at the token-generation level, eliminating retry loops and output parsers. Here is how it works and where it still falls short.

KytoAI & Automation Firm
·
April 3, 2026
·
2 min read

Key Takeaways

  • 1Structured Outputs uses constrained decoding to enforce JSON schema compliance at token generation time — the model cannot produce tokens that would violate the schema, so malformed output is impossible by construction.
  • 2Pass response_format: {type: 'json_schema', json_schema: {name: '...', schema: {...}, strict: true}} — without strict: true you only get best-effort formatting, not a guarantee.
  • 3Pydantic (Python) and Zod (JavaScript) both have first-class SDK integrations via beta.chat.completions.parse(), which returns a typed object instead of a raw string.
  • 4Structured Outputs does not guarantee semantic correctness — the model can produce a valid schema-conforming response that is still factually wrong.
  • 5Recursive schemas and large schemas with many variants can hit token-budget limits or degrade generation quality — benchmark on your specific schema before shipping.

The Old Problem: Prompting for Structure Never Actually Worked

The standard approach for the past two years was to append 'Return only valid JSON, no markdown, no prose' to every system prompt and then write a sanitizer that stripped code fences and re-parsed. It worked until it didn't.

[@portabletext/react] Unknown block type "code", specify a component for it in the `components.types` prop

Three problems in one response: markdown code fences, a string where you expected a number, and a trailing sentence. Your pipeline had to handle all three. Retry logic added latency. Regex added fragility. json_mode (introduced in late 2023) fixed the code-fence problem but did not guarantee field names, types, or structure.

What Structured Outputs Actually Guarantee

Structured Outputs, shipped in August 2024 for gpt-4o, uses constrained decoding. During token generation, the model's sampling is masked against the set of tokens that would be valid at each position in the JSON structure. Tokens that would produce a type mismatch, an unexpected key, or a malformed string are assigned zero probability and cannot be selected. This is a hard constraint enforced by the inference stack, not a soft constraint enforced by the prompt.

The guarantee is structural, not semantic. The model can still fill revenue with 0 when the correct value is 4200000. Schema conformance does not validate the content of a field, only its type and presence.

The Raw API: JSON Schema with strict: true

[@portabletext/react] Unknown block type "code", specify a component for it in the `components.types` prop

Important: additionalProperties: false is required when strict is true — the API returns 400 without it. The schema name is used for caching on OpenAI's side; it has no effect on output structure.

Python: Pydantic with beta.parse()

[@portabletext/react] Unknown block type "code", specify a component for it in the `components.types` prop

JavaScript: Zod with zodResponseFormat()

[@portabletext/react] Unknown block type "code", specify a component for it in the `components.types` prop

Limitations You Should Know

Strict mode schema restrictions: only object, array, string, number, integer, boolean, null, and enum are supported. anyOf/oneOf with many branches, $ref cycles, default values, and format keywords (e.g. date-time) are not supported — the API returns 400.

Refusal handling: if the model refuses the request, message.refusal will be set and parsed will be None. Always check: if message.refusal: raise ValueError(message.refusal)

Model version requirement: strict structured outputs require gpt-4o-2024-08-06 or gpt-4o-mini-2024-07-18 or later. Older model aliases may not support it.

Schema caching: OpenAI caches your schema on the first request. If you change the schema, change the name too — otherwise you may get responses validated against your old schema.

Preguntas Frecuentes

Does this work with older models?

No. You need gpt-4o-2024-08-06 or newer. Older models rely on 'JSON mode', which is just a suggestion, not a rule.

Do I still need to prompt the model to return JSON?

Stop doing that. Do not write 'return JSON'. Pass your Pydantic model into the response_format parameter. The API handles the rest.

What happens if the model refuses to answer my prompt?

The model triggers a refusal attribute. Check for it, handle it cleanly, and your app survives without throwing a parsing error.

OpenAIGPT-4oAI AutomationPythonData Extraction
Compartir artículo

Kyto

AI & Automation Firm

We design and build AI automations and business operating systems. Agency results + Academy sovereignty.

¿Listo para automatizar?

Construyamos Tu Sistema Operativo.

Reserva una demo gratis para ver cómo la automatización con IA puede transformar tus operaciones.

Reservar Demo Gratis