Introduction
Standard large language models respond to user queries by generating plain text. This works well for many applications like chatbots, but if you want to programmatically access details in the response, plain text is hard to parse. Some models can respond with structured JSON instead, making it easy to work with data from the LLM's output directly in your application code. If you're using a supported model, you can enable structured responses by providing your desired schema details to the response_format key of the Chat Completions API.
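As a minimal sketch, the response_format payload takes a shape like the following. The schema contents here are illustrative placeholders; the "json_object" type with a sibling "schema" key follows Together's JSON mode convention:

```python
# A minimal sketch of the response_format argument.
# The schema below is an illustrative placeholder, not part of the API itself.
response_format = {
    "type": "json_object",
    "schema": {
        "type": "object",
        "properties": {"answer": {"type": "string"}},
        "required": ["answer"],
    },
}
```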
Supported models
The following newly released top models support JSON mode:

- openai/gpt-oss-120b
- openai/gpt-oss-20b
- moonshotai/Kimi-K2-Instruct
- zai-org/GLM-4.5-Air-FP8
- Qwen/Qwen3-Next-80B-A3B-Instruct
- Qwen/Qwen3-Next-80B-A3B-Thinking
- Qwen/Qwen3-235B-A22B-Thinking-2507
- Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8
- Qwen/Qwen3-235B-A22B-Instruct-2507-tput
- deepseek-ai/DeepSeek-R1
- deepseek-ai/DeepSeek-R1-0528-tput
- deepseek-ai/DeepSeek-V3
- meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8
- Qwen/Qwen2.5-72B-Instruct-Turbo
- Qwen/Qwen2.5-VL-72B-Instruct
- meta-llama/Llama-4-Scout-17B-16E-Instruct
- meta-llama/Llama-3.3-70B-Instruct-Turbo
- deepcogito/cogito-v2-preview-llama-70B
- deepcogito/cogito-v2-preview-llama-109B-MoE
- deepcogito/cogito-v2-preview-llama-405B
- deepcogito/cogito-v2-preview-deepseek-671b
- deepseek-ai/DeepSeek-R1-Distill-Llama-70B
- deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
- marin-community/marin-8b-instruct
- meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo
- meta-llama/Llama-3.3-70B-Instruct-Turbo-Free
- Qwen/Qwen2.5-7B-Instruct-Turbo
- Qwen/Qwen2.5-Coder-32B-Instruct
- Qwen/QwQ-32B
- Qwen/Qwen3-235B-A22B-fp8-tput
- arcee-ai/coder-large
- meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo
- meta-llama/Llama-3.2-3B-Instruct-Turbo
- meta-llama/Meta-Llama-3-8B-Instruct-Lite
- meta-llama/Llama-3-70b-chat-hf
- google/gemma-3n-E4B-it
- mistralai/Mistral-7B-Instruct-v0.1
- mistralai/Mistral-7B-Instruct-v0.2
- mistralai/Mistral-7B-Instruct-v0.3
- arcee_ai/arcee-spotlight
Basic example
Let's look at a simple example, where we pass the transcript of a voice note to a model and ask it to summarize it. We want the summary to have the following structure:
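The original schema isn't reproduced here, so the following JSON Schema is an illustrative stand-in: a title, a one-sentence summary, and a list of action items (the field names are assumptions for this example):

```json
{
  "type": "object",
  "properties": {
    "title": { "type": "string", "description": "A title for the voice note" },
    "summary": { "type": "string", "description": "A one-sentence summary of the voice note" },
    "actionItems": {
      "type": "array",
      "items": { "type": "string" },
      "description": "Action items mentioned in the voice note"
    }
  },
  "required": ["title", "summary", "actionItems"]
}
```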
When we make our request, we pass this schema to the response_format key.
Finally – and this is important – we need to make sure to instruct our model to only respond in JSON format, and include details of the schema we want to use. This ensures it will actually use the schema we provide when generating its response. Any instructions in the schema itself will not be followed by the LLM.
Important: You must always instruct your model to only respond in JSON format, either in the system prompt or a user message, in addition to passing your schema to the response_format key.
Let’s see what this looks like:
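Below is a sketch using the Together Python SDK. The transcript, model choice, and Pydantic field names are illustrative, and the client assumes TOGETHER_API_KEY is set in your environment:

```python
import json

from pydantic import BaseModel, Field
from together import Together

client = Together()  # assumes TOGETHER_API_KEY is set in the environment

# Illustrative schema for the voice-note summary, defined with Pydantic.
class VoiceNote(BaseModel):
    title: str = Field(description="A title for the voice note")
    summary: str = Field(description="A one-sentence summary of the voice note")
    actionItems: list[str] = Field(description="Action items mentioned in the note")

transcript = (
    "Good evening! It's 6:30 PM. Remember to water the plants, "
    "respond to Jeremy's email, and buy groceries for tomorrow's dinner."
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[
        # Crucially, the prompt itself tells the model to respond only in JSON.
        {
            "role": "system",
            "content": "The following is a voice note transcript. Only answer in JSON.",
        },
        {"role": "user", "content": transcript},
    ],
    response_format={
        "type": "json_object",
        "schema": VoiceNote.model_json_schema(),
    },
)

# The response content is a JSON string matching the schema.
output = json.loads(response.choices[0].message.content)
print(json.dumps(output, indent=2))
```

Running this might print something like the following (the output here is illustrative):

```json
{
  "title": "Evening Reminders",
  "summary": "A reminder to water the plants, reply to Jeremy's email, and buy groceries.",
  "actionItems": [
    "Water the plants",
    "Respond to Jeremy's email",
    "Buy groceries for tomorrow's dinner"
  ]
}
```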
Regex example
All the models supported for JSON mode also support regex mode. Here's an example using it to constrain a classification to a fixed set of labels.
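A sketch of what this could look like, reusing the client from the previous example. The payload shape below ("type": "regex" with a "regex" pattern) and the label set are assumptions based on the JSON-mode convention; consult the API reference for the exact format:

```python
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
    messages=[
        {"role": "system", "content": "Classify the sentiment of the user's message."},
        {"role": "user", "content": "I absolutely love this product!"},
    ],
    # Constrain the output to one of three labels via a regular expression.
    response_format={
        "type": "regex",
        "regex": "(positive|negative|neutral)",
    },
)
print(response.choices[0].message.content)  # e.g. "positive"
```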
Reasoning model example
You can also extract structured outputs from some reasoning models, such as DeepSeek-R1-0528.
Below, we ask the model to solve a math problem step by step, showing its work:
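A sketch of what this could look like, again with the Together Python SDK. The schema, prompt, and problem are illustrative:

```python
from pydantic import BaseModel, Field
from together import Together

client = Together()  # assumes TOGETHER_API_KEY is set in the environment

# Illustrative schema: a list of reasoning steps plus a final answer.
class MathSolution(BaseModel):
    steps: list[str] = Field(description="The steps taken to solve the problem")
    answer: str = Field(description="The final answer")

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528-tput",
    messages=[
        {"role": "system", "content": "Solve the problem step by step. Only answer in JSON."},
        {"role": "user", "content": "What is 9.9 - 9.11?"},
    ],
    response_format={
        "type": "json_object",
        "schema": MathSolution.model_json_schema(),
    },
)
print(response.choices[0].message.content)
```

which might return something like (output is illustrative):

```json
{
  "steps": [
    "9.9 is the same as 9.90.",
    "Subtract: 9.90 - 9.11 = 0.79."
  ],
  "answer": "0.79"
}
```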
Vision model example
Let’s look at another example, this time using a vision model. We want our LLM to extract text from the following screenshot of a Trello board:
In particular, we want to know the name of the project (Project A), and the number of columns in the board (4).
Let’s try it out:
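A sketch of the request, using one of the supported vision models. The image URL is a hypothetical stand-in for the screenshot above, and the schema fields are illustrative:

```python
from pydantic import BaseModel, Field
from together import Together

client = Together()  # assumes TOGETHER_API_KEY is set in the environment

# Illustrative schema for what we want to extract from the board.
class TrelloBoard(BaseModel):
    projectName: str = Field(description="The name of the project")
    columns: list[str] = Field(description="The names of the board's columns")

# Hypothetical URL standing in for the screenshot above.
image_url = "https://example.com/trello-board.png"

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-72B-Instruct",
    messages=[
        {"role": "system", "content": "Extract the project details from the screenshot. Only answer in JSON."},
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is the project name, and what columns does the board have?"},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        },
    ],
    response_format={
        "type": "json_object",
        "schema": TrelloBoard.model_json_schema(),
    },
)
print(response.choices[0].message.content)
```

For the board described above, we'd expect output along these lines (the column names are illustrative):

```json
{
  "projectName": "Project A",
  "columns": ["To Do", "In Progress", "Review", "Done"]
}
```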
Try out your code in the Together Playground
You can try out JSON Mode in the Together Playground to test out variations on your schema and prompt:
Just click the RESPONSE FORMAT dropdown in the right-hand sidebar, choose JSON, and upload your schema!