This tutorial walks through a complete batch inference job from start to finish. By the end you’ll have uploaded a JSONL file of chat completion requests, run them as a single job at up to 50% off serverless rates, and reconciled the responses with your original inputs.Documentation Index
Fetch the complete documentation index at: https://docs.together.ai/llms.txt
Use this file to discover all available pages before exploring further.
Prerequisites
Before you begin, make sure you have:- Created an account and generated an API key.
- Set
TOGETHER_API_KEYas an environment variable:export TOGETHER_API_KEY=<your-key>. See API keys and authentication for details. - Installed the Python or TypeScript SDK. Python examples require
together>=2.0.0.
Step 1: Prepare a JSONL input file
Each request lives on its own line in a JSONL file. A request has two fields: acustom_id you choose, and a body matching the schema of the endpoint you’re calling. The Batch API runs every line independently and stamps each output with the same custom_id, so this is how you’ll map results back to inputs at the end.
Save the following as batch_input.jsonl:
batch_input.jsonl
| Field | Type | Required | Description |
|---|---|---|---|
custom_id | string | Yes | Unique identifier for tracking (max 64 chars). |
method | string | Conditional | Set to "FILE" when batching /v1/audio/transcriptions or /v1/audio/translations so the worker dispatches each request as multipart/form-data. Omit for chat completion batches. See Run an audio transcription batch. |
body | object | Yes | Request body matching the endpoint’s schema. |
Step 2: Upload the file
Upload the JSONL file withpurpose="batch-api". The upload returns a file object whose id you’ll pass to the batch job in the next step.
Pass check=False to skip client-side validation. The server still validates the file during the VALIDATING phase, and skipping the client check is faster for large files without changing the error surface. With check=True (default), the SDK parses each JSONL line locally and raises TogetherException before uploading if a line is malformed.
The
file parameter accepts a local file path as a str or pathlib.Path. Passing an open file handle or a (filename, bytes) tuple is not supported.Step 3: Create the batch
Now hand the uploaded file’sid to the batch endpoint, along with the API endpoint each request should run against. For chat completion requests, that’s /v1/chat/completions. Audio batches use /v1/audio/transcriptions or /v1/audio/translations — see Run an audio transcription batch.
batches.create() returns a wrapper; the batch object lives at .job. batches.retrieve() (used in the next step) returns the batch object directly.Step 4: Poll for completion
The job moves throughVALIDATING, then IN_PROGRESS, then a terminal status: COMPLETED, FAILED, EXPIRED, or CANCELLED. Poll every 30 to 60 seconds until you hit a terminal status. Tighter loops will hit rate limits without giving the server time to make progress.
progress is a float from 0 to 100 representing the percentage of requests completed. It is present on all batch objects but may remain 0 while the job is in VALIDATING.Step 5: Retrieve the results
When the job reachesCOMPLETED, the batch object carries an output_file_id. Download that file and you’ll get one JSON object per line, each keyed by the custom_id from your input. Output line order does not match input line order, so use custom_id to reconcile.
error_file_id. Always check it: a batch can be COMPLETED and still contain individual request failures. See retrieve results and error files on the manage page.
Run an audio transcription batch
The Batch API also supports/v1/audio/transcriptions and /v1/audio/translations for audio workloads (for example, openai/whisper-large-v3). The upload, poll, and retrieve steps above are identical. Two things change:
1. Each JSONL line must include "method": "FILE". The audio endpoints expect multipart/form-data requests, so the worker uses the method field to choose its dispatch mode. Omitting it causes every line to fail with Content-Type must be multipart/form-data in the error file.
audio_batch.jsonl
body.file is the publicly-reachable URL of the audio clip; the worker fetches the audio at execution time. Optional fields such as response_format, language, and prompt pass through to the underlying API — see the audio transcriptions reference for the full schema.
2. Pass the audio endpoint when creating the batch.
/v1/audio/translations, swap the endpoint and use a translation-capable model — the JSONL line shape is the same.
Complete script
The full Python program combining all steps above:Python
Next steps
- Manage batch jobs: cancel, list, and download error files.
- Batch processing overview: rate limits, discounted models, best practices, and FAQ.