Create audio translation request
Translates audio into English
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
Body
Audio file upload or public HTTP/HTTPS URL. Supported formats: .wav, .mp3, .m4a, .webm, .flac, .ogg, .opus, .aac. Maximum duration 4 hours; longer audio is rejected with audio_too_long. Binary uploads are additionally capped at 500 MB (HTTP 413); URL-fetched audio is capped at 1 GB.
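The format and upload-size limits above can be checked locally before sending a file. The following is a minimal sketch, not part of the API: the helper name check_upload is hypothetical, and treating 500 MB as 500,000,000 bytes is an assumption, since the section does not say whether the cap is decimal or binary. The 4-hour duration limit is not checked here because that would require decoding the audio.

```python
from pathlib import Path

# Extensions listed as supported in the Body description.
SUPPORTED_EXTENSIONS = {
    ".wav", ".mp3", ".m4a", ".webm", ".flac", ".ogg", ".opus", ".aac",
}

# Assumption: "500 MB" is read as decimal megabytes (the stricter reading).
MAX_UPLOAD_BYTES = 500 * 1000 * 1000


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the local checks pass."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds the 500 MB upload cap")
    return problems
```

A file that passes these checks can still be rejected by the server, for example with audio_too_long, so the response should be handled either way.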
Model to use for translation
openai/whisper-large-v3
Target output language. Optional ISO 639-1 language code. If omitted, language is set to English.
"en"
Optional text to bias decoding. Supported only on Whisper-family models (e.g. openai/whisper-large-v3). Other STT models (e.g. nvidia/parakeet-tdt-0.6b-v3) accept the field for API compatibility but ignore it.
The format of the response
json, verbose_json
Sampling temperature between 0.0 and 1.0
0 <= x <= 1
Controls level of timestamp detail in verbose_json. Only used when response_format is verbose_json. Can be a single granularity or an array to get multiple levels.
segment, word
["word", "segment"]
Response
OK
The translated text
"Hello, world!"
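To tie the parameters together, here is a rough sketch of how a request might be assembled and how the translated text might be read back. Several names are assumptions that do not appear in this section: the endpoint URL, the form field names other than response_format (model, language, prompt, temperature, timestamp_granularities), and the "text" key in the response. The helpers build_translation_request and extract_text are hypothetical. The code only builds and validates the request; it does not send it.

```python
import json

# Assumption: the endpoint path is not stated in this section.
TRANSLATIONS_URL = "https://api.together.xyz/v1/audio/translations"

RESPONSE_FORMATS = ("json", "verbose_json")
GRANULARITIES = ("segment", "word")


def build_translation_request(
    token,
    model="openai/whisper-large-v3",
    language="en",
    response_format="json",
    temperature=0.0,
    timestamp_granularities=None,
    prompt=None,
):
    """Return (headers, fields) for a translation request, validating ranges."""
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"response_format must be one of {RESPONSE_FORMATS}")
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must be between 0.0 and 1.0")

    # Bearer authentication header, as described under Authorizations.
    headers = {"Authorization": f"Bearer {token}"}

    # Field names here are assumed, not taken from the section.
    fields = {
        "model": model,
        "language": language,
        "response_format": response_format,
        "temperature": temperature,
    }
    if prompt is not None:
        fields["prompt"] = prompt

    # Granularity is only used with verbose_json; accept a string or a list.
    if response_format == "verbose_json" and timestamp_granularities:
        if isinstance(timestamp_granularities, str):
            requested = [timestamp_granularities]
        else:
            requested = list(timestamp_granularities)
        for value in requested:
            if value not in GRANULARITIES:
                raise ValueError(f"granularity must be one of {GRANULARITIES}")
        fields["timestamp_granularities"] = requested

    return headers, fields


def extract_text(response_body: str) -> str:
    """Read the translated text, assuming it is stored under a "text" key."""
    return json.loads(response_body)["text"]
```

The returned headers and fields could then be passed to any HTTP client as a multipart form, together with the audio file or its URL.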