Create audio translation request

from together import Together
import os

client = Together(
    api_key=os.environ.get("TOGETHER_API_KEY"),
)

# Open the audio in binary mode; the with-block closes the file after the request.
with open("audio.wav", "rb") as file:
    response = client.audio.translations.create(
        model="openai/whisper-large-v3",
        file=file,
        language="es",
    )

print(response.text)

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
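For callers not using the SDK, the header can be built directly. This is a minimal sketch: `auth_headers` is a hypothetical helper, and falling back to the `TOGETHER_API_KEY` environment variable simply mirrors what the SDK example above reads.

```python
from __future__ import annotations

import os


def auth_headers(token: str | None = None) -> dict[str, str]:
    """Build the `Authorization: Bearer <token>` header (hypothetical helper).

    Falls back to the TOGETHER_API_KEY environment variable when no token is given.
    """
    token = token or os.environ.get("TOGETHER_API_KEY")
    if not token:
        raise ValueError("no API token supplied and TOGETHER_API_KEY is unset")
    return {"Authorization": f"Bearer {token}"}


print(auth_headers("sk-example"))  # {'Authorization': 'Bearer sk-example'}
```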

Body

multipart/form-data

file

required

Audio file upload or public HTTP/HTTPS URL. Supported formats: .wav, .mp3, .m4a, .webm, .flac, .ogg, .opus, .aac. Maximum duration 4 hours; longer audio is rejected with audio_too_long. Binary uploads are additionally capped at 500 MB (HTTP 413); URL-fetched audio is capped at 1 GB.
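A local upload can be checked against the listed formats and the binary-upload cap before it is sent. The sketch below is illustrative only: `preflight_upload` is a hypothetical name, and reading "500 MB" as 500,000,000 bytes is an assumption (the exact byte threshold is not stated here). The 4-hour duration limit needs the audio decoded, so it is left to the server's `audio_too_long` check, and public URLs skip this check entirely.

```python
from __future__ import annotations

import os
from pathlib import Path

# Formats listed in the `file` description.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".m4a", ".webm", ".flac", ".ogg", ".opus", ".aac"}
# Assumption: "500 MB" taken as decimal megabytes (the conservative reading).
MAX_UPLOAD_BYTES = 500 * 1000 * 1000


def preflight_upload(path: str) -> list[str]:
    """Return reasons a local file would likely be rejected (empty list if none found).

    Checks only extension and byte size; duration is validated server-side.
    """
    problems = []
    p = Path(path)
    if p.suffix.lower() not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format {p.suffix or '(none)'}")
    size = os.path.getsize(p)
    if size > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size} bytes; binary uploads are capped at 500 MB")
    return problems
```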

model

enum<string>

default:openai/whisper-large-v3

Model to use for translation

Available options:

openai/whisper-large-v3

language

string

default:en

Target output language, given as an optional ISO 639-1 code. Defaults to English (en) when omitted.

Example:

"en"

prompt

string

Optional text to bias decoding. Supported only on Whisper-family models (e.g. openai/whisper-large-v3). Other STT models (e.g. nvidia/parakeet-tdt-0.6b-v3) accept the field for API compatibility but ignore it.

response_format

enum<string>

default:json

The format of the response

Available options:

json, verbose_json

temperature

number<float>

default:0

Sampling temperature between 0.0 and 1.0

Required range: 0 <= x <= 1
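The scalar fields described so far (model, language, prompt, response_format, temperature) can be checked client-side and turned into form strings before encoding. This is a sketch under stated assumptions: `build_fields` is a hypothetical helper, the two-lowercase-letter language check assumes lowercase ISO 639-1 codes, and the server remains the authority on what it accepts.

```python
from __future__ import annotations

import re

# Allowed values as listed in the parameter descriptions.
MODELS = {"openai/whisper-large-v3"}
RESPONSE_FORMATS = {"json", "verbose_json"}


def build_fields(
    model: str = "openai/whisper-large-v3",
    language: str = "en",
    prompt: str | None = None,
    response_format: str = "json",
    temperature: float = 0.0,
) -> dict[str, str]:
    """Validate the scalar fields and return them as strings (hypothetical helper)."""
    if model not in MODELS:
        raise ValueError(f"unsupported model {model!r}")
    # Assumption: ISO 639-1 codes passed as two lowercase letters, e.g. "en", "es".
    if not re.fullmatch(r"[a-z]{2}", language):
        raise ValueError(f"language {language!r} is not an ISO 639-1 code")
    if response_format not in RESPONSE_FORMATS:
        raise ValueError(f"response_format must be one of {sorted(RESPONSE_FORMATS)}")
    # The chained comparison also rejects NaN, since NaN compares False.
    if not 0.0 <= temperature <= 1.0:
        raise ValueError("temperature must satisfy 0 <= x <= 1")
    fields = {
        "model": model,
        "language": language,
        "response_format": response_format,
        "temperature": str(temperature),
    }
    # Only Whisper-family models use the prompt; other models accept and ignore it.
    if prompt:
        fields["prompt"] = prompt
    return fields
```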

timestamp_granularities

default:segment

Controls the level of timestamp detail in verbose_json output. Used only when response_format is verbose_json and ignored otherwise. Accepts a single granularity, or an array to request several levels at once.

Available options:

segment, word

Example:

["word", "segment"]
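Because the body is multipart/form-data and timestamp_granularities may be a single value or an array, a client without the SDK has to normalize that field and encode the parts itself. The sketch below uses only the standard library; both helper names are hypothetical, and sending an array as one repeated form field per value is an assumption (check the API's actual array convention before relying on it).

```python
from __future__ import annotations

import uuid

GRANULARITIES = {"segment", "word"}


def normalize_granularities(value: str | list[str] | None, response_format: str) -> list[str]:
    """Accept one granularity or a list; return [] unless verbose_json is requested.

    None also yields [], leaving the server-side default (segment) in effect.
    """
    if response_format != "verbose_json" or value is None:
        return []
    values = [value] if isinstance(value, str) else list(value)
    unknown = set(values) - GRANULARITIES
    if unknown:
        raise ValueError(f"unknown granularities: {sorted(unknown)}")
    return values


def encode_multipart(fields: list[tuple[str, str]], filename: str, audio: bytes) -> tuple[str, bytes]:
    """Encode text fields plus one `file` part; return (Content-Type value, body)."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields:
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    parts.append(
        f'--{boundary}\r\nContent-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode()
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return f"multipart/form-data; boundary={boundary}", b"".join(parts)


# Usage with placeholder audio bytes standing in for a real file.
grans = normalize_granularities(["word", "segment"], "verbose_json")
fields = [("model", "openai/whisper-large-v3"), ("response_format", "verbose_json")]
# Assumption: the array is sent as one repeated form field per value.
fields += [("timestamp_granularities", g) for g in grans]
content_type, body = encode_multipart(fields, "audio.wav", b"placeholder audio bytes")
```

The resulting `content_type` and `body` could then be POSTed (for example with `urllib.request`) along with the Bearer header; the endpoint URL is not given on this page, so take it from the API reference rather than guessing.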

Response

text

string

required

The translated text

Example:

"Hello, world!"
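When calling the endpoint without the SDK, the translated text can be read from the JSON body. This sketch reads only `text`, the one response field documented here; `translated_text` is a hypothetical helper, and any extra fields a verbose_json response may carry are simply ignored.

```python
from __future__ import annotations

import json


def translated_text(raw: bytes | str) -> str:
    """Extract the required `text` field from a translation response body."""
    payload = json.loads(raw)
    if not isinstance(payload, dict) or not isinstance(payload.get("text"), str):
        raise ValueError("response has no string 'text' field")
    return payload["text"]


print(translated_text('{"text": "Hello, world!"}'))  # Hello, world!
```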