Using a coding agent? Install the together-audio skill to let your agent write correct speech-to-text code automatically. See agent skills for details.
Together AI hosts speech recognition models, including OpenAI’s Whisper and Mistral AI’s Voxtral, for batch transcription and real-time streaming.
Want to hear it in action? Call (847) 851-4323 to talk to a live voice agent powered by Together AI’s real-time STT and TTS pipeline. Then read the end-to-end guide to build your own.

Quickstart

Basic transcription and translation:
from pathlib import Path

from together import Together

client = Together()

# Basic transcription
response = client.audio.transcriptions.create(
    file=Path("audio.mp3"),
    model="openai/whisper-large-v3",
    language="en",
)
print(response.text)

# Basic translation (output text is in English)
response = client.audio.translations.create(
    file=Path("foreign_audio.mp3"),
    model="openai/whisper-large-v3",
)
print(response.text)

Available models

For the current list of speech-to-text models, see the serverless catalog or the dedicated endpoint model catalog.

Limits

| Limit | Value | Notes |
| --- | --- | --- |
| Max file size (direct upload) | 500 MB | Requests above this are rejected at the edge with HTTP 413 Payload Too Large. |
| Max file size (URL fetch) | 1 GB | When you submit an HTTPS URL instead of binary, the server downloads up to 1 GB. Larger downloads fail with 400 file_too_large. |
| Max audio duration | 4 hours per request | Longer audio is rejected with 400 audio_too_long. Split into ≤ 4 h segments and submit separately. |
| Supported formats | .wav, .mp3, .m4a, .webm, .flac, .ogg, .opus, .aac | |
For larger payloads, host the file at a public HTTPS URL and pass that URL as the file field instead of a binary upload. The 500 MB edge cap applies only to direct uploads; URL fetches are limited to 1 GB. See Errors and troubleshooting for the full list of error codes.
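The limits above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Together SDK: the helper names `exceeds_upload_limit` and `plan_segments` are hypothetical, and it assumes "500 MB" means decimal megabytes (the more conservative reading). `plan_segments` only computes time windows; cutting the audio itself is left to a tool of your choice.

```python
from pathlib import Path

# Limits copied from the table above.
# Assumption: "500 MB" is read as decimal megabytes, the stricter interpretation.
MAX_UPLOAD_BYTES = 500 * 1000 * 1000
MAX_DURATION_SECONDS = 4 * 60 * 60  # 4 hours per request


def exceeds_upload_limit(path: Path) -> bool:
    """Return True if the file is too large for a direct (binary) upload."""
    return path.stat().st_size > MAX_UPLOAD_BYTES


def plan_segments(duration_seconds, max_seconds=MAX_DURATION_SECONDS):
    """Split [0, duration] into consecutive (start, end) windows of at most max_seconds."""
    segments = []
    start = 0.0
    while start < duration_seconds:
        end = min(start + max_seconds, duration_seconds)
        segments.append((start, end))
        start = end
    return segments
```

For example, a 10-hour recording yields three windows: two of 4 hours and one of 2 hours. Each window can then be exported as its own file and submitted as a separate request.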

Audio transcription

Audio transcription is speech-to-text in the same language as the source audio.
from pathlib import Path

from together import Together

client = Together()

response = client.audio.transcriptions.create(
    file=Path("meeting_recording.mp3"),
    model="openai/whisper-large-v3",
    language="en",
    response_format="json",
)

print(f"Transcription: {response.text}")
The API supports the following audio formats:
  • .wav (audio/wav)
  • .mp3 (audio/mpeg)
  • .m4a (audio/mp4)
  • .webm (audio/webm)
  • .flac (audio/flac)
  • .ogg (audio/ogg)
  • .opus (audio/opus)
  • .aac (audio/aac)
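The list above maps directly onto a lookup table, which is handy for rejecting unsupported files before uploading them. This is an illustrative sketch; `SUPPORTED_AUDIO_TYPES` and `audio_mime_type` are hypothetical names, and the check looks only at the file extension, not at the file contents.

```python
from pathlib import Path

# Extension -> MIME type, copied from the supported formats list above.
SUPPORTED_AUDIO_TYPES = {
    ".wav": "audio/wav",
    ".mp3": "audio/mpeg",
    ".m4a": "audio/mp4",
    ".webm": "audio/webm",
    ".flac": "audio/flac",
    ".ogg": "audio/ogg",
    ".opus": "audio/opus",
    ".aac": "audio/aac",
}


def audio_mime_type(path):
    """Return the MIME type for a supported audio file, or None if the extension is unsupported."""
    return SUPPORTED_AUDIO_TYPES.get(Path(path).suffix.lower())
```

A caller can treat a `None` result as a signal to convert the file to one of the supported formats first.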

Input methods

Path object

Python
from pathlib import Path

from together import Together

client = Together()

# Pass the audio file as a pathlib.Path.
response = client.audio.transcriptions.create(
    file=Path("recordings/interview.wav"),
    model="openai/whisper-large-v3",
)

File-like object

Python
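from together import Together

client = Together()

# Open the file in binary mode ("rb") and pass the handle directly.
with open("audio.mp3", "rb") as audio_file:
    response = client.audio.transcriptions.create(
        file=audio_file,
        model="openai/whisper-large-v3",
    )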

Remote URL

The Python SDK doesn’t accept a string URL on file=. To transcribe a remote file, download it first or use the CLI:
Shell
together audio transcribe https://example.com/audio.mp3 \
  --model openai/whisper-large-v3
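The "download it first" route can be sketched in Python with the standard library alone. The helpers `download_audio` and `transcribe_url` below are hypothetical, not SDK functions; `client` is assumed to be a `Together()` instance, and the only SDK call used is the `client.audio.transcriptions.create(file=..., model=...)` form shown earlier on this page.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def download_audio(url: str) -> Path:
    """Download url to a temporary file and return its path, keeping the file extension."""
    suffix = Path(urlparse(url).path).suffix  # e.g. ".mp3"
    with urllib.request.urlopen(url) as response:
        with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
            shutil.copyfileobj(response, tmp)
    return Path(tmp.name)


def transcribe_url(client, url: str, model: str = "openai/whisper-large-v3"):
    """Download a remote file, transcribe the local copy, then delete it."""
    local_path = download_audio(url)
    try:
        return client.audio.transcriptions.create(file=local_path, model=model)
    finally:
        # Remove the temporary copy whether or not the request succeeded.
        local_path.unlink(missing_ok=True)
```

With the SDK installed, usage would look like `transcribe_url(Together(), "https://example.com/audio.mp3").text`. Downloading locally means the direct-upload size limit applies to the file, not the URL-fetch limit.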

Language support

Specify the audio language with an ISO 639-1 language code, or omit it (or pass "auto") to let the model detect the language:
from pathlib import Path

from together import Together

client = Together()

response = client.audio.transcriptions.create(
    file=Path("spanish_audio.mp3"),
    model="openai/whisper-large-v3",
    language="es",  # Spanish
)
Common language codes:
  • "en": English.
  • "es": Spanish.
  • "fr": French.
  • "de": German.
  • "ja": Japanese.
  • "zh": Chinese.
  • "auto": Auto-detect (default).

Custom prompts

Use prompts to improve transcription accuracy for specific contexts.
Prompts are supported only on Whisper-family models (for example, openai/whisper-large-v3). Other STT models (for example, nvidia/parakeet-tdt-0.6b-v3) accept the field for API compatibility but ignore it.
from pathlib import Path

from together import Together

client = Together()

response = client.audio.transcriptions.create(
    file=Path("medical_consultation.mp3"),
    model="openai/whisper-large-v3",  # prompts take effect only on Whisper-family models
    language="en",
    prompt="This is a medical consultation discussing patient symptoms, diagnosis, and treatment options.",
)
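Because non-Whisper models accept the prompt field but ignore it, code that switches between models may want to make that explicit. The sketch below builds the keyword arguments for a request and drops the prompt when it would have no effect. `build_transcription_kwargs` is a hypothetical helper, and detecting Whisper-family models by looking for "whisper" in the model name is an assumption, not a documented rule.

```python
def build_transcription_kwargs(model, prompt=None, language=None):
    """Return keyword arguments for a transcription request.

    The prompt is included only for models whose name contains "whisper".
    That name check is a heuristic assumed for this sketch.
    """
    kwargs = {"model": model}
    if language:
        kwargs["language"] = language
    if prompt and "whisper" in model.lower():
        kwargs["prompt"] = prompt
    return kwargs
```

The result can be unpacked into the call shown above, for example `client.audio.transcriptions.create(file=Path("medical_consultation.mp3"), **kwargs)`.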

Next steps