Create and manage dedicated endpoints for model inference.

Endpoint ID

Many commands require an ENDPOINT_ID to identify which endpoint to operate on. The endpoint ID is a unique identifier assigned when an endpoint is created, in the format:
endpoint-<uuid>
For example: endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462
The endpoint ID is different from the model name (e.g., meta-llama/Llama-3.3-70B-Instruct-Turbo) or the display name you set with --display-name.
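A common mistake is passing the model name or display name where an endpoint ID is expected. A quick local format check can catch this before the command runs. The sketch below uses a hypothetical helper, is_endpoint_id. It checks only the documented endpoint-<uuid> shape and assumes the UUID is lowercase hex, as in the example above. It does not confirm that the endpoint exists.

```shell
# Hypothetical helper: succeed only if $1 has the endpoint-<uuid> shape.
# Assumes a lowercase hex UUID, matching the documented example.
# This checks the format only, not whether the endpoint exists.
is_endpoint_id() {
  printf '%s\n' "$1" |
    grep -Eq '^endpoint-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}
```

Example: `is_endpoint_id "$ENDPOINT_ID" || echo "not an endpoint ID: $ENDPOINT_ID" >&2`.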

Find your endpoint ID

To find your endpoint ID, you can:
  • Run the tg endpoints create command to create an endpoint. The endpoint ID is returned in the output.
  • Run the tg endpoints list command to list all your endpoints. The endpoint ID is displayed for each endpoint.
  • View the endpoint details page in the Together AI console.
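In scripts, you may want to pull the ID out of the output of tg endpoints create or tg endpoints list instead of copying it by hand. The sketch below uses a hypothetical helper, extract_endpoint_ids. It assumes that IDs appear verbatim as endpoint-<uuid> with a lowercase hex UUID, and it makes no other assumption about how the CLI lays out its output. It relies on grep -o, which is available in GNU and BSD grep.

```shell
# Hypothetical helper: print each distinct endpoint ID found on stdin,
# in the order first seen. Assumes IDs appear verbatim as
# endpoint-<lowercase uuid>; the rest of the text is ignored.
extract_endpoint_ids() {
  grep -Eo 'endpoint-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}' |
    awk '!seen[$0]++'   # drop duplicates while keeping the original order
}
```

For example, `ENDPOINT_ID=$(tg endpoints list | extract_endpoint_ids | head -n 1)` would capture the first ID that appears in the listing.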

Create

Create a new dedicated endpoint.
Shell
tg endpoints create \
  --model meta-llama/Llama-3.3-70B-Instruct-Turbo \
  --hardware 4x_nvidia_h100_80gb_sxm \
  --display-name "My Endpoint" \
  --wait

Parameters

| Flag | Description |
| --- | --- |
| --model [string] | (Required) The model to deploy. |
| --hardware [string] | (Required) GPU type to use for inference. Use tg endpoints hardware to discover available GPU identifiers. |
| --min-replicas [number] | Minimum number of replicas to deploy. |
| --max-replicas [number] | Maximum number of replicas to deploy. |
| --display-name [string] | A human-readable name for the endpoint. |
| --no-auto-start | Create the endpoint in the STOPPED state instead of starting it automatically. |
| --no-speculative-decoding | Disable speculative decoding for this endpoint. |
| --inactive-timeout [number] | Number of minutes of inactivity after which the endpoint is automatically stopped. Set to 0 to disable. |
| --availability-zone [string] | Start the endpoint in the specified availability zone. Use tg endpoints availability-zones to discover valid options. |
| --wait | Wait for the endpoint to be ready after creation. |
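Several of the optional flags above are often combined. The sketch below wraps a create call that sets a replica range and an idle timeout. It uses only flags from the table above. The helper name create_autoscaling_endpoint, the display name "autoscaling-demo", and the values 1, 3, and 60 are illustrative, not recommendations.

```shell
# Hypothetical helper: create an endpoint that runs between 1 and 3
# replicas and is stopped automatically after 60 idle minutes.
# Usage: create_autoscaling_endpoint MODEL HARDWARE
# The display name and numeric values are illustrative placeholders.
create_autoscaling_endpoint() {
  if [ "$#" -ne 2 ]; then
    echo "usage: create_autoscaling_endpoint MODEL HARDWARE" >&2
    return 2
  fi
  tg endpoints create \
    --model "$1" \
    --hardware "$2" \
    --display-name "autoscaling-demo" \
    --min-replicas 1 \
    --max-replicas 3 \
    --inactive-timeout 60 \
    --wait
}
```

Example: `create_autoscaling_endpoint meta-llama/Llama-3.3-70B-Instruct-Turbo 4x_nvidia_h100_80gb_sxm`.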

Hardware

List all hardware options (optionally filtered by model and availability).
Shell
tg endpoints hardware

Parameters

| Flag | Description |
| --- | --- |
| --model [string] | Filter hardware that is compatible with a given model. |
| --available | Filter for only hardware that is currently available. |
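Before creating an endpoint, it can help to apply both filters at once. This lists the hardware that can run a given model and is available right now. The sketch below wraps that call in a hypothetical helper, available_hardware_for, which adds a usage check when the model argument is missing.

```shell
# Hypothetical helper: list hardware that is compatible with a model
# AND currently available, combining the two documented filters.
available_hardware_for() {
  if [ -z "$1" ]; then
    echo "usage: available_hardware_for MODEL" >&2
    return 2
  fi
  tg endpoints hardware --model "$1" --available
}
```

Example: `available_hardware_for meta-llama/Llama-3.3-70B-Instruct-Turbo`. You can then pass one of the listed identifiers to tg endpoints create --hardware.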

Retrieve

Print details for a specific endpoint.
Shell
tg endpoints retrieve endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462

Update

Update the configuration of an existing endpoint.
Shell
tg endpoints update endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462 \
  --min-replicas 2 \
  --max-replicas 4 

Parameters

| Flag | Description |
| --- | --- |
| --display-name [string] | New human-readable name for the endpoint. |
| --min-replicas [number] | New minimum number of replicas to maintain. |
| --max-replicas [number] | New maximum number of replicas to scale up to. |
| --inactive-timeout [number] | Number of minutes of inactivity after which the endpoint is automatically stopped. Set to 0 to disable. |

Note: --min-replicas and --max-replicas must be specified together.
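Because the two replica flags must be passed together, a small wrapper can make it impossible to forget one of them. The sketch below uses a hypothetical helper, set_replica_range. Its integer check and its min <= max check are local sanity checks added here as assumptions. They are not documented CLI rules.

```shell
# Hypothetical helper: update both replica bounds in a single call.
# Usage: set_replica_range ENDPOINT_ID MIN MAX
# The integer and min <= max checks are assumptions, not CLI rules.
set_replica_range() {
  if [ "$#" -ne 3 ]; then
    echo "usage: set_replica_range ENDPOINT_ID MIN MAX" >&2
    return 2
  fi
  for n in "$2" "$3"; do
    case $n in
      ''|*[!0-9]*)
        echo "replica counts must be non-negative integers: '$n'" >&2
        return 2
        ;;
    esac
  done
  if [ "$2" -gt "$3" ]; then
    echo "min replicas ($2) is greater than max replicas ($3)" >&2
    return 2
  fi
  tg endpoints update "$1" --min-replicas "$2" --max-replicas "$3"
}
```

Example: `set_replica_range endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462 2 4` issues the same update as the example above.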

Start

Start a dedicated endpoint.
Shell
tg endpoints start endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462

Parameters

| Flag | Description |
| --- | --- |
| --wait | Wait for the endpoint to start. |

Stop

Stop a dedicated endpoint.
Shell
tg endpoints stop endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462

Parameters

| Flag | Description |
| --- | --- |
| --wait | Wait for the endpoint to stop. |
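Combining stop and start with --wait gives a simple restart: the start command is issued only after the stop has finished. The sketch below uses a hypothetical helper, restart_endpoint. Following the update example, it places the flags after the endpoint ID.

```shell
# Hypothetical helper: stop an endpoint, wait for the stop to finish,
# then start it again and wait for it to start.
restart_endpoint() {
  if [ -z "$1" ]; then
    echo "usage: restart_endpoint ENDPOINT_ID" >&2
    return 2
  fi
  # && means start is attempted only if the stop command succeeded.
  tg endpoints stop "$1" --wait &&
    tg endpoints start "$1" --wait
}
```

Example: `restart_endpoint endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462`.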

Delete

Delete a dedicated endpoint.
Shell
tg endpoints delete endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462

List

List your dedicated endpoints.
Shell
tg endpoints list

Options

| Option | Description |
| --- | --- |
| --usage-type [string] | Filter by usage type: on-demand or reserved. |
| --after [string] | Pagination cursor to start listing from. |
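The two options can be combined to page through one category of endpoints. The sketch below uses a hypothetical helper, list_endpoints. It accepts only the two documented usage types. This page does not say how the CLI reports the next cursor, so the caller supplies the cursor value.

```shell
# Hypothetical helper: list endpoints of one usage type, optionally
# resuming from a pagination cursor supplied by the caller.
# Usage: list_endpoints on-demand|reserved [CURSOR]
list_endpoints() {
  case ${1-} in
    on-demand|reserved) ;;
    *)
      echo "usage: list_endpoints on-demand|reserved [CURSOR]" >&2
      return 2
      ;;
  esac
  if [ -n "${2-}" ]; then
    tg endpoints list --usage-type "$1" --after "$2"
  else
    tg endpoints list --usage-type "$1"
  fi
}
```

Example: `list_endpoints reserved` lists only reserved endpoints.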