Create and manage dedicated endpoints for model inference.
Endpoint ID
Many commands require an `ENDPOINT_ID` to identify which endpoint to operate on. The endpoint ID is a unique identifier assigned when an endpoint is created, in the format:
endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462
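As a quick sanity check, a script can test whether a value looks like an endpoint ID before passing it to a command. The sketch below assumes the ID is always `endpoint-` followed by a lowercase UUID, which is inferred from the single example above rather than from a documented specification. `is_endpoint_id` is a hypothetical helper name.

```shell
# Sketch: succeed only if the argument looks like endpoint-<UUID>.
# Assumption: the pattern is inferred from the example ID, not from a published spec.
is_endpoint_id() {
  printf '%s\n' "$1" | grep -Eq '^endpoint-[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$'
}

if is_endpoint_id "endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462"; then
  echo "looks like an endpoint ID"
fi

if ! is_endpoint_id "meta-llama/Llama-3.3-70B-Instruct-Turbo"; then
  echo "this is a model name, not an endpoint ID"
fi
```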
The endpoint ID is different from the model name (e.g., `meta-llama/Llama-3.3-70B-Instruct-Turbo`) or the display name you set with `--display-name`.
Find your endpoint ID
To find your endpoint ID, you can:
- Run the `tg endpoints create` command to create an endpoint. The endpoint ID is returned in the output.
- Run the `tg endpoints list` command to list all your endpoints. The endpoint ID is displayed for each endpoint.
- View the endpoint details page in the Together AI console.
Create
Create a new dedicated endpoint.
Parameters
| Flag | Description |
|---|---|
| `--model [string]` | (required) The model to deploy. |
| `--hardware [string]` | (required) GPU type to use for inference. Use `tg endpoints hardware` to discover available GPU identifiers. |
| `--min-replicas [number]` | Minimum number of replicas to deploy. |
| `--max-replicas [number]` | Maximum number of replicas to deploy. |
| `--display-name [string]` | A human-readable name for the endpoint. |
| `--no-auto-start` | Create the endpoint in the STOPPED state instead of auto-starting it. |
| `--no-speculative-decoding` | Disable speculative decoding for this endpoint. |
| `--inactive-timeout [number]` | Number of minutes of inactivity after which the endpoint is automatically stopped. Set to 0 to disable. |
| `--availability-zone [string]` | Start the endpoint in the specified availability zone. Use `tg endpoints availability-zones` to discover valid options. |
| `--wait` | Wait for the endpoint to be ready after creation. |
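The flags above can be combined in a single call. The following is a minimal sketch that uses only flags from the table. The wrapper name `create_endpoint`, the replica counts, the display name, and the timeout are illustrative assumptions, and the hardware identifier must come from `tg endpoints hardware`.

```shell
# Sketch: create an endpoint and block until it is ready.
# $1 = model name, $2 = hardware identifier (obtain real values from `tg endpoints hardware`).
# The replica counts, display name, and timeout below are illustrative values only.
create_endpoint() {
  tg endpoints create \
    --model "$1" \
    --hardware "$2" \
    --min-replicas 1 \
    --max-replicas 2 \
    --display-name "my-endpoint" \
    --inactive-timeout 60 \
    --wait
}

# Usage (requires the tg CLI and valid credentials):
# create_endpoint "meta-llama/Llama-3.3-70B-Instruct-Turbo" "<HARDWARE_ID>"
```

Wrapping the call in a function keeps the two required values explicit and leaves the optional flags easy to adjust in one place.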
Hardware
List all hardware options (optionally filtered by model and availability).
Parameters
| Flag | Description |
|---|---|
| `--model [string]` | Filter hardware that is compatible with a given model. |
| `--available` | Filter for only hardware that is currently available. |
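Both filters can be applied together to see what can be deployed right now for one model. This is a small sketch; `list_available_hardware` is a hypothetical wrapper name, and the output format of the command is not shown because it is not described here.

```shell
# Sketch: list hardware that is compatible with a model and currently available.
# $1 = model name
list_available_hardware() {
  tg endpoints hardware --model "$1" --available
}

# Usage (requires the tg CLI and valid credentials):
# list_available_hardware "meta-llama/Llama-3.3-70B-Instruct-Turbo"
```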
Retrieve
Print details for a specific endpoint.
Update
Update the configuration of an existing endpoint.
Parameters
| Flag | Description |
|---|---|
| `--display-name [string]` | New human-readable name for the endpoint. |
| `--min-replicas [number]` | New minimum number of replicas to maintain. |
| `--max-replicas [number]` | New maximum number of replicas to scale up to. |
| `--inactive-timeout [number]` | Number of minutes of inactivity after which the endpoint will be automatically stopped. Set to 0 to disable. |

Note: Both `--min-replicas` and `--max-replicas` must be specified together.
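Because the two replica flags must be passed together, a small wrapper can enforce that before the CLI is called. This sketch assumes the subcommand is `tg endpoints update` and that it takes the endpoint ID as a positional argument; neither detail is confirmed by the text above. `scale_endpoint` is a hypothetical helper name.

```shell
# Sketch: change the replica range, refusing to run unless both bounds are given.
# Assumption: the subcommand is `update` and the endpoint ID is a positional argument.
# $1 = endpoint ID, $2 = minimum replicas, $3 = maximum replicas
scale_endpoint() {
  if [ "$#" -ne 3 ]; then
    echo "usage: scale_endpoint ENDPOINT_ID MIN_REPLICAS MAX_REPLICAS" >&2
    return 2
  fi
  tg endpoints update "$1" --min-replicas "$2" --max-replicas "$3"
}

# Usage (requires the tg CLI and valid credentials):
# scale_endpoint "endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462" 1 3
```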
Start
Start a dedicated endpoint.
Parameters
| Flag | Description |
|---|---|
| `--wait` | Wait for the endpoint to start. |
Stop
Stop a dedicated endpoint.
Parameters
| Flag | Description |
|---|---|
| `--wait` | Wait for the endpoint to stop. |
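The `--wait` flag is useful when Start and Stop are chained, since each step then finishes before the next begins. The sketch below restarts an endpoint. It assumes the subcommands are `tg endpoints stop` and `tg endpoints start` and that each takes the endpoint ID as a positional argument, which the text above does not confirm. `restart_endpoint` is a hypothetical helper name.

```shell
# Sketch: stop an endpoint, wait until it has stopped, then start it and wait again.
# Assumption: subcommands are `stop` and `start`, with the endpoint ID as a positional argument.
# $1 = endpoint ID
restart_endpoint() {
  tg endpoints stop "$1" --wait && tg endpoints start "$1" --wait
}

# Usage (requires the tg CLI and valid credentials):
# restart_endpoint "endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462"
```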
Delete
Delete a dedicated endpoint.
List
List your dedicated endpoints.
Options
| Option | Description |
|---|---|
| `--usage-type [on-demand \| reserved]` | Filter endpoints by usage type. |
| `--after [string]` | The cursor to start listing from. |
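The two options can be combined to page through one usage type. In this sketch, `list_endpoints` is a hypothetical wrapper name; where the cursor value comes from is not described above, so it is passed in as-is.

```shell
# Sketch: list endpoints of one usage type, optionally continuing from a cursor.
# $1 = usage type (on-demand or reserved), $2 = optional cursor for --after
list_endpoints() {
  if [ -n "${2:-}" ]; then
    tg endpoints list --usage-type "$1" --after "$2"
  else
    tg endpoints list --usage-type "$1"
  fi
}

# Usage (requires the tg CLI and valid credentials):
# list_endpoints on-demand
# list_endpoints reserved "<CURSOR>"
```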