Create

Create a new dedicated inference endpoint.

Usage

Shell
together endpoints create --model MODEL --gpu GPU [OPTIONS]

Example

Shell
together endpoints create \
--model mistralai/Mixtral-8x7B-Instruct-v0.1 \
--gpu h100 \
--gpu-count 2 \
--display-name "My Endpoint" \
--wait

Options

Options                                       Description
--model TEXT                                  (required) The model to deploy
--gpu [h100 | a100 | l40 | l40s | rtx-6000]   (required) GPU type to use for inference
--min-replicas INTEGER                        Minimum number of replicas to deploy
--max-replicas INTEGER                        Maximum number of replicas to deploy
--gpu-count INTEGER                           Number of GPUs to use per replica
--display-name TEXT                           A human-readable name for the endpoint
--no-prompt-cache                             Disable the prompt cache for this endpoint
--no-speculative-decoding                     Disable speculative decoding for this endpoint
--no-auto-start                               Create the endpoint in STOPPED state instead of auto-starting it
--wait                                        Wait for the endpoint to be ready after creation
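
As a rough sketch of how these options combine, the helper below creates an endpoint with a replica range and leaves it in the STOPPED state. The function name create_stopped_endpoint is hypothetical and not part of the CLI, and the replica counts are illustrative values rather than recommendations.

```shell
# Hypothetical wrapper, not part of the together CLI.
# $1: model to deploy, e.g. mistralai/Mixtral-8x7B-Instruct-v0.1
# $2: GPU type, one of h100, a100, l40, l40s, rtx-6000
# Creates an endpoint that scales between 1 and 2 replicas (illustrative values)
# and stays STOPPED until it is started explicitly.
create_stopped_endpoint() {
  together endpoints create \
    --model "$1" \
    --gpu "$2" \
    --min-replicas 1 \
    --max-replicas 2 \
    --no-auto-start
}
```

Calling create_stopped_endpoint mistralai/Mixtral-8x7B-Instruct-v0.1 h100 would then run the create command with those two values filled in.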

Hardware

List all the hardware options, optionally filtered by model.

Usage

Shell
together endpoints hardware [OPTIONS]

Example

Shell
together endpoints hardware --model mistralai/Mixtral-8x7B-Instruct-v0.1

Options

Options        Description
--model TEXT   Filter hardware options by model
--json         Print output in JSON format
--available    Print only available hardware options (can only be used if --model is passed)
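
Because --available only works together with --model, a small wrapper can enforce that pairing. The sketch below assumes a hypothetical helper name, available_hardware. It prints the available hardware for one model as JSON and makes no assumption about the shape of that JSON, which is not described here.

```shell
# Hypothetical helper, not part of the together CLI.
# Lists only the available hardware options for one model, as JSON.
# --available requires --model, so the model argument is mandatory.
available_hardware() {
  if [ -z "${1:-}" ]; then
    echo "usage: available_hardware MODEL" >&2
    return 2
  fi
  together endpoints hardware --model "$1" --available --json
}
```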

Get

Print details for a specific endpoint.

Usage

Shell
together endpoints get [OPTIONS] ENDPOINT_ID

Example

Shell
together endpoints get endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462

Options

Options   Description
--json    Print output in JSON format

Update

Update an existing endpoint by listing the changes followed by the endpoint ID. You can find the endpoint ID by listing your dedicated endpoints.

Usage

Shell
together endpoints update [OPTIONS] ENDPOINT_ID

Example

Shell
together endpoints update --min-replicas 2 --max-replicas 4 endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462

Options

Note: Both --min-replicas and --max-replicas must be specified together.

Options                  Description
--display-name TEXT      A new human-readable name for the endpoint
--min-replicas INTEGER   New minimum number of replicas to maintain
--max-replicas INTEGER   New maximum number of replicas to scale up to
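
Since the two replica bounds must be passed together, one way to avoid a partial update is to wrap the command so that it refuses to run without both. The helper name set_replicas below is hypothetical. It is a sketch of the pattern, not something the CLI provides.

```shell
# Hypothetical helper, not part of the together CLI.
# Usage: set_replicas ENDPOINT_ID MIN MAX
# Refuses to run unless both bounds are given, mirroring the rule that
# --min-replicas and --max-replicas must be specified together.
set_replicas() {
  if [ "$#" -ne 3 ]; then
    echo "usage: set_replicas ENDPOINT_ID MIN MAX" >&2
    return 2
  fi
  together endpoints update --min-replicas "$2" --max-replicas "$3" "$1"
}
```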

Start

Start a dedicated inference endpoint.

Usage

Shell
together endpoints start [OPTIONS] ENDPOINT_ID

Example

Shell
together endpoints start endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462

Options

Options   Description
--wait    Wait for the endpoint to start

Stop

Stop a dedicated inference endpoint.

Usage

Shell
together endpoints stop [OPTIONS] ENDPOINT_ID

Example

Shell
together endpoints stop endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462

Options

Options   Description
--wait    Wait for the endpoint to stop
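
The start and stop commands can be paired so that an endpoint only runs while it is needed. The following is a minimal sketch under that assumption. The helper name with_endpoint is hypothetical, and the wrapped command is whatever you want to run against the endpoint.

```shell
# Hypothetical helper, not part of the together CLI.
# Usage: with_endpoint ENDPOINT_ID COMMAND [ARGS...]
# Starts the endpoint, runs the command, then stops the endpoint again.
with_endpoint() {
  endpoint_id="$1"
  shift
  # Give up early if the endpoint cannot be started.
  together endpoints start --wait "$endpoint_id" || return 1
  cmd_status=0
  "$@" || cmd_status=$?
  # Stop even if the command failed, so the endpoint is not left running.
  together endpoints stop --wait "$endpoint_id"
  return "$cmd_status"
}
```

The function returns the exit status of the wrapped command, so a failing job is still visible to the caller after the endpoint has been stopped.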

Delete

Delete a dedicated inference endpoint.

Usage

Shell
together endpoints delete [OPTIONS] ENDPOINT_ID

Example

Shell
together endpoints delete endpoint-c2a48674-9ec7-45b3-ac30-0f25f2ad9462

List

List endpoints, optionally filtered by type.

Usage

Shell
together endpoints list [OPTIONS]

Example

Shell
together endpoints list --type dedicated

Options

Options                           Description
--json                            Print output in JSON format
--type [dedicated | serverless]   Filter by endpoint type
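
The two list options can be combined, for example to feed the endpoint list to another tool. The sketch below uses a hypothetical helper name, list_endpoints_json, and checks the type value before calling the CLI. The structure of the JSON output is not described here, so the helper only passes it through.

```shell
# Hypothetical helper, not part of the together CLI.
# Usage: list_endpoints_json [dedicated|serverless]
# Prints the endpoint list as JSON, optionally filtered by type.
list_endpoints_json() {
  case "${1:-}" in
    dedicated|serverless)
      together endpoints list --type "$1" --json
      ;;
    "")
      together endpoints list --json
      ;;
    *)
      echo "type must be 'dedicated' or 'serverless'" >&2
      return 2
      ;;
  esac
}
```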

Help

See all commands with:
Shell
together endpoints --help