Create, update and delete endpoints via the CLI
| Option | Description |
|---|---|
| `--model TEXT` | (required) The model to deploy |
| `--gpu [h100 \| a100 \| l40 \| l40s \| rtx-6000]` | (required) GPU type to use for inference |
| `--min-replicas INTEGER` | Minimum number of replicas to deploy |
| `--max-replicas INTEGER` | Maximum number of replicas to deploy |
| `--gpu-count INTEGER` | Number of GPUs to use per replica |
| `--display-name TEXT` | A human-readable name for the endpoint |
| `--no-prompt-cache` | Disable the prompt cache for this endpoint |
| `--no-speculative-decoding` | Disable speculative decoding for this endpoint |
| `--no-auto-start` | Create the endpoint in the STOPPED state instead of auto-starting it |
| `--wait` | Wait for the endpoint to be ready after creation |
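As a sketch of how these options combine, a create invocation might look like the following. The subcommand name (`endpoints create`) and the model slug are illustrative assumptions; only the flags come from the table above.

```shell
# Hypothetical create call -- subcommand name and model slug are assumptions;
# the flags are the ones documented above.
endpoints create \
  --model meta-llama/Llama-3-70b-chat-hf \
  --gpu h100 \
  --gpu-count 2 \
  --min-replicas 1 \
  --max-replicas 3 \
  --display-name "llama3-prod" \
  --wait
```

With `--wait`, the command blocks until the endpoint is ready; without it, creation returns immediately and the endpoint starts in the background (or stays STOPPED if `--no-auto-start` was given).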
| Option | Description |
|---|---|
| `--model TEXT` | Filter hardware options by model |
| `--json` | Print output in JSON format |
| `--available` | Print only available hardware options (can only be used when `--model` is passed) |
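For example, to check which GPU configurations are currently available for a given model before creating an endpoint, a query might look like this. The subcommand name (`endpoints hardware`) and the model slug are assumptions; the flags are from the table above.

```shell
# Hypothetical hardware query -- subcommand name and model slug are assumptions.
endpoints hardware \
  --model meta-llama/Llama-3-70b-chat-hf \
  --available \
  --json
```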
| Option | Description |
|---|---|
| `--json` | Print output in JSON format |
`--min-replicas` and `--max-replicas` must be specified together.
| Option | Description |
|---|---|
| `--display-name TEXT` | A new human-readable name for the endpoint |
| `--min-replicas INTEGER` | New minimum number of replicas to maintain |
| `--max-replicas INTEGER` | New maximum number of replicas to scale up to |
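An update call might look like the following, passing both replica bounds together as required. The subcommand name (`endpoints update`) and the endpoint identifier are illustrative assumptions; the flags come from the table above.

```shell
# Hypothetical update call -- subcommand name and endpoint ID are assumptions.
# --min-replicas and --max-replicas must be supplied together.
endpoints update endpoint-1234 \
  --min-replicas 2 \
  --max-replicas 5 \
  --display-name "llama3-prod-v2"
```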
| Option | Description |
|---|---|
| `--wait` | Wait for the endpoint to start |
| Option | Description |
|---|---|
| `--wait` | Wait for the endpoint to stop |
| Option | Description |
|---|---|
| `--json` | Print output in JSON format |
| `--type [dedicated \| serverless]` | Filter by endpoint type |
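To see only dedicated endpoints in machine-readable form, the filter and JSON flags can be combined. The subcommand name (`endpoints list`) is an assumption; the flags are from the table above.

```shell
# Hypothetical list call -- subcommand name is an assumption.
endpoints list --type dedicated --json
```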