POST /compute/clusters
Python
from together import Together

client = Together()

response = client.beta.clusters.create(
  cluster_name="my-gpu-cluster",
  region="us-central-8",
  gpu_type="H100_SXM",
  num_gpus=8,
  nvidia_driver_version="560",
  cuda_version="12.6",
  billing_type="ON_DEMAND",
)

print(response.cluster_id)
{
  "cluster_id": "<string>",
  "region": "<string>",
  "cluster_name": "<string>",
  "volumes": [
    {
      "volume_id": "<string>",
      "volume_name": "<string>",
      "size_tib": 123,
      "status": "<string>"
    }
  ],
  "control_plane_nodes": [
    {
      "node_id": "<string>",
      "status": "<string>",
      "host_name": "<string>",
      "num_cpu_cores": 123,
      "memory_gib": 123,
      "network": "<string>",
      "phase_transitions": [
        {
          "transition_time": "2023-11-07T05:31:56Z"
        }
      ]
    }
  ],
  "gpu_worker_nodes": [
    {
      "node_id": "<string>",
      "status": "<string>",
      "host_name": "<string>",
      "num_cpu_cores": 123,
      "num_gpus": 123,
      "memory_gib": 123,
      "networks": [
        "<string>"
      ],
      "phase_transitions": [
        {
          "transition_time": "2023-11-07T05:31:56Z"
        }
      ],
      "instance_id": "<string>",
      "latest_remediation": {
        "id": "<string>",
        "cluster_id": "<string>",
        "instance_id": "<string>",
        "reason": "<string>",
        "active_health_check_run_id": "<string>",
        "passive_health_check_event_id": "<string>",
        "requested_by": "<string>",
        "create_time": "2023-11-07T05:31:56Z",
        "reviewed_by": "<string>",
        "review_time": "2023-11-07T05:31:56Z",
        "review_comment": "<string>",
        "start_time": "2023-11-07T05:31:56Z",
        "end_time": "2023-11-07T05:31:56Z",
        "error_message": "<string>",
        "update_time": "2023-11-07T05:31:56Z",
        "instance_name": "<string>"
      },
      "slurm_worker_hostname": "<string>"
    }
  ],
  "kube_config": "<string>",
  "num_gpus": 123,
  "cuda_version": "<string>",
  "nvidia_driver_version": "<string>",
  "project_id": "<string>",
  "num_cpu_workers": 123,
  "phase_transitions": [
    {
      "transition_time": "2023-11-07T05:31:56Z"
    }
  ],
  "desired_preemptible_gpus": 123,
  "allocated_preemptible_gpus": 123,
  "add_ons": [
    {
      "name": "<string>",
      "add_on_type": "<string>",
      "config": {
        "dashboard": {
          "enabled": true
        },
        "ingress": {
          "enabled": true
        }
      },
      "state": {
        "dashboard": {},
        "ingress": {}
      }
    }
  ],
  "duration_hours": 123,
  "slurm_shm_size_gib": 123,
  "capacity_pool_id": "<string>",
  "reservation_start_time": "2023-11-07T05:31:56Z",
  "reservation_end_time": "2023-11-07T05:31:56Z",
  "install_traefik": true,
  "created_at": "2023-11-07T05:31:56Z",
  "oidc_config": {
    "issuer_url": "<string>",
    "client_id": "<string>",
    "username_claim": "<string>",
    "username_prefix": "<string>",
    "group_claim": "<string>",
    "group_prefix": "<string>",
    "ca_cert": "<string>"
  },
  "cluster_config": {
    "kubernetes_dashboard_enabled": true,
    "jumphost_enabled": true,
    "slurm_startup_scripts": {
      "worker_prolog": "<string>",
      "worker_epilog": "<string>",
      "controller_prolog": "<string>",
      "controller_epilog": "<string>",
      "login_init_script": "<string>",
      "nodeset_init_script": "<string>",
      "extra_slurm_conf": "<string>"
    },
    "ingress": {
      "enabled": true
    },
    "observability": {
      "enabled": true
    },
    "gpu_operator_version": "<string>"
  }
}
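
The response nests per-node details under gpu_worker_nodes, so a quick summary of node states can be useful after creation. The sketch below works on a plain dict shaped like the sample response; the helper name worker_status_counts and the status values in the example are hypothetical, since this page does not list node-level status values.

```python
from collections import Counter


def worker_status_counts(cluster: dict) -> Counter:
    """Count GPU worker nodes by their status string."""
    return Counter(node["status"] for node in cluster.get("gpu_worker_nodes", []))


# Hypothetical response fragment; the status values here are made up.
example = {
    "cluster_id": "example-id",
    "gpu_worker_nodes": [
        {"node_id": "a", "status": "Running"},
        {"node_id": "b", "status": "Running"},
        {"node_id": "c", "status": "Provisioning"},
    ],
}
counts = worker_status_counts(example)
```

A missing gpu_worker_nodes key is treated as an empty list, so the helper also accepts partial responses.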

Authorizations

Authorization
string
header
default:default
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
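
For callers not using the SDK, the header can be assembled with the standard library alone. This is a minimal sketch that builds the request without sending it. The base URL and the TOGETHER_API_KEY environment variable name are assumptions (neither is stated on this page), and build_create_request is a hypothetical helper.

```python
import json
import os
import urllib.request

# Assumption: the base URL is not stated on this page; adjust it as needed.
BASE_URL = "https://api.together.xyz/v1"


def build_create_request(token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a POST /compute/clusters request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/compute/clusters",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# TOGETHER_API_KEY is an assumed environment variable name.
token = os.environ.get("TOGETHER_API_KEY", "example-token")
request = build_create_request(token, {"cluster_name": "my-gpu-cluster"})
```

Passing the resulting object to urllib.request.urlopen would send it; that step is left out here so the sketch has no side effects.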

Body

application/json

GPU Cluster create request

region
string
required

Region to create the GPU cluster in. List the usable regions with client.clusters.list_regions().

gpu_type
enum<string>
required

Type of GPU to use in the cluster

Available options:
H100_SXM,
H200_SXM,
RTX_6000_PCI,
L40_PCIE,
B200_SXM,
H100_SXM_INF
num_gpus
integer<int32>
required

Number of GPUs to allocate in the cluster. This must be a multiple of 8, for example 8, 16, or 24.
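
The multiple-of-8 rule can be checked client-side before the request is sent. The two helpers below are hypothetical, not part of the SDK. Rejecting zero and negative values is an added assumption, as is the default of 8 GPUs per rounding step.

```python
def validate_num_gpus(num_gpus: int) -> int:
    """Return num_gpus unchanged if it is a positive multiple of 8, else raise."""
    if num_gpus <= 0 or num_gpus % 8 != 0:
        raise ValueError(f"num_gpus must be a positive multiple of 8, got {num_gpus}")
    return num_gpus


def round_up_to_multiple(gpus_needed: int, step: int = 8) -> int:
    """Round a GPU requirement up to the next multiple of step."""
    return -(-gpus_needed // step) * step
```

For instance, a workload that needs 20 GPUs would round up to a request for 24.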

cluster_name
string
required

Name of the GPU cluster.

billing_type
enum<string>
required

RESERVED billing lets you specify the duration of the cluster reservation via the duration_days field. ON_DEMAND billing gives you ownership of the cluster until you delete it. SCHEDULED_CAPACITY billing lets you reserve capacity for a scheduled time window; you must specify reservation_start_time and reservation_end_time with this request.

Available options:
RESERVED,
ON_DEMAND,
SCHEDULED_CAPACITY
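
The billing rules above can be turned into a small pre-flight check on the request body. This sketch only encodes what this page states (the three enum values and the two fields required for SCHEDULED_CAPACITY); check_billing_fields is a hypothetical helper, and the server may enforce further rules.

```python
BILLING_TYPES = {"RESERVED", "ON_DEMAND", "SCHEDULED_CAPACITY"}


def check_billing_fields(body: dict) -> list:
    """Return a list of problems with the billing-related fields in body."""
    problems = []
    billing_type = body.get("billing_type")
    if billing_type not in BILLING_TYPES:
        problems.append(f"unknown billing_type: {billing_type!r}")
    if billing_type == "SCHEDULED_CAPACITY":
        # Both reservation fields must accompany a scheduled-capacity request.
        for field in ("reservation_start_time", "reservation_end_time"):
            if field not in body:
                problems.append(f"{field} is required for SCHEDULED_CAPACITY")
    return problems
```

An empty list means no problem was found; a misspelled key such as billint_type shows up as an unknown billing_type of None.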
cuda_version
string
required

CUDA version for this cluster. For example, 12.5

nvidia_driver_version
string
required

NVIDIA driver version for this cluster. For example, 550. Only some combinations of cuda_version and nvidia_driver_version are supported.

cluster_type
enum<string>

Type of cluster to create.

Available options:
KUBERNETES,
SLURM
duration_days
integer

Duration in days to keep the cluster running.

shared_volume
object

Inline configuration to create a shared volume with the cluster creation.

volume_id
string

ID of an existing volume to use with the cluster creation.

gpu_node_failover_enabled
boolean
default:false

Whether automated GPU node failover should be enabled for this cluster. By default, it is disabled.

auto_scaled
boolean
default:false
deprecated

Whether GPU cluster should be auto-scaled based on the workload. By default, it is not auto-scaled.

auto_scale_max_gpus
integer

Maximum number of GPUs to which the cluster can be auto-scaled up. This field is required if auto_scaled is true.

slurm_shm_size_gib
integer

Shared memory size in GiB for Slurm cluster. This field is required if cluster_type is SLURM.

capacity_pool_id
string

ID of the capacity pool to use for the cluster. This field is optional and only applicable if the cluster is created from a capacity pool.

reservation_start_time
string<date-time>

Reservation start time of the cluster. This field is required for SCHEDULED_CAPACITY billing. If not provided, the cluster provisions immediately.

reservation_end_time
string<date-time>

Reservation end time of the cluster. This field is required for SCHEDULED_CAPACITY billing.
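
Both reservation fields are date-time strings. The sketch below formats them in the same UTC style as the sample timestamps on this page (2023-11-07T05:31:56Z). The helper names are hypothetical, and expressing the window length in hours is an assumption made for illustration.

```python
from datetime import datetime, timedelta, timezone


def to_utc_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as a UTC string like 2023-11-07T05:31:56Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def reservation_window(start: datetime, hours: int) -> dict:
    """Build the two reservation fields for a window of the given length."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    end = start + timedelta(hours=hours)
    return {
        "reservation_start_time": to_utc_timestamp(start),
        "reservation_end_time": to_utc_timestamp(end),
    }


start = datetime(2023, 11, 7, 5, 31, 56, tzinfo=timezone.utc)
window = reservation_window(start, hours=24)
```

The returned dict can be merged into the request body alongside billing_type.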

install_traefik
boolean
default:false

Whether to install Traefik ingress controller in the cluster. This field is only applicable for Kubernetes clusters and is false by default.

slurm_image
string

Custom Slurm image for Slurm clusters.

oidc_config
object
project_id
string

Project ID for the cluster. If not set, the project from the request context is used.

acceptance_tests_params
object

AcceptanceTestsParams groups all GPU acceptance test options when enabled is true.

cluster_config
object
num_capacity_pool_gpus
integer<int32>

Number of GPUs to allocate from the capacity pool. Must be a multiple of 8 and not exceed num_gpus.
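
The two constraints on num_capacity_pool_gpus relate it to num_gpus, so they are easiest to check together. A minimal sketch, with check_capacity_pool_gpus as a hypothetical helper:

```python
def check_capacity_pool_gpus(num_gpus: int, num_capacity_pool_gpus: int) -> None:
    """Raise ValueError if the capacity pool count breaks the documented limits."""
    if num_capacity_pool_gpus % 8 != 0:
        raise ValueError("num_capacity_pool_gpus must be a multiple of 8")
    if num_capacity_pool_gpus > num_gpus:
        raise ValueError("num_capacity_pool_gpus must not exceed num_gpus")
```

The function returns None when both conditions hold and raises otherwise.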

auto_scale
boolean

Whether to enable auto-scaling for the cluster. If true, the cluster will automatically scale the number of GPU worker nodes between num_gpus and auto_scale_max_gpus based on the workload.

num_preemptible_gpus
integer<int32>

Number of preemptible GPUs to request alongside on-demand capacity. Must be a multiple of 8. Preemptible nodes are cheaper but may be reclaimed when on-demand capacity is needed elsewhere; the system fulfills this asynchronously and surfaces the actual count in allocated_preemptible_gpus.

num_reserved_gpus
integer

Number of prepaid (PLG) reserved GPUs for this cluster. When omitted for RESERVED billing on create, the server defaults this to num_gpus.

add_ons
object[]

Add-ons to enable on the cluster at creation time.

Response

200 - application/json

OK

cluster_id
string
required
cluster_type
enum<string>
required

Type of cluster.

Available options:
KUBERNETES,
SLURM
region
string
required
gpu_type
enum<string>
required
Available options:
H100_SXM,
H200_SXM,
RTX_6000_PCI,
L40_PCIE,
B200_SXM,
H100_SXM_INF
cluster_name
string
required
volumes
object[]
required
status
enum<string>
required

Current status of the GPU cluster.

Available options:
WaitingForControlPlaneNodes,
WaitingForDataPlaneNodes,
WaitingForSubnet,
WaitingForSharedVolume,
InstallingDrivers,
RunningAcceptanceTests,
Paused,
OnDemandComputePaused,
Ready,
Degraded,
Deleting
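
Because creation is asynchronous, callers typically poll the status until provisioning finishes. The sketch below takes the status lookup as a callable, so it does not depend on any particular SDK method. Treating the six statuses in IN_PROGRESS as "still provisioning" is an assumption based on their names, and the timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable

# Assumption: these statuses from the list above mean provisioning is ongoing.
IN_PROGRESS = {
    "WaitingForControlPlaneNodes",
    "WaitingForDataPlaneNodes",
    "WaitingForSubnet",
    "WaitingForSharedVolume",
    "InstallingDrivers",
    "RunningAcceptanceTests",
}


def wait_until_settled(
    get_status: Callable[[], str],
    timeout_s: float = 1800.0,
    poll_interval_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status until it leaves IN_PROGRESS or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status not in IN_PROGRESS:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"cluster still in {status} after {timeout_s} s")
        sleep(poll_interval_s)
```

The function returns whichever status ends provisioning, so the caller should still compare the result against Ready before scheduling work.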
control_plane_nodes
object[]
required
gpu_worker_nodes
object[]
required
kube_config
string
required
num_gpus
integer<int32>
required
cuda_version
string
required
nvidia_driver_version
string
required
project_id
string
required
num_cpu_workers
integer<int32>
required

Number of CPU-only worker nodes in the cluster.

phase_transitions
object[]
required

Cluster-level phase transition history.

desired_preemptible_gpus
integer<int32>
required

Customer's requested number of preemptible GPUs. Set on cluster create or update; persists until changed.

allocated_preemptible_gpus
integer<int32>
required

Actual number of preemptible GPUs currently allocated to the cluster. Updated asynchronously by the fulfillment and reclamation workers; may be less than desired_preemptible_gpus when capacity is constrained.
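
Since allocation lags behind the requested count, the gap between the two fields shows how much preemptible capacity is still pending. A minimal sketch over a response dict, with preemptible_shortfall as a hypothetical helper:

```python
def preemptible_shortfall(cluster: dict) -> int:
    """Return how many requested preemptible GPUs are not yet allocated."""
    desired = cluster["desired_preemptible_gpus"]
    allocated = cluster["allocated_preemptible_gpus"]
    # Clamp at zero in case allocation briefly exceeds a lowered request.
    return max(desired - allocated, 0)
```

A result of 0 means the request is fully satisfied at the time the response was generated.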

billing_type
enum<string>
required

Billing type for the cluster (RESERVED, ON_DEMAND, or SCHEDULED_CAPACITY).

Available options:
RESERVED,
ON_DEMAND,
SCHEDULED_CAPACITY
add_ons
object[]
required

Enabled add-ons on this cluster. Only add-ons with enabled=true in their config are returned.

duration_hours
integer
slurm_shm_size_gib
integer
capacity_pool_id
string
reservation_start_time
string<date-time>
reservation_end_time
string<date-time>
install_traefik
boolean
created_at
string<date-time>
oidc_config
object
cluster_config
object