Create a GPU cluster
Create an Instant Cluster on Together’s high-performance GPU infrastructure. Instant Clusters offer on-demand scaling, long-lived and resizable high-bandwidth shared storage local to the data center, Kubernetes and Slurm cluster flavors, a REST API, and Terraform support, so you can run workloads flexibly without managing complex infrastructure.
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
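As a minimal sketch of the header format described above, the helper below builds the `Authorization` header from a token. The `TOGETHER_API_KEY` environment variable name and the JSON `Content-Type` header are assumptions made for illustration; this reference does not state them.

```python
import os


def auth_headers(token: str) -> dict[str, str]:
    """Return HTTP headers carrying the token in the documented Bearer form."""
    token = token.strip()
    if not token:
        raise ValueError("auth token must not be empty")
    return {
        "Authorization": f"Bearer {token}",
        # Assumption: the request body is sent as JSON.
        "Content-Type": "application/json",
    }


# Assumption: the token is stored in an environment variable with this name.
headers = auth_headers(os.environ.get("TOGETHER_API_KEY", "example-token"))
```

Any HTTP client can then send these headers with the create request.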
Body
GPU Cluster create request
Region to create the GPU cluster in. Usable regions can be found from client.clusters.list_regions()
Type of GPU to use in the cluster.
Available options: H100_SXM, H200_SXM, RTX_6000_PCI, L40_PCIE, B200_SXM, H100_SXM_INF
Number of GPUs to allocate in the cluster. This must be a multiple of 8, for example 8, 16, or 24.
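The GPU type and GPU count constraints above can be checked on the client before a request is sent. The sketch below is illustrative only: the function names are hypothetical, and the lower bound of 8 GPUs is an assumption inferred from the multiple-of-8 rule rather than something the reference states.

```python
# GPU types listed in this reference.
GPU_TYPES = frozenset(
    {"H100_SXM", "H200_SXM", "RTX_6000_PCI", "L40_PCIE", "B200_SXM", "H100_SXM_INF"}
)


def check_gpu_request(gpu_type: str, num_gpus: int) -> None:
    """Raise ValueError if the GPU type or count violates the documented rules."""
    if gpu_type not in GPU_TYPES:
        raise ValueError(f"unsupported GPU type: {gpu_type!r}")
    # Assumption: at least one group of 8 GPUs is required.
    if num_gpus < 8 or num_gpus % 8 != 0:
        raise ValueError(f"num_gpus must be a positive multiple of 8, got {num_gpus}")


def round_up_to_multiple_of_8(n: int) -> int:
    """Round a desired GPU count up to the nearest valid multiple of 8."""
    return max(8, -(-n // 8) * 8)
```

For example, a workload that needs 20 GPUs would be rounded up to 24 before the request is built.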
Name of the GPU cluster.
RESERVED billing types allow you to specify the duration of the cluster reservation via the duration_days field. ON_DEMAND billing types will give you ownership of the cluster until you delete it. SCHEDULED_CAPACITY billing types allow you to reserve capacity for a scheduled time window. You must specify the reservation_start_time and reservation_end_time with this request.
Available options: RESERVED, ON_DEMAND, SCHEDULED_CAPACITY
CUDA version for this cluster. For example, 12.5.
NVIDIA driver version for this cluster. For example, 550. Only some combinations of cuda_version and nvidia_driver_version are supported.
Type of cluster to create.
Available options: KUBERNETES, SLURM
Duration in days to keep the cluster running.
Inline configuration to create a shared volume with the cluster creation.
ID of an existing volume to use with the cluster creation.
Whether automated GPU node failover should be enabled for this cluster. By default, it is disabled.
Whether GPU cluster should be auto-scaled based on the workload. By default, it is not auto-scaled.
Maximum number of GPUs to which the cluster can be auto-scaled up. This field is required if auto_scaled is true.
Shared memory size in GiB for Slurm cluster. This field is required if cluster_type is SLURM.
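Two of the fields above are conditionally required: the auto-scaling maximum when auto_scaled is true, and the shared memory size when cluster_type is SLURM. A minimal pre-flight check might look like the sketch below. It assumes the request body is a dict keyed by field name; `shared_memory_gib` is a hypothetical key chosen for illustration, because this reference does not show the real name of the shared memory field. The check that the maximum is not below num_gpus is also an assumption.

```python
def check_conditional_fields(body: dict) -> list[str]:
    """Return a list of problems with conditionally required fields."""
    problems = []

    if body.get("auto_scaled"):
        max_gpus = body.get("auto_scale_max_gpus")
        if max_gpus is None:
            problems.append("auto_scale_max_gpus is required when auto_scaled is true")
        # Assumption: the scale-up ceiling should not be below the base GPU count.
        elif max_gpus < body.get("num_gpus", 0):
            problems.append("auto_scale_max_gpus should be at least num_gpus")

    if body.get("cluster_type") == "SLURM":
        # Hypothetical key name; the reference only describes the field.
        if body.get("shared_memory_gib") is None:
            problems.append("shared memory size is required for SLURM clusters")

    return problems
```

An empty list means no conditional rule was violated; the server remains the authority on whether the request is valid.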
ID of the capacity pool to use for the cluster. This field is optional and only applicable if the cluster is created from a capacity pool.
Reservation start time of the cluster. Required for SCHEDULED_CAPACITY billing. If not provided, the cluster provisions immediately.
Reservation end time of the cluster. Required for SCHEDULED_CAPACITY billing.
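The billing rules described in this section tie specific fields to each billing type. The sketch below assembles only the billing-related part of a request body. The `billing_type` key and the ISO 8601 timestamp encoding are assumptions; duration_days, reservation_start_time, and reservation_end_time are the field names given in the reference.

```python
from datetime import datetime, timezone


def billing_fields(billing_type, *, duration_days=None, start=None, end=None) -> dict:
    """Build the billing-related fields for a cluster create request."""
    if billing_type == "ON_DEMAND":
        # The cluster is owned until it is deleted; no extra fields needed.
        return {"billing_type": billing_type}

    if billing_type == "RESERVED":
        fields = {"billing_type": billing_type}
        if duration_days is not None:
            fields["duration_days"] = duration_days
        return fields

    if billing_type == "SCHEDULED_CAPACITY":
        if start is None or end is None:
            raise ValueError("SCHEDULED_CAPACITY needs both a start and an end time")
        if end <= start:
            raise ValueError("reservation_end_time must be after reservation_start_time")
        return {
            "billing_type": billing_type,
            # Assumption: timestamps are sent as ISO 8601 strings.
            "reservation_start_time": start.isoformat(),
            "reservation_end_time": end.isoformat(),
        }

    raise ValueError(f"unknown billing type: {billing_type!r}")


window = billing_fields(
    "SCHEDULED_CAPACITY",
    start=datetime(2030, 1, 1, tzinfo=timezone.utc),
    end=datetime(2030, 1, 8, tzinfo=timezone.utc),
)
```

The resulting dict can be merged with the remaining fields of the request body.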
Whether to install Traefik ingress controller in the cluster. This field is only applicable for Kubernetes clusters and is false by default.
Custom Slurm image for Slurm clusters.
Project ID for the cluster. If not set, the project from the request context is used.
AcceptanceTestsParams groups all GPU acceptance test options when enabled is true.
Number of GPUs to allocate from the capacity pool. Must be a multiple of 8 and not exceed num_gpus.
Whether to enable auto-scaling for the cluster. If true, the cluster will automatically scale the number of GPU worker nodes between num_gpus and auto_scale_max_gpus based on the workload.
Number of preemptible GPUs to request alongside on-demand capacity. Must be a multiple of 8. Preemptible nodes are cheaper but may be reclaimed when on-demand capacity is needed elsewhere; the system fulfills this asynchronously and surfaces the actual count in allocated_preemptible_gpus.
Number of prepaid (PLG) reserved GPUs for this cluster. When omitted for RESERVED billing on create, the server defaults this to num_gpus.
Add-ons to enable on the cluster at creation time.
Response
OK
Type of cluster.
Available options: KUBERNETES, SLURM
Type of GPU in the cluster.
Available options: H100_SXM, H200_SXM, RTX_6000_PCI, L40_PCIE, B200_SXM, H100_SXM_INF
Current status of the GPU cluster.
Available options: WaitingForControlPlaneNodes, WaitingForDataPlaneNodes, WaitingForSubnet, WaitingForSharedVolume, InstallingDrivers, RunningAcceptanceTests, Paused, OnDemandComputePaused, Ready, Degraded, Deleting
Number of CPU-only worker nodes in the cluster.
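Because a new cluster moves through several status values before it is usable, a client will typically poll until provisioning ends. The sketch below shows one way to do that. Treating the six statuses in `PROVISIONING` as "still in progress" is an assumption based on their names, and `get_status` is a hypothetical callable supplied by the caller that returns the cluster's current status string.

```python
import time
from typing import Callable

# Assumption: these statuses mean the cluster is still being provisioned.
PROVISIONING = frozenset({
    "WaitingForControlPlaneNodes",
    "WaitingForDataPlaneNodes",
    "WaitingForSubnet",
    "WaitingForSharedVolume",
    "InstallingDrivers",
    "RunningAcceptanceTests",
})


def wait_until_provisioned(
    get_status: Callable[[], str],
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the status leaves the provisioning set, then return it."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = get_status()
        if status not in PROVISIONING:
            return status  # e.g. Ready, Degraded, Paused, Deleting
        if time.monotonic() >= deadline:
            raise TimeoutError(f"cluster still in status {status!r}")
        sleep(poll_seconds)
```

The caller should inspect the returned value, since the loop also stops on statuses other than Ready.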
Cluster-level phase transition history.
Customer's requested number of preemptible GPUs. Set on cluster create or update; persists until changed.
Actual number of preemptible GPUs currently allocated to the cluster. Updated asynchronously by the fulfillment and reclamation workers; may be less than desired_preemptible_gpus when capacity is constrained.
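Since preemptible capacity is fulfilled asynchronously, the allocated count can lag behind the requested count. A small helper, sketched below, reports the gap. It assumes the response has been parsed into a dict and that missing fields can be treated as zero; the function name is hypothetical.

```python
def preemptible_shortfall(cluster: dict) -> int:
    """Return how many requested preemptible GPUs are not yet allocated."""
    # Assumption: absent fields are treated as zero.
    desired = cluster.get("desired_preemptible_gpus") or 0
    allocated = cluster.get("allocated_preemptible_gpus") or 0
    return max(0, desired - allocated)
```

A nonzero result suggests that capacity is constrained or that fulfillment is still in progress.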
Billing type for the cluster (RESERVED, ON_DEMAND, or SCHEDULED_CAPACITY).
Available options: RESERVED, ON_DEMAND, SCHEDULED_CAPACITY
Enabled add-ons on this cluster. Only add-ons with enabled=true in their config are returned.