Pricing - Together AI docs

Together AI bills fine-tuning by the total number of tokens processed across training and validation. The per-token rate depends on three factors: the model size bracket, the training method (supervised or DPO), and the implementation (LoRA or full fine-tuning). For current rates, see together.ai/pricing. After training, hosting on a dedicated endpoint is billed separately by the minute.

How tokens are counted

The total token count for a job is:

total_tokens = (n_epochs × tokens_per_training_dataset) + (n_evals × tokens_per_validation_dataset)

Tokenization runs as part of the job. The final token count and price are recorded once tokenization completes and appear on the fine-tuning jobs dashboard and in client.fine_tuning.retrieve(id=<JOB_ID>). If you disable packing, training tokens are computed as dataset_length × max_seq_length instead.
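As a rough illustration of the formula above, the sketch below computes the token total for a job. The function names and the example numbers are hypothetical and are not part of the Together SDK. It also assumes that, with packing disabled, dataset_length × max_seq_length stands in for the per-epoch training token count.

```python
def total_tokens(n_epochs, tokens_per_training_dataset,
                 n_evals=0, tokens_per_validation_dataset=0):
    """Apply the billing formula: training passes plus evaluation passes."""
    training = n_epochs * tokens_per_training_dataset
    validation = n_evals * tokens_per_validation_dataset
    return training + validation


def unpacked_training_tokens(dataset_length, max_seq_length):
    """Training tokens for one pass when packing is disabled.

    Assumption: every example counts as a full max_seq_length sequence.
    """
    return dataset_length * max_seq_length


# Hypothetical job: 3 epochs over 1M training tokens,
# 3 evaluations over 50k validation tokens.
packed_total = total_tokens(3, 1_000_000, 3, 50_000)

# Same job with packing disabled: 2,000 examples at max_seq_length 4096.
unpacked_total = total_tokens(3, unpacked_training_tokens(2_000, 4096), 3, 50_000)

print(packed_total, unpacked_total)
```

The actual token count is whatever the job records once tokenization completes, so treat this only as a back-of-the-envelope check.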

Estimate job cost

There are three ways to estimate the cost of a fine-tuning job before launching it:

CLI: When you submit a job with tg fine-tuning create, the CLI prints the estimated price and asks for confirmation before the job is submitted.
Web interface: On the new fine-tuning job page, the estimate appears once you select a model and dataset.
API/SDK: Call the estimate price endpoint with the same parameters you plan to submit to the create job endpoint. The response includes the estimated total price, the estimated training and evaluation token counts, your credit limit, and whether you are allowed to proceed:

import os

from together import Together

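# Read the API key from the environment rather than hard-coding it.
client = Together(api_key=os.environ.get("TOGETHER_API_KEY"))

# Pass the same parameters you plan to send to the create job endpoint.
estimate = client.fine_tuning.estimate_price(
    training_file="file-abc123",  # ID of an uploaded training file
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",
    n_epochs=3,
    training_method={"method": "sft"},  # supervised fine-tuning
    training_type={"type": "Lora", "lora_r": 8},  # LoRA with rank 8
)

# The response carries the estimated price and token counts.
print(estimate)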

The cost estimate is only available after your input datasets pass server-side validation.

Cancelled and early-stopped jobs

When a running job is cancelled or early-stopped, you pay for completed steps only.

Hosting charges

After training, your fine-tuned model can be served on a dedicated endpoint that bills per minute based on the hardware attached. These charges are separate from your fine-tuning job cost and continue until you stop or delete the endpoint. See deployment for the full setup and teardown flow.
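Because hosting is billed per minute, a running endpoint's cost can be approximated as minutes running times the per-minute rate. The sketch below shows that arithmetic. The helper name and the rate are hypothetical placeholders; real rates depend on the hardware attached.

```python
from decimal import Decimal


def hosting_cost(minutes_running, price_per_minute):
    """Approximate endpoint cost as minutes running times the per-minute rate."""
    return minutes_running * price_per_minute


# Hypothetical rate, for illustration only; check the pricing page for real values.
example_rate = Decimal("0.05")

# An endpoint left running for 90 minutes at the hypothetical rate.
cost = hosting_cost(90, example_rate)
print(cost)
```

Decimal is used here to avoid floating-point rounding when working with currency amounts.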

Minimum spend

Fine-tuning jobs have a $4.00 minimum charge. Some models are exempt. See fine-tuning pricing for the current rates and exceptions.
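One way to read the minimum charge is as a floor on the job cost. The sketch below assumes that interpretation. The function name and the exempt flag are hypothetical, and which models are exempt is not modeled here.

```python
from decimal import Decimal

# Minimum charge stated above.
MINIMUM_CHARGE = Decimal("4.00")


def charge_with_minimum(job_cost, exempt=False):
    """Return the amount charged, assuming the minimum acts as a floor.

    `exempt` is a hypothetical flag for models the minimum does not apply to.
    """
    if exempt:
        return job_cost
    return max(job_cost, MINIMUM_CHARGE)


print(charge_with_minimum(Decimal("1.25")))               # small job on a non-exempt model
print(charge_with_minimum(Decimal("1.25"), exempt=True))  # same job on an exempt model
print(charge_with_minimum(Decimal("10.00")))              # job already above the minimum
```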
