Fine-tuning adapts a pretrained model to a smaller, targeted dataset so it performs better on a specific task or domain. Together AI handles every step of the process, from data preparation to hosting the resulting model for inference.

When to fine-tune

Use fine-tuning when:
  • Prompting alone does not give you the behavior you need.
  • You have a well-defined task with labeled examples (hundreds to thousands).
  • You want lower latency or cost than routing every request to a large general model.
  • You need the model to understand a private domain: your data, your terminology, or your output format.
If you only need factual grounding, try retrieval-augmented generation (RAG) before fine-tuning.

Fine-tuning types

Together supports two modes of supervised fine-tuning:
  • LoRA (Low-Rank Adaptation). Trains a small set of adapter weights on top of the frozen base model. Faster, cheaper, and the right choice for most use cases. The fine-tuning API defaults to LoRA. See LoRA training and inference for details.
  • Full fine-tuning. Updates every weight in the base model. Uses more compute, but can outperform LoRA on tasks where the base behavior needs to shift substantially.
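The difference between the two modes can be sketched numerically. LoRA keeps the base weight matrix frozen and adds a low-rank correction `B @ A` scaled by `alpha / r`, so only `r * (d_in + d_out)` parameters train instead of `d_in * d_out`. A minimal NumPy sketch; the shapes, scaling, and zero-initialization follow the standard LoRA formulation, not Together's internal implementation:

```python
import numpy as np

d_in, d_out, r, alpha = 64, 64, 8, 16

rng = np.random.default_rng(0)
W = rng.standard_normal((d_out, d_in))       # frozen base weight
A = rng.standard_normal((r, d_in)) * 0.01    # trainable down-projection
B = np.zeros((d_out, r))                     # trainable up-projection, zero-init

def forward(x):
    # Base output plus the scaled low-rank correction
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# Because B starts at zero, LoRA initially reproduces the base model exactly
assert np.allclose(forward(x), W @ x)

print(f"trainable params: full={W.size}, lora={A.size + B.size}")
```

With these illustrative shapes, the adapter trains 1,024 parameters against 4,096 for full fine-tuning; at real model scale the gap is several orders of magnitude, which is why LoRA is faster and cheaper.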

Specialized fine-tuning

Beyond standard supervised fine-tuning, Together supports:

Data and models

  • Data preparation. Supported formats (conversational, instruction, preference, generic text), JSONL and Parquet schemas, and validation rules.
  • Fine-tuning models. Every base model available for fine-tuning, along with the modes each one supports.
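As a concrete example of the conversational JSONL shape, each line is one JSON object holding a `messages` array of role/content turns. The field names below follow the common chat schema; see the data preparation page for the exact schemas and validation rules Together enforces:

```python
import json
import os
import tempfile

examples = [
    {"messages": [
        {"role": "system", "content": "You are a support agent for Acme."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Go to Settings > Security and click Reset password."},
    ]},
]

path = os.path.join(tempfile.mkdtemp(), "train.jsonl")
with open(path, "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")   # one JSON object per line

# Light local sanity check before upload: every line parses, roles are known
valid_roles = {"system", "user", "assistant"}
with open(path) as f:
    for line in f:
        obj = json.loads(line)
        assert all(m["role"] in valid_roles for m in obj["messages"])
print("wrote", path)
```

Catching malformed lines locally is cheaper than having a job fail validation after upload.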

Pricing

Fine-tuning is billed per token of training data, scaled by model size and run type. Once training finishes, inference runs on a dedicated endpoint. See fine-tuning pricing for current rates.
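Because billing is per training token, the cost of a run can be estimated before launching it: total billed tokens are roughly the dataset's token count multiplied by the number of epochs. A sketch with a hypothetical rate; the $2.90 per million tokens below is illustrative only, not a real Together price:

```python
def estimate_cost(tokens_per_example, n_examples, n_epochs, price_per_million):
    # Each epoch processes the full dataset once
    total_tokens = tokens_per_example * n_examples * n_epochs
    return total_tokens * price_per_million / 1_000_000

# e.g. 500-token examples, 5,000 examples, 3 epochs, hypothetical $2.90/1M tokens
cost = estimate_cost(500, 5_000, 3, 2.90)
print(f"~${cost:.2f}")  # ~$21.75
```

Consult the fine-tuning pricing page for actual per-model rates before budgeting a run.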

Next steps