Fine-tuning adapts a pretrained model to a smaller, targeted dataset so it performs better on a specific task or domain. Together AI handles every step of the process, from data preparation to hosting the resulting model for inference.
Documentation Index
Fetch the complete documentation index at: https://docs.together.ai/llms.txt
Use this file to discover all available pages before exploring further.
When to fine-tune
Use fine-tuning when:
- Prompting alone does not give you the behavior you need.
- You have a well-defined task with labeled examples (hundreds to thousands).
- You want lower latency or cost than routing every request to a large general model.
- You need the model to understand a private domain: your data, your terminology, or your output format.
Fine-tuning types
Together supports two modes of supervised fine-tuning:
- LoRA (Low-Rank Adaptation). Trains a small set of adapter weights on top of the frozen base model. Faster, cheaper, and the right choice for most use cases. The fine-tuning API defaults to LoRA. See LoRA training and inference for details.
- Full fine-tuning. Updates every weight in the base model. Uses more compute, but can outperform LoRA on tasks where the base behavior needs to shift substantially.
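To make the LoRA default concrete, here is a minimal sketch of assembling the arguments for a LoRA fine-tuning job with the Together Python SDK. The helper function is hypothetical, and the parameter names (`training_file`, `model`, `lora`, `n_epochs`) are assumptions about the `client.fine_tuning.create` signature; verify them against the API reference before use.

```python
# Hypothetical helper: assemble keyword arguments for a LoRA fine-tuning job.
# Parameter names are assumptions about the Together SDK; check the API reference.

def lora_job_args(training_file: str, model: str, epochs: int = 3) -> dict:
    """Build kwargs for a fine-tuning job request (illustrative sketch)."""
    return {
        "training_file": training_file,  # file ID returned by the upload step
        "model": model,                  # base model from the fine-tuning catalog
        "lora": True,                    # LoRA is the default training mode
        "n_epochs": epochs,
    }

args = lora_job_args(
    "file-abc123",
    "meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",
)

# Submitting the job would then look roughly like:
# from together import Together
# client = Together()  # reads TOGETHER_API_KEY from the environment
# job = client.fine_tuning.create(**args)
print(args["lora"])  # → True
```

Switching to full fine-tuning would amount to changing the mode flag in the same request; everything else about the workflow stays the same.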
Specialized fine-tuning
Beyond standard supervised fine-tuning, Together supports:
- Preference fine-tuning. Align a model with rankings over preferred and dispreferred responses.
- Function-calling fine-tuning. Train a model to call tools reliably.
- Reasoning fine-tuning. Train a model to produce chain-of-thought outputs.
- Vision-language fine-tuning. Fine-tune vision-language models (VLMs) on image-text pairs.
- Bring your own model. Upload a base model from outside the Together catalog and fine-tune it.
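As an illustration of how preference fine-tuning data differs from standard supervised data, here is a sketch of a single preference record as one JSONL line. The field names (`input`, `preferred_output`, `non_preferred_output`) assume a DPO-style schema; the exact field names and validation rules are defined in the data preparation guide.

```python
import json

# Sketch of one preference fine-tuning record (assumed DPO-style schema:
# the field names below are illustrative, not confirmed API schema).
record = {
    "input": {
        "messages": [
            {"role": "user", "content": "Summarize our refund policy in one sentence."}
        ]
    },
    "preferred_output": [
        {"role": "assistant", "content": "Refunds are issued within 14 days of purchase."}
    ],
    "non_preferred_output": [
        {"role": "assistant", "content": "I don't know, check the website."}
    ],
}

# Each record becomes one line of the JSONL training file.
line = json.dumps(record)
parsed = json.loads(line)
print(sorted(parsed.keys()))  # → ['input', 'non_preferred_output', 'preferred_output']
```

The key difference from supervised data: each prompt carries both a preferred and a dispreferred completion, so the model learns a ranking rather than a single target.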
Data and models
- Data preparation. Supported formats (conversational, instruction, preference, generic text), JSONL and Parquet schemas, and validation rules.
- Fine-tuning models. Every base model available for fine-tuning, along with the modes each one supports.
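For the conversational format mentioned above, a quick local check before upload can catch malformed lines early. The sketch below assumes the documented `messages` layout of role/content pairs; the validation rules shown are illustrative and do not reproduce the API's full rule set.

```python
import json

# Illustrative pre-upload check for conversational-format JSONL data.
# Assumes the role/content "messages" layout; rules here are a simplified
# sketch, not the platform's complete validation.
VALID_ROLES = {"system", "user", "assistant"}

def validate_jsonl(lines):
    """Return a list of (line_number, error) tuples; empty means all lines passed."""
    errors = []
    for i, raw in enumerate(lines, start=1):
        try:
            row = json.loads(raw)
        except json.JSONDecodeError as exc:
            errors.append((i, f"invalid JSON: {exc}"))
            continue
        messages = row.get("messages")
        if not isinstance(messages, list) or not messages:
            errors.append((i, "missing or empty 'messages' list"))
            continue
        for msg in messages:
            if msg.get("role") not in VALID_ROLES:
                errors.append((i, f"unknown role: {msg.get('role')!r}"))
            if not isinstance(msg.get("content"), str):
                errors.append((i, "'content' must be a string"))
    return errors

sample = [
    json.dumps({"messages": [
        {"role": "user", "content": "What is LoRA?"},
        {"role": "assistant", "content": "A low-rank adapter method."},
    ]}),
]
print(validate_jsonl(sample))  # → []
```

Running the same function over a file's lines before upload surfaces schema problems without a round trip to the API.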
Pricing
Fine-tuning is billed per token of training data, scaled by model size and run type. Once training finishes, inference runs on a dedicated endpoint. See fine-tuning pricing for current rates.
Next steps
- Run through the fine-tuning quickstart end-to-end.
- Check the fine-tuning FAQ for common questions.