Learn how to use your own private data to fine-tune a custom LLM.
Your fine-tuning dataset should consist of `prompt` and `completion` pairs. Please refer to the data format details here.
If it's your first time fine-tuning, we recommend using an instruct model. Llama 3 8B Instruct is great for simpler training sets, while the larger Llama 3 70B Instruct is better suited to more complex training sets.
You can find all available models on the Together API here.
For padding tokens, `attention_mask` and `labels` should be set to 0 and -100, respectively, so that the model ignores the padding tokens in prediction and excludes them from its loss.
JSONL is simpler and will work for many cases, while Parquet stores pre-tokenized data, giving you the flexibility to specify a custom attention mask and labels (for loss masking). Parquet also saves you time on each job you run by skipping the tokenization step. View our file format guide to learn more about working with each format.
By setting the `labels` field for your examples in the tokenized dataset (in a Parquet file), you can mask out the loss calculation for specified tokens: set the label of any token you don't want included in the loss calculation to -100 (see here for why). Note that unlike padding tokens, these tokens keep their corresponding `attention_mask` set to 1, so that the model can properly attend to them during prediction.
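As a rough sketch of what one pre-tokenized example could look like, here is how you might build a Parquet file with both padding and loss masking. The packages (`transformers`, `pyarrow`), the prompt/completion text, the model choice, and the filename are all illustrative assumptions, not a prescribed schema; see the file format guide for the exact requirements:

```python
# Sketch: one pre-tokenized example with loss masking and padding,
# written to a Parquet file. Names and field layout are assumptions.
import pyarrow as pa
import pyarrow.parquet as pq
from transformers import AutoTokenizer

# Use the tokenizer that matches your base model (illustrative choice;
# this repo is gated on Hugging Face, so any tokenizer works for a demo).
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

prompt = "What is the capital of France?"
completion = " The capital of France is Paris."

prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
completion_ids = tokenizer(completion, add_special_tokens=False)["input_ids"]

input_ids = prompt_ids + completion_ids
# Loss masking: train only on the completion. Prompt tokens get label -100
# but keep attention_mask 1 so the model still attends to them.
labels = [-100] * len(prompt_ids) + completion_ids
attention_mask = [1] * len(input_ids)

# Padding: pad tokens get attention_mask 0 AND label -100, so they are
# ignored in prediction and excluded from the loss.
max_len = 64
pad_len = max_len - len(input_ids)
pad_id = tokenizer.eos_token_id  # Llama has no dedicated pad token
input_ids += [pad_id] * pad_len
attention_mask += [0] * pad_len
labels += [-100] * pad_len

table = pa.table({
    "input_ids": [input_ids],
    "attention_mask": [attention_mask],
    "labels": [labels],
})
pq.write_table(table, "train.parquet")
```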
Once you've saved your dataset as a `.jsonl` or `.parquet` file, use our CLI to verify that it's correct:
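For example, if your file is named `your_data.jsonl` (an illustrative name):

```sh
together files check "your_data.jsonl"
```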
You should see `is_check_passed: true` in the response.
You’re now ready to upload your data to Together!
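The CLI upload looks like this (same illustrative filename); note the file ID in the response, since the next step needs it:

```sh
together files upload "your_data.jsonl"
```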
(Requires `together >= 1.2.3`.) Call `create` with your file ID as the `training_file` to kick off a new fine-tuning job. Pass `--lora` for LoRA fine-tuning:
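A sketch of the command, with a placeholder file ID and an illustrative base model:

```sh
together fine-tuning create \
  --training-file "file-xxxxxxxx" \
  --model "meta-llama/Meta-Llama-3-8B-Instruct" \
  --lora
```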
You can also pass `--lora-r`, `--lora-dropout`, `--lora-alpha`, and `--lora-trainable-modules` to customize your job. See the full list of hyperparameters and their definitions here.
The response object will have all the details of your job, including its ID and a `status` key that starts out as `pending`.
You can also kick off a job directly from your code: call `create` with your file ID as the `training_file`:
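A sketch using the Together Python SDK; the file ID and model name are placeholders:

```python
# Sketch: create a fine-tuning job from Python.
from together import Together

client = Together()  # reads TOGETHER_API_KEY from your environment

response = client.fine_tuning.create(
    training_file="file-xxxxxxxx",  # ID returned by the upload step
    model="meta-llama/Meta-Llama-3-8B-Instruct",
)

print(response.id)      # the job ID, e.g. "ft-..."
print(response.status)  # starts out as "pending"
```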
As with the CLI, the response includes a `status` key that starts out as `pending`.
To resume training from a checkpoint of a previous job, pass the `--from-checkpoint` field in the request as `ft-...:{STEP_NUM}`, where `{STEP_NUM}` is the step at which the checkpoint was created. You can list a job's available checkpoints with `together fine-tuning list-checkpoints {FT_JOB_ID}`.
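For illustration, a resumed job might look like the following; the job ID, file ID, and step number are placeholders, and the exact set of flags your job needs may differ:

```sh
# List the checkpoints saved for an existing job...
together fine-tuning list-checkpoints ft-xxxxxxxx

# ...then start a new job from the checkpoint created at step 500.
together fine-tuning create \
  --training-file "file-xxxxxxxx" \
  --from-checkpoint "ft-xxxxxxxx:500"
```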
To run evaluations during training, pass `--validation-file` along with `--n-evals`, the number of evaluations to run over the entire job:
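For example (the file IDs and model name are placeholders):

```sh
together fine-tuning create \
  --training-file "file-xxxxxxxx" \
  --validation-file "file-yyyyyyyy" \
  --n-evals 10 \
  --model "meta-llama/Meta-Llama-3-8B-Instruct"
```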
Based on `n_evals`, the most up-to-date model weights will periodically be evaluated with a forward pass on your validation set, and the evaluation loss will be recorded in your job's event log. If you provide a W&B API key, you will also be able to see these losses on the W&B page. Evaluation is a forward pass only, with no weight updates, so the presence of a validation set will not influence the model's training quality.
Call `retrieve` to get the latest details about your job directly from your code:
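A sketch with the Python SDK; the job ID is a placeholder:

```python
# Sketch: poll a fine-tuning job from Python.
from together import Together

client = Together()
job = client.fine_tuning.retrieve("ft-xxxxxxxx")
print(job.status)  # "pending" until the job is scheduled
```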
You can check on your job's progress at any time using the `retrieve` command from above. If your job stays in a pending state for too long, please reach out to support@together.ai.
If you provided your API key when submitting the fine-tuning job as instructed above, you can also monitor it on the Weights & Biases platform, shown below.
Call `download` with your job ID:
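For example (the job ID is a placeholder):

```sh
together fine-tuning download "ft-xxxxxxxx"
```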
This downloads your job's output as a `tar.zst` file, an archive format that uses the Zstandard algorithm. You'll need to install Zstandard to decompress your model.
On Macs, you can use Homebrew:
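```sh
brew install zstd
```

Once installed, you can decompress and unpack the archive; the filename below is a placeholder for whatever `download` saved:

```sh
zstd -d your_model.tar.zst   # decompress: produces your_model.tar
tar -xf your_model.tar       # unpack the model files
```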
The total number of tokens your job is billed for is `n_epochs * n_tokens_per_training_dataset + n_evals * n_tokens_per_validation_dataset`. You can estimate fine-tuning pricing with our calculator. The exact price may differ from the estimated cost by ~$1, since the exact number of trainable parameters differs for each model.
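As a sketch of the arithmetic, the snippet below applies the billable-token formula; the per-token rate is a placeholder taken from the Llama-3-8B example in the FAQ below, not an authoritative price:

```python
# Sketch: estimate a job's price from the billable-token formula above.
n_epochs = 1
n_tokens_per_training_dataset = 1_000_000
n_evals = 10
n_tokens_per_validation_dataset = 1_000_000

billable_tokens = (n_epochs * n_tokens_per_training_dataset
                   + n_evals * n_tokens_per_validation_dataset)

usd_per_million_tokens = 0.37      # placeholder rate; see the pricing page
estimate = billable_tokens / 1_000_000 * usd_per_million_tokens
final_price = max(5.00, estimate)  # $5 job minimum (see the FAQ below)
print(f"{billable_tokens=} estimate=${estimate:.2f} final=${final_price:.2f}")
```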
Currently LoRA and full fine-tuning have the same pricing.
Tokenization is part of the fine-tuning process on our API, and the exact number of tokens (and therefore the price of your job) will be available after the tokenization step is done. You can find this information in your jobs dashboard or retrieve it by running `together fine-tuning retrieve $JOB_ID` in your CLI.
Q: Is there a minimum price? The minimum price for a fine-tuning job is $5. For example, fine-tuning Llama-3-8B with 1B training tokens for 1 epoch and 1M validation tokens for 10 evaluations costs $369.70. If you instead fine-tune this model on 1M training tokens for 1 epoch with no validation set, the per-token rate works out to $0.37, so the final price will be the $5 minimum.
Q: What happens if I cancel my job? The final price will be determined by the number of tokens used to train and validate your model up to the point of cancellation. For example, if your fine-tuning job uses Llama-3-8B with a batch size of 8 and you cancel it after 1,000 training steps, the total number of training tokens is 8,192 [context length] × 8 [batch size] × 1,000 [steps] = 65,536,000. If your validation set has 1M tokens and 10 evaluations ran before the cancellation, add 10M tokens to that count. This works out to $30.91, as you can verify on the pricing page.