
Create

To start a new fine-tuning job:
tg fine-tuning create --training-file [FILE_ID | PATH] --model [MODEL]

# Shorthand
tg ft -c --training-file [FILE_ID | PATH] --model [MODEL]

Parameters

--model [string] (required)
  Specify the base model to fine-tune (see the model page).
--training-file [string | Path] (required)
  Specify the training file to use. This may be either a previously uploaded file (see Files) or a path to a local file on disk. The maximum allowed file size is 25GB.
--suffix [string]
  Up to 40 characters that will be appended to your fine-tuned model name. Adding a suffix is recommended to differentiate fine-tuned models.
--validation-file [string]
  File to be used for validation. This may be either a previously uploaded file (see Files) or a path to a local file on disk. The maximum allowed file size is 25GB.
--packing/--no-packing
  Whether to use sequence packing for training.
--max-seq-length [integer]
  Maximum sequence length to be used for training. Required when --no-packing is set. If not specified, the maximum allowed for the model and training type will be used.
--n-epochs/-ne [integer]
  Number of epochs to fine-tune on the dataset. Default: 1, Min: 1, Max: 20.
--n-evals [integer]
  Number of evaluations to be run on a given validation set during training. Default: 0, Min: 0, Max: 100.
--n-checkpoints/-c [integer]
  The number of checkpoints to save during training. Default: 1. One checkpoint is always saved on the last epoch for the trained model. The number of checkpoints must be larger than 0 and equal to or less than the number of epochs (1 <= n-checkpoints <= n-epochs). If a larger number is given, the number of epochs will be used as the number of checkpoints.
--batch-size/-b [integer]
  The batch size to use for each training iteration, i.e. the number of training samples/examples used in a batch. See the model page for the min and max batch sizes for each model. If not specified, --batch-size max is used by default.
--learning-rate/-lr [float]
  The learning rate multiplier to use for training. Default: 0.00001, Min: 0.00000001, Max: 0.01.
--lr-scheduler-type [string]
  The learning rate scheduler type. One of "linear" or "cosine". Default: "cosine".
--min-lr-ratio [float]
  The ratio of the final learning rate to the peak learning rate. Default: 0.0, Min: 0.0, Max: 1.0.
--scheduler-num-cycles [float]
  The number or fraction of cycles for the cosine learning rate scheduler. Must be non-negative. Default: 0.5.
--warmup-ratio [float]
  The fraction of steps at the start of training over which the learning rate is linearly increased. Default: 0.0, Min: 0.0, Max: 1.0.
--max-grad-norm [float]
  Max gradient norm to be used for gradient clipping. Set to 0 to disable. Default: 1.0, Min: 0.0.
--weight-decay [float]
  Weight decay parameter for the optimizer. Default: 0.0, Min: 0.0.
--wandb-api-key [string]
  Your own Weights & Biases API key. If you provide the key, you can monitor your job progress on your Weights & Biases page. If not set, the WANDB_API_KEY environment variable is used.
--wandb-base-url [string]
  The base URL of a dedicated Weights & Biases instance. Leave empty if not using your own Weights & Biases instance.
--wandb-project-name [string]
  The Weights & Biases project for your run. If not specified, "together" will be used as the project name.
--wandb-name [string]
  The Weights & Biases name for your run.
--train-on-inputs [true, false, auto]
  Whether to mask the user messages in conversational data or prompts in instruction data. The auto mode will automatically determine whether to mask the inputs based on the data format (see the dataset sketch after this parameter list):
  • For datasets with the "text" field (general format), inputs will not be masked.
  • For datasets with the "messages" field (conversational format) or "prompt" and "completion" fields (instruction format), inputs will be masked.
  Defaults to "auto".
--train-vision/--no-train-vision
  Whether to update the vision encoder parameters. Default: false. Only available for Vision-Language models.
--from-checkpoint [string]
  The checkpoint identifier to continue training from a previous fine-tuning job. The format: {$JOB_ID/$OUTPUT_MODEL_NAME}:{$STEP}. The step value is optional; without it, the final checkpoint will be used.
--from-hf-model [string]
  The Hugging Face Hub repository to start training from. It should be as close as possible to the base model (specified by the model argument) in terms of architecture and size. If --lora is set and --lora-trainable-modules is set to all-linear, the following modules will be set as targets for adapter training: k_proj, o_proj, q_proj, v_proj.
--hf-model-revision [string]
  The revision of the Hugging Face Hub model to continue training from.
--hf-api-token [string]
  Hugging Face API token for uploading the output model to a repository on the Hub or using a model from the Hub as initialization.
--hf-output-repo-name [string]
  The Hugging Face Hub repository to upload the fine-tuned model to.
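
As a minimal sketch of the dataset formats referenced by --train-on-inputs, each line below is one record of the corresponding format; the field names come from the descriptions above, while the example contents and the role/content message structure are illustrative assumptions rather than a format specification:
  • General format ("text" field, inputs not masked): {"text": "The quick brown fox jumps over the lazy dog."}
  • Conversational format ("messages" field, inputs masked): {"messages": [{"role": "user", "content": "What is 2 + 2?"}, {"role": "assistant", "content": "4"}]}
  • Instruction format ("prompt" and "completion" fields, inputs masked): {"prompt": "Translate to French: cat", "completion": "chat"}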
LoRA arguments (supported with together >= 1.2.3):

--lora/--no-lora
  Whether to use LoRA adapters for fine-tuning. Use --no-lora for full fine-tuning. Default: true.
--lora-r [integer]
  Rank for LoRA adapter weights. Default: 8, Min: 1, Max: 64.
--lora-alpha [integer]
  The alpha value for LoRA adapter training. Default: 8, Min: 1. If a value less than 1 is given, it will default to the --lora-r value to follow the recommendation of 1:1 scaling.
--lora-dropout [float]
  The dropout probability for LoRA layers. Default: 0.0, Min: 0.0, Max: 1.0.
--lora-trainable-modules [string]
  A comma-separated list of LoRA trainable modules. Default: all-linear (using all supported trainable modules). Trainable modules for supported model architectures can be found here: supported modules for LoRA training.
DPO arguments:

--training-method [string]
  Training method to use. Options: sft (supervised fine-tuning), dpo (Direct Preference Optimization). Default: sft.
--dpo-beta [float]
  Beta parameter for DPO training. Only used when --training-method is dpo.
--dpo-normalize-logratios-by-length
  Whether to normalize log-ratios by sample length. Only used when --training-method is dpo. Default: false.
--rpo-alpha [float]
  RPO alpha parameter of DPO training, used to include the NLL term in the loss. Only used when --training-method is dpo.
--simpo-gamma [float]
  SimPO gamma parameter. Only used when --training-method is dpo.
The id field in the JSON response contains the fine-tune job ID (ft-id), which can be used to get the status, retrieve logs, cancel the job, and download weights.
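
As an illustrative sketch only (the placeholders and hyperparameter values below are examples, not recommendations), a LoRA job with a validation set could be started with:

tg fine-tuning create \
  --training-file [FILE_ID | PATH] \
  --validation-file [FILE_ID | PATH] \
  --model [MODEL] \
  --n-epochs 3 \
  --n-evals 3 \
  --lora \
  --suffix [SUFFIX]

# Continue training from a checkpoint of a previous job (the step is optional).
tg fine-tuning create \
  --training-file [FILE_ID | PATH] \
  --model [MODEL] \
  --from-checkpoint [JOB_ID]:[STEP]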

List

To list past and running fine-tune jobs:
tg fine-tuning list

# Shorthand
tg ft ls
Jobs are listed oldest to newest, so the most recent jobs appear at the bottom of the list.

Retrieve

To retrieve metadata for a job, including its current status:
tg fine-tuning retrieve [FT_ID]

Monitor Events

To list events of a past or running job:
tg fine-tuning list-events [FT_ID]
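
To keep an eye on a running job, the retrieve and list-events commands can be combined with standard shell tooling; for example (the 60-second interval is arbitrary):

# Re-list the job's events every 60 seconds.
watch -n 60 tg fine-tuning list-events [FT_ID]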

Cancel

To cancel a running job:
tg fine-tuning cancel [FT_ID]

Checkpoints

To list saved checkpoints of a job:
tg fine-tuning list-checkpoints [FT_ID]

Download Model and Checkpoint Weights

To download the weights of a fine-tuned model, run:
# Download the model to the current working directory.
tg fine-tuning download [FT_ID]
This command downloads the ZSTD-compressed weights of the model. To extract the weights, run tar -xf filename.

Parameters

--output_dir/-o [Path]
  Specify the output directory.
--checkpoint-step/-s [integer]
  Download a specific checkpoint's weights. Defaults to downloading the latest weights.
--checkpoint-type [default, merged, adapter]
  The checkpoint type to download. The merged and adapter options only work for LoRA jobs. Default: default.
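
Putting these flags together, an illustrative download of a specific checkpoint's merged weights followed by extraction might look like this (the step value, output directory, and archive name are placeholders, and merged only applies to LoRA jobs):

# Download the merged weights of one checkpoint, then extract the archive.
tg fine-tuning download [FT_ID] --checkpoint-step [STEP] --checkpoint-type merged --output_dir ./weights
tar -xf [DOWNLOADED_FILE]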

Delete

To delete a fine-tuning job:
tg fine-tuning delete [FT_ID]

# Shorthand
tg ft -d [FT_ID]

Parameters

--force
  Bypass the confirmation prompt.
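
For example, to delete a job non-interactively from a script:

tg fine-tuning delete [FT_ID] --force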