Reference this guide to learn how to fine-tune a model on Together AI using the command-line interface.
First, install the `together` Python CLI.
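With pip, for example:

```shell
pip install --upgrade together
```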
Next, set your `TOGETHER_API_KEY` environment variable.
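In a POSIX shell, that looks something like this (the value below is a placeholder for your own key):

```shell
export TOGETHER_API_KEY=your-api-key-here
```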
Next, upload your dataset. In the command below, replace `PATH_TO_DATA_FILE` with the path to your dataset; in this example, our dataset is `joke_explanations.jsonl`.
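A sketch of the upload step, assuming the CLI's `files upload` subcommand:

```shell
together files upload joke_explanations.jsonl
```

The command prints a JSON description of the uploaded file; note its `id`, since that's the `FILE_ID` you'll pass when creating the fine-tuning job.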
Now create your fine-tuning job with the command sketched below:

- Replace `FILE_ID` with the ID of the training file.
- Replace `MODEL_NAME` with the API name of the base model you want to fine-tune (refer to the models list).
- Replace `WANDB_API_KEY` with your own Weights & Biases API key (optional).

Here, we replace `FILE_ID` with the ID we got when we uploaded the dataset and `MODEL_NAME` with the model we want to use, which in this example is `meta-llama/Meta-Llama-3-8B`.
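Assuming the current flag names (`--training-file`, `--model`, and `--wandb-api-key`), the command looks something like:

```shell
together fine-tuning create \
  --training-file $FILE_ID \
  --model $MODEL_NAME \
  --wandb-api-key $WANDB_API_KEY
```

The response is a JSON description of the new job.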
Make sure to save the `id` from that response, as you'll need it to track progress and download your model weights. In this example, `ft-3b883474-f39c-40d9-9d5a-7f97ba9eeb9f` is the Job ID.
A fine-tuning job can take anywhere from a couple of minutes to several hours depending on the base model, dataset size, number of epochs, and the job queue. Also, unless you set `--quiet` in the CLI, there will be a confirmation step to make sure you are aware of any defaults or arguments that needed to be reset from their original inputs for this specific fine-tuning job.
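For non-interactive use, for example in a script, you can pass that flag at creation time; something like:

```shell
together fine-tuning create --training-file $FILE_ID --model $MODEL_NAME --quiet
```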
To check on your job's status at any point, run `together fine-tuning retrieve $JOB_ID` in your CLI.
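Once the status shows the job has completed, you can fetch the weights. The `download` subcommand here is an assumption about the current CLI; check `together fine-tuning --help` for the exact name:

```shell
# Inspect status and details for the job
together fine-tuning retrieve $JOB_ID

# Pull down the fine-tuned model weights after completion (assumed subcommand)
together fine-tuning download $JOB_ID
```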
Q: Is there a minimum price? The minimum price for a fine-tuning job is $5. For example, if the price for your model comes out to $3.66 per 1M tokens and you fine-tune it for 1M tokens for 1 epoch, the token-based total falls below the minimum, so the job is charged $5.
Q: What happens if I cancel my job? The final price will be determined based on the number of tokens used to train your model up to the point of cancellation. For example, if your fine-tuning job is using Llama-3-8B with a batch size of 8, and you cancel the job after 1000 training steps, the total number of tokens used for training is 8192 [context length] x 8 [batch size] x 1000 [steps] = 65,536,000 tokens. This results in a charge of $27.21, as you can verify on the pricing page.
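A quick check of that token arithmetic from a shell:

```shell
# tokens = context length x batch size x training steps
echo $(( 8192 * 8 * 1000 ))   # prints 65536000
```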