Using a coding agent? Install the together-fine-tuning skill so your agent writes correct fine-tuning code automatically. See Coding agent setup for the install flow.
This quickstart walks through a full fine-tuning lifecycle. You’ll prepare a conversational dataset (CoQA), upload it, launch a LoRA job on Qwen3 8B, watch it complete, deploy the result, and compare it to the base model. End-to-end runtime is roughly 20 to 40 minutes for the example dataset. You can find a runnable notebook for this tutorial on GitHub.

Prerequisites

Before you begin, install the Python packages used in this tutorial:
pip install -U together datasets transformers tqdm
Make sure to export your API key before you begin:
export TOGETHER_API_KEY=<your_key>
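If you want the script to fail fast when the key is missing, a small guard like the one below can help. This is a minimal sketch: `require_env` is a hypothetical helper name, not part of the Together SDK, and it only checks that the variable is non-empty.

```python
import os


def require_env(name, env=None):
    """Return the value of an environment variable, or raise a clear error."""
    env = os.environ if env is None else env
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the tutorial")
    return value


# Usage (uncomment in your own script):
# require_env("TOGETHER_API_KEY")
```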

Step 1: Prepare your dataset

This quickstart uses the CoQA conversational dataset. Together AI supports four text data formats: conversational, instruction, preference, and generic text. JSONL is the default file format, but you can use Parquet for pre-tokenized data and custom loss masking. Transform CoQA into the conversational shape:
from datasets import load_dataset

coqa = load_dataset("stanfordnlp/coqa")

system_prompt = (
    "Read the story and extract answers for the questions.\nStory: {}"
)


def map_fields(row):
    messages = [
        {"role": "system", "content": system_prompt.format(row["story"])}
    ]
    for q, a in zip(row["questions"], row["answers"]["input_text"]):
        messages.append({"role": "user", "content": q})
        messages.append({"role": "assistant", "content": a})
    return {"messages": messages}


train = coqa["train"].map(
    map_fields, remove_columns=coqa["train"].column_names
)
train.to_json("coqa_train.jsonl")
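Before validating the whole file, it can be useful to spot-check a single record by hand. The sketch below is not the platform's validator. It assumes the conversational shape produced above (a `messages` list of `role`/`content` objects) and assumes only the `system`, `user`, and `assistant` roles appear. The sample question and answer are invented for illustration.

```python
import json

# Assumption: only these roles are used in this tutorial's data.
ALLOWED_ROLES = {"system", "user", "assistant"}


def conversation_problems(record):
    """Return a list of human-readable problems found in one record."""
    messages = record.get("messages")
    if not isinstance(messages, list) or not messages:
        return ["'messages' must be a non-empty list"]
    problems = []
    for i, msg in enumerate(messages):
        if not isinstance(msg, dict):
            problems.append(f"message {i} is not an object")
            continue
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"message {i} has unexpected role {msg.get('role')!r}")
        if not isinstance(msg.get("content"), str):
            problems.append(f"message {i} has non-string content")
    return problems


# One JSONL line in the conversational shape (contents are made up).
sample_line = json.dumps({
    "messages": [
        {"role": "system", "content": "Read the story and extract answers for the questions.\nStory: Once upon a time"},
        {"role": "user", "content": "How does the story begin?"},
        {"role": "assistant", "content": "Once upon a time"},
    ]
})
print(conversation_problems(json.loads(sample_line)))  # []
```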
Validate the file client-side, then upload. check=True re-runs the same validation server-side before the job starts.
import json
from together import Together
from together.lib.utils import check_file

client = Together()

report = check_file("coqa_train.jsonl")
print(json.dumps(report, indent=2))
assert report["is_check_passed"]

train_file = client.files.upload(
    file="coqa_train.jsonl",
    purpose="fine-tune",
    check=True,
)
print(train_file.id)
Here’s a sample output from check_file:
{
  "is_check_passed": true,
  "message": "Checks passed",
  "found": true,
  "file_size": 23777505,
  "utf8": true,
  "line_type": true,
  "text_field": true,
  "key_value": true,
  "has_min_samples": true,
  "num_samples": 7199,
  "load_json": true,
  "filetype": "jsonl"
}
Save the id from the upload response. You’ll pass it as training_file in the next step.
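When a report does not pass, the boolean fields point to what went wrong. The helper below is a sketch that assumes a failing report uses the same keys as the sample above and sets the relevant flags to `false`. `failed_checks` is a hypothetical name, and the failing report shown is invented for illustration.

```python
def failed_checks(report):
    """Return the names of boolean checks that are False in a check_file report."""
    return sorted(
        key
        for key, value in report.items()
        if isinstance(value, bool) and not value and key != "is_check_passed"
    )


# Invented failing report, reusing key names from the sample output above.
sample_report = {
    "is_check_passed": False,
    "message": "example failure",
    "found": True,
    "utf8": True,
    "text_field": False,
    "has_min_samples": False,
    "num_samples": 0,
}
print(failed_checks(sample_report))  # ['has_min_samples', 'text_field']
```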

Step 2: Launch the job

client.fine_tuning.create() starts a LoRA job by default. The example below tunes Qwen3 8B for three epochs. See the API reference for the full list of parameters.
job = client.fine_tuning.create(
    training_file=train_file.id,
    model="Qwen/Qwen3-8B",
    n_epochs=3,
    n_checkpoints=1,
    learning_rate=1e-5,
    warmup_ratio=0,
    train_on_inputs="auto",
    lora=True,
    suffix="qwen3_8b_demo",
    # wandb_api_key=os.environ.get("WANDB_API_KEY"),  # optional; needs `import os`
)
print(job.id)
Response:
ft-d1522ffb-8f3e-4106-9774-aed81e0164a4
Save the job ID.
Here are some common job parameters:
| Parameter | Required | Default | Notes |
| --- | --- | --- | --- |
| training_file | Required | n/a | File ID from Step 1. |
| model | Required | n/a | Base model to fine-tune. |
| lora | Optional | true | Set false for full fine-tuning. |
| n_epochs | Optional | 1 | Passes through the training set. |
| learning_rate | Optional | 0.00001 | Step size. |
| batch_size | Optional | "max" | Examples per step. |
| warmup_ratio | Optional | 0.0 | Fraction of steps for LR warmup. |
| weight_decay | Optional | 0.0 | L2 regularization. |
| train_on_inputs | Optional | "auto" | Mask user or prompt tokens from the loss. |
| suffix | Optional | n/a | Up to 64 characters appended to the output model name. |
| n_checkpoints | Optional | 1 | Intermediate checkpoints saved during training. |
| n_evals | Optional | 0 | Evaluations against validation_file during training. |
| hf_api_token | Optional | n/a | Only required for a private Hugging Face base. Omit otherwise. |
Each fine_tuning.create() call starts a new billed job. If you get a retryable error, run client.fine_tuning.list() first to make sure you aren’t launching a duplicate.
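One way to apply that check is to look for an in-flight job with the same training file and suffix before retrying. The sketch below works on plain dictionaries so it stays independent of the SDK's response type. The key names (`training_file`, `suffix`, `status`) are assumptions about what you would extract from the `client.fine_tuning.list()` response, and `find_active_duplicate` is a hypothetical helper.

```python
# States from the job lifecycle that mean a job is still in flight.
ACTIVE_STATES = {"pending", "queued", "running", "uploading"}


def find_active_duplicate(jobs, training_file, suffix):
    """Return the first in-flight job matching the file and suffix, else None."""
    for job in jobs:
        if (
            job.get("training_file") == training_file
            and job.get("suffix") == suffix
            and job.get("status") in ACTIVE_STATES
        ):
            return job
    return None


# Illustrative data only; real IDs come from your account.
existing = [
    {"id": "ft-old", "training_file": "file-a", "suffix": "demo", "status": "completed"},
    {"id": "ft-new", "training_file": "file-a", "suffix": "demo", "status": "queued"},
]
duplicate = find_active_duplicate(existing, "file-a", "demo")
print(duplicate["id"] if duplicate else "safe to launch")  # ft-new
```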

Step 3: Watch the job complete

Jobs move through these states: pending → queued → running → uploading → completed. Queue wait time is typically under an hour. Once the job is running, multiply the first epoch’s duration by n_epochs to estimate the total training time. Poll until the job completes, errors, or is cancelled, then read the output model name:
import time

job_id = job.id
deadline = time.time() + 6 * 60 * 60  # safety cap: 6 hours

while True:
    status = client.fine_tuning.retrieve(id=job_id)
    print(status.status)
    if status.status in ("completed", "error", "cancelled"):
        break
    if time.time() > deadline:
        raise TimeoutError(f"Job still {status.status} after 6 hours")
    time.sleep(60)

if status.status != "completed":
    raise RuntimeError(f"Job ended with status: {status.status}")

output_model = status.x_model_output_name
print(output_model)
Here’s a sample event log:
Fine tune request created
Job started at 2026-04-03T03:19:46Z
Model data downloaded at 2026-04-03T03:19:48Z
WandB run initialized.
Training started for Qwen/Qwen3-8B
Epoch completed, at step 24
Epoch completed, at step 48
Epoch completed, at step 72
Training completed for Qwen/Qwen3-8B at 2026-04-03T03:27:55Z
Uploading output model
Model upload complete
Job finished at 2026-04-03T03:31:33Z
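To turn a log like this into numbers, you can pull out the timestamps and epoch markers. The sketch below parses the sample text above as a plain string. It assumes the line formats shown here and does not call any API; `summarize_log` is a hypothetical helper.

```python
import re
from datetime import datetime

SAMPLE_LOG = """\
Fine tune request created
Job started at 2026-04-03T03:19:46Z
Model data downloaded at 2026-04-03T03:19:48Z
WandB run initialized.
Training started for Qwen/Qwen3-8B
Epoch completed, at step 24
Epoch completed, at step 48
Epoch completed, at step 72
Training completed for Qwen/Qwen3-8B at 2026-04-03T03:27:55Z
Uploading output model
Model upload complete
Job finished at 2026-04-03T03:31:33Z
"""

TIMESTAMP = re.compile(r"at (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z")
EPOCH_STEP = re.compile(r"Epoch completed, at step (\d+)")


def summarize_log(log_text):
    """Extract wall-clock duration and epoch step counts from an event log."""
    times = [
        datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S")
        for stamp in TIMESTAMP.findall(log_text)
    ]
    steps = [int(step) for step in EPOCH_STEP.findall(log_text)]
    return {
        # Seconds between the first and last timestamped events.
        "wall_clock_seconds": int((times[-1] - times[0]).total_seconds()) if times else 0,
        "epochs": len(steps),
        "steps_per_epoch": steps[0] if steps else None,
    }


print(summarize_log(SAMPLE_LOG))
# {'wall_clock_seconds': 707, 'epochs': 3, 'steps_per_epoch': 24}
```

For this sample, the job took just under 12 minutes from start to finish, with 24 steps per epoch.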
You can also monitor the run on the fine-tuning jobs dashboard. For per-step loss curves, see training metrics.

Step 4: Deploy and call your model

Fine-tuned models can be run on Together AI using dedicated endpoints. The example below deploys, sends one request, and tears the endpoint down to stop billing:
# 1. Preflight: list the hardware options that can host this base model
print(client.endpoints.list_hardware(model=status.model))

# 2. Create the endpoint. Use a hardware id returned by list_hardware
# above; for Qwen3 8B the platform currently serves 1x H100 80GB SXM.
endpoint = client.endpoints.create(
    display_name="Qwen3 8B fine-tune",
    model=output_model,
    hardware="1x_nvidia_h100_80gb_sxm",
    autoscaling={"min_replicas": 1, "max_replicas": 1},
)

# 3. Wait until ready
deadline = time.time() + 20 * 60
while True:
    ep = client.endpoints.retrieve(endpoint.id)
    if ep.state == "STARTED":
        break
    if ep.state in ("FAILED", "STOPPED"):
        raise RuntimeError(f"Endpoint state: {ep.state}")
    if time.time() > deadline:
        raise TimeoutError(f"Endpoint still {ep.state} after 20 minutes")
    time.sleep(30)

# 4. Send a request
response = client.chat.completions.create(
    model=endpoint.name,
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)

# 5. Delete when done
client.endpoints.delete(endpoint.id)
Pass endpoint.name (not output_model) as the model parameter when calling inference APIs. The endpoint name includes a unique suffix that routes traffic to your deployment.
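The job polling loop in Step 3 and the endpoint wait loop in this step follow the same pattern: fetch a state, stop on success or failure, and give up after a deadline. If you reuse the pattern, it could be factored into one helper. The sketch below is an illustration, not an SDK feature: `wait_for_state` and its parameters are hypothetical names, and the `sleep` and `clock` arguments exist only so the function can be tested without real delays.

```python
import time


def wait_for_state(fetch_state, done_states, failed_states=(), timeout_s=1200,
                   interval_s=30, sleep=time.sleep, clock=time.monotonic):
    """Poll fetch_state() until it returns a done state; raise on failure or timeout."""
    deadline = clock() + timeout_s
    while True:
        state = fetch_state()
        if state in done_states:
            return state
        if state in failed_states:
            raise RuntimeError(f"Reached failure state: {state}")
        if clock() > deadline:
            raise TimeoutError(f"Still {state} after {timeout_s} seconds")
        sleep(interval_s)


# Usage with the endpoint from this step (uncomment in your own script):
# wait_for_state(
#     lambda: client.endpoints.retrieve(endpoint.id).state,
#     done_states={"STARTED"},
#     failed_states={"FAILED", "STOPPED"},
# )
```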
Congrats! You just fine-tuned a model, deployed it to a dedicated endpoint, and ran inference end-to-end.

Step 5: Compare against the base model (optional)

To measure the impact of fine-tuning, run the same prompts through the base model and the fine-tuned model.
Many fine-tunable base models aren’t available on serverless. For example, calling Qwen/Qwen3-8B directly returns the error "Unable to access non-serverless model". To compare, deploy the base model on its own dedicated endpoint, evaluate against that endpoint’s endpoint.name, then tear that endpoint down too. Serverless base models (those with a per-token price listed on the models dashboard) can be called directly without deploying anything.
This GitHub notebook runs an Exact Match and F1 comparison on the CoQA validation split. Here’s a sample result from one run:
| Qwen3 8B | EM | F1 |
| --- | --- | --- |
| Base | 0.01 | 0.18 |
| Fine-tuned | 0.32 | 0.41 |
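For reference, Exact Match and token-level F1 are often computed with SQuAD-style answer normalization. The sketch below shows that common definition. It is an assumption that the notebook uses something similar, so its exact normalization and scores may differ.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction, reference):
    """Harmonic mean of token precision and recall after normalization."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


print(exact_match("Paris.", "paris"))  # 1.0
print(round(f1_score("the cat sat", "cat sat down"), 2))  # 0.8
```

Averaging these per-question scores over the validation split gives dataset-level EM and F1 figures like those in the table.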

Stop the endpoint

Dedicated endpoints bill per minute as long as they’re running. Step 4 deletes the endpoint at the end of the script, but if you skipped that step or want to delete it later, run:
tg endpoints delete "<ENDPOINT_ID>"
Find the endpoint ID by running tg endpoints list.
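In a script, an exception between creating and deleting the endpoint would leave it running and billing. One way to guard against that is a context manager that always runs the delete step. This is a sketch: `temporary_endpoint` is a hypothetical helper that takes your own create and delete callables, so it makes no assumptions about SDK signatures beyond the calls already shown in Step 4.

```python
from contextlib import contextmanager


@contextmanager
def temporary_endpoint(create, delete):
    """Create an endpoint, yield it, and always delete it afterwards."""
    endpoint = create()
    try:
        yield endpoint
    finally:
        # Runs even if the body raises, so the endpoint does not keep billing.
        delete(endpoint)


# Usage with the calls from Step 4 (uncomment in your own script):
# with temporary_endpoint(
#     lambda: client.endpoints.create(
#         display_name="Qwen3 8B fine-tune",
#         model=output_model,
#         hardware="1x_nvidia_h100_80gb_sxm",
#         autoscaling={"min_replicas": 1, "max_replicas": 1},
#     ),
#     lambda ep: client.endpoints.delete(ep.id),
# ) as endpoint:
#     ...  # wait for STARTED, then send requests
```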

Continue from a checkpoint

Resume training from an existing job by passing from_checkpoint:
job = client.fine_tuning.create(
    training_file="<NEW_FILE_ID>",
    from_checkpoint="<PREVIOUS_JOB_ID>",
)
from_checkpoint accepts the output model name, the job ID, or a specific step in the form ft-...:{STEP_NUM}. List available checkpoints with tg fine-tuning list-checkpoints <JOB_ID>.
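To build the step-specific form without string mistakes, a tiny helper is enough. This is a sketch based on the ft-...:{STEP_NUM} format described above. `checkpoint_ref` is a hypothetical name, and step 48 is taken from the sample event log in Step 3 purely as an example; whether a checkpoint exists at that step depends on your job.

```python
def checkpoint_ref(job_id, step=None):
    """Build a from_checkpoint value: the job ID, optionally pinned to a step."""
    if step is None:
        return job_id
    if not isinstance(step, int) or step < 0:
        raise ValueError("step must be a non-negative integer")
    return f"{job_id}:{step}"


print(checkpoint_ref("ft-d1522ffb-8f3e-4106-9774-aed81e0164a4", 48))
# ft-d1522ffb-8f3e-4106-9774-aed81e0164a4:48
```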

Next steps

Data preparation: See the full schema for conversational, instruction, preference, and tokenized data.
Supported models: Browse base models with context lengths and batch size limits.
Preference tuning: Align a model with paired preferred and dispreferred responses.
Deploy your model: Hosting, teardown, and local inference for fine-tuned models.