How long will it take for my job to start?
It depends. Factors that affect waiting time include the number of pending jobs from other customers, the number of jobs currently running, and available hardware. If there are no other pending jobs and hardware is available, your job should start within a minute of submission. Typically, jobs start within an hour of submission; however, there is no guarantee on waiting time.
How long will my job take to run?
It depends. Factors that impact your job's run time include model size, training data size, and network conditions when downloading or uploading model and training files. You can estimate how long your job will take to complete training by multiplying the number of epochs by the time to complete the first epoch; for example, if the first epoch finishes in 20 minutes and you train for 3 epochs, expect roughly 60 minutes of training.
How can I estimate my fine-tuning job cost?
To estimate the cost of your fine-tuning job (see the sketch after this list):
- Calculate approximate training tokens: context_length × batch_size × steps × epochs
- Add validation tokens: validation_dataset_size × evaluation_frequency
- Multiply the total tokens by the per-token rate for your chosen model size, fine-tuning type, and implementation method
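As a rough sketch of that arithmetic, here is a minimal example; every value below is a placeholder rather than a platform default, and you would multiply the result by the per-token rate from the pricing page to get a dollar estimate:

```python
# All values below are placeholders; substitute your own job settings.
context_length = 8192       # maximum sequence length
batch_size = 8              # sequences per training step
steps = 500                 # training steps (as in the formula above)
epochs = 3
validation_dataset_size = 200_000   # tokens in the validation dataset
evaluation_frequency = 4            # number of evaluations during the job

training_tokens = context_length * batch_size * steps * epochs
validation_tokens = validation_dataset_size * evaluation_frequency
total_tokens = training_tokens + validation_tokens
print(f"Estimated tokens processed: {total_tokens:,}")
```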
Fine-Tuning Pricing
Fine-tuning pricing is based on the total number of tokens processed during your job, including training and validation. Cost varies by model size, fine-tuning type (Supervised Fine-tuning or DPO), and implementation method (LoRA or Full Fine-tuning). The total cost is calculated as:
total_tokens_processed × per_token_rate
Where total_tokens_processed = (n_epochs × n_tokens_per_training_dataset) + (n_evals × n_tokens_per_validation_dataset)
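For example, a minimal sketch of that calculation with made-up inputs (the per-token rate below is invented for illustration only; use the actual rate for your model size, fine-tuning type, and method):

```python
# Placeholder inputs; replace with your dataset sizes and the published rate.
n_epochs = 3
n_tokens_per_training_dataset = 2_000_000
n_evals = 4
n_tokens_per_validation_dataset = 200_000
per_token_rate = 0.0000005  # hypothetical $/token, not a real price

total_tokens_processed = (n_epochs * n_tokens_per_training_dataset
                          + n_evals * n_tokens_per_validation_dataset)
total_cost = total_tokens_processed * per_token_rate
print(f"{total_tokens_processed:,} tokens -> ${total_cost:.2f}")
```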
For current rates, refer to our fine-tuning pricing page.
The exact token count and final price are available after tokenization completes, shown in your jobs dashboard or via `together fine-tuning retrieve $JOB_ID`.
Dedicated Endpoint Charges for Fine-Tuned Models
After fine-tuning, hosting charges apply for dedicated endpoints (billed per minute, even when the endpoint is not in use). These charges are separate from job costs and continue until you stop the endpoint. To avoid unexpected charges:
- Monitor active endpoints in the models dashboard
- Stop unused endpoints
- Review hosting rates on the pricing page
Why am I getting an error when uploading a training file?
There are two common issues you may encounter. First, your API key may be incorrect: a 403 status code indicates an invalid API key. Second, your balance may be less than the job minimum: we verify that your account balance is at least the minimum job charge ($5). If you do not have sufficient balance, you can increase your account limit by adding a credit card to your account, adjusting your spending limit if you already have a credit card, or paying your outstanding account balance. If you have sufficient balance on your account and still see an error, contact support for assistance.
Why was my job cancelled?
There are two reasons that a job may be automatically cancelled: you do not have sufficient balance on your account to cover the cost of the job, or you have entered an incorrect WandB API key. You can determine why your job was cancelled by:
(1) checking the events list for your job via the Together CLI:
$ together list-events <job-fine-tune-id>
(2) via the web interface: https://api.together.ai > Jobs > Cancelled Job > Events List
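If you prefer the Python SDK over the CLI, a minimal sketch is below. It assumes your installed together client exposes a fine_tuning.list_events method alongside retrieve; check your SDK version, and treat the job ID as a placeholder.

```python
from together import Together

client = Together()  # assumes TOGETHER_API_KEY is set in your environment

# Print the event list for a job to see why it was cancelled.
# "ft-your-job-id" is a placeholder; use your own fine-tune ID.
print(client.fine_tuning.list_events("ft-your-job-id"))
```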


What should I do if my job is cancelled due to billing limits?
You can add a credit card to your account to increase your spending limit. If you already have a credit card on your account, you can make a payment or adjust your spending limit. Contact support if you need assistance with your account balance.
Understanding Refunds When Canceling Fine-Tuning Jobs
When you cancel a running fine-tuning job, you may notice that you don’t receive a full refund of the original amount charged. This is because our fine-tuning billing system works based on the actual steps completed.
How Fine-Tuning Refunds Work
When you cancel a fine-tuning job (a worked example follows this list):
- You are charged for the steps that have already been completed, accounting for the hardware resources used during that time
- You receive a refund only for the steps that have not yet been completed
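As a rough illustration of that proration, here is a sketch with made-up numbers; it assumes a simple linear per-step charge, and the real breakdown for your job comes from the dashboard or support:

```python
# Hypothetical job; every value is a placeholder for illustration only.
total_steps = 1000        # steps the job was scheduled to run
completed_steps = 400     # steps finished before cancellation
original_charge = 25.00   # dollars charged when the job was created

# Assume the charge is prorated linearly per completed step.
charged = original_charge * completed_steps / total_steps
refunded = original_charge - charged
print(f"Charged: ${charged:.2f}, refunded: ${refunded:.2f}")
```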
Checking Your Job Progress
To understand how many steps have been completed, you can use the following API call:
client.fine_tuning.retrieve("your-job-id").total_steps
Replace "your-job-id" with your actual fine-tuning job ID. This will show you how far the job has progressed.
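A self-contained version of that call (assuming the Together Python SDK is installed and TOGETHER_API_KEY is set in your environment; the job ID is a placeholder):

```python
from together import Together

client = Together()  # reads TOGETHER_API_KEY from the environment

# Retrieve the job and print its total step count.
job = client.fine_tuning.retrieve("ft-your-job-id")
print(job.total_steps)
```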
If you have questions about a specific fine-tuning job’s billing, please contact support with your job ID for a detailed breakdown.
Why was there an error while running my job?
If your job fails after downloading the training file, but before training starts, the most likely source of the error is the training data. In that case, check your training file locally and confirm it passes, for example:
$ together files check ~/Downloads/unified_joke_explanations.jsonl
{
  "is_check_passed": true,
  "model_special_tokens": "we are not yet checking end of sentence tokens for this model",
  "file_present": "File found",
  "file_size": "File size 0.0 GB",
  "num_samples": 356
}
Despite our best efforts, the file checker does not catch all errors. Please contact support if your training data file passes the checks but your job is still failing at this stage.
If you see an error during other steps in your training job, this may be due to internal errors in our training stack (e.g. hardware failure or bugs). We actively monitor job failures, and work as quickly as we can to resolve these issues. Once the issue has been resolved by our engineers, your job will be automatically or manually restarted. Charges for the restarted job will be refunded.
How do I know if my job was restarted?
A job will be automatically or manually restarted if it fails to complete due to an internal error. You can view the event log to see whether the job was restarted, to determine the new fine-tune ID of the restarted job, and to check the refund amount (if applicable). Any charges from the failed job are refunded when your job is restarted.
Common Error Codes During Fine-Tuning
| Code | Cause | Solution | 
|---|---|---|
| 401 | Missing or Invalid API Key | Ensure you are using the correct API Key and supplying it correctly | 
| 403 | Input token count + max_tokens parameter exceeds model context length | Set max_tokens to a lower number. For chat models, you may set max_tokens to null | 
| 404 | Invalid Endpoint URL or model name | Check your request is made to the correct endpoint and the model is available | 
| 429 | Rate limit exceeded | Throttle request rate (see rate limits) | 
| 500 | Invalid Request | Ensure valid JSON, correct API key, and proper prompt format for the model type | 
| 503 | Engine Overloaded | Try again after a brief wait. Contact support if persistent | 
| 504 | Timeout | Try again after a brief wait. Contact support if persistent | 
| 524 | Cloudflare Timeout | Try again after a brief wait. Contact support if persistent | 
| 529 | Server Error | Try again after a wait. Contact support if persistent | 
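For the transient codes above (429, 503, 504, 524, 529), a simple client-side retry with exponential backoff is usually enough. The sketch below is generic and not part of the Together SDK; request_fn stands in for whatever call you are making and is assumed to return an object with a status_code attribute.

```python
import random
import time

# Codes from the table above that are usually worth retrying after a wait.
RETRYABLE = {429, 503, 504, 524, 529}

def call_with_retries(request_fn, max_attempts=5):
    """Call request_fn and retry with exponential backoff on transient errors."""
    for attempt in range(max_attempts):
        response = request_fn()
        if response.status_code not in RETRYABLE:
            return response
        # Wait 1s, 2s, 4s, ... plus jitter before the next attempt.
        time.sleep(2 ** attempt + random.random())
    return response
```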
Can I download the weights of my model?
If you would like to download the weights of your fine-tuned model so that you can use it outside of our platform, run:
together fine-tuning download <FT-ID>
This command downloads the ZSTD-compressed weights of the model. To extract the weights, run tar -xf on the downloaded file.
Other arguments:
- --output, -o (filename, optional): Specify the output filename. Default: <MODEL-NAME>.tar.zst
- --step, -s (integer, optional): Download the weights of a specific checkpoint; defaults to the latest weights. Default: -1
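If you would rather download weights from Python, the SDK is assumed here to mirror the CLI with a fine_tuning.download method and an output argument; the method name, its arguments, and the job ID below are assumptions, so verify them against your installed together version.

```python
from together import Together

client = Together()  # assumes TOGETHER_API_KEY is set in your environment

# Download the latest checkpoint for a placeholder fine-tune ID; the output
# path is meant to mirror the CLI's --output flag.
client.fine_tuning.download("ft-your-job-id", output="my-model.tar.zst")
```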