In the table below, models marked as “Turbo” are quantized to FP8, and those marked as “Lite” are quantized to INT4. All our other models run at full precision (FP16). If you’re not sure which chat model to use, we currently recommend Llama 3.3 70B Turbo (`meta-llama/Llama-3.3-70B-Instruct-Turbo`) to get started.
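As a starting point, here is a minimal sketch of a chat completion request body for an OpenAI-compatible endpoint, using the recommended default model; the endpoint URL and environment-variable name are assumptions, not from the table.

```python
import json
import os

API_URL = "https://api.together.xyz/v1/chat/completions"  # assumed endpoint path

# Request body using the recommended default chat model from the table above.
payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is FP8 quantization?"},
    ],
    "max_tokens": 512,
}

# To actually send it (requires an API key in TOGETHER_API_KEY):
# import urllib.request
# req = urllib.request.Request(
#     API_URL,
#     data=json.dumps(payload).encode(),
#     headers={
#         "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
#         "Content-Type": "application/json",
#     },
# )
# print(urllib.request.urlopen(req).read().decode())
```

Any model string from the tables below can be substituted for the `model` field.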
Organization | Model Name | API Model String | Context length | Quantization |
---|---|---|---|---|
Moonshot | Kimi K2 Instruct | moonshotai/Kimi-K2-Instruct | 128000 | FP8 |
Z.ai | GLM 4.5 Air | zai-org/GLM-4.5-Air-FP8 | 131072 | FP8 |
Qwen | Qwen3 235B-A22B Thinking 2507 | Qwen/Qwen3-235B-A22B-Thinking-2507 | 262144 | FP8 |
Qwen | Qwen3-Coder 480B-A35B Instruct | Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8 | 256000 | FP8 |
Qwen | Qwen3 235B-A22B Instruct 2507 | Qwen/Qwen3-235B-A22B-Instruct-2507-tput | 262144 | FP8 |
DeepSeek | DeepSeek-R1-0528 | deepseek-ai/DeepSeek-R1 | 163839 | FP8 |
DeepSeek | DeepSeek-R1-0528 Throughput | deepseek-ai/DeepSeek-R1-0528-tput | 163839 | FP8 |
DeepSeek | DeepSeek-V3-0324 | deepseek-ai/DeepSeek-V3 | 163839 | FP8 |
Meta | Llama 4 Maverick (17Bx128E) | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | 1048576 | FP8 |
Meta | Llama 4 Scout (17Bx16E) | meta-llama/Llama-4-Scout-17B-16E-Instruct | 1048576 | FP16 |
Meta | Llama 3.3 70B Instruct Turbo | meta-llama/Llama-3.3-70B-Instruct-Turbo | 131072 | FP8 |
Deep Cogito | Cogito v2 Preview 70B | deepcogito/cogito-v2-preview-llama-70B | 32768 | BF16 |
Deep Cogito | Cogito v2 Preview 109B MoE | deepcogito/cogito-v2-preview-llama-109B-MoE | 32768 | BF16 |
Deep Cogito | Cogito v2 Preview 405B | deepcogito/cogito-v2-preview-llama-405B | 32768 | BF16 |
Deep Cogito | Cogito v2 Preview 671B MoE | deepcogito/cogito-v2-preview-deepseek-671b | 32768 | FP8 |
Perplexity AI | Perplexity AI R1-1776 | perplexity-ai/r1-1776 | 163840 | FP16 |
Mistral AI | Magistral Small 2506 API | mistralai/Magistral-Small-2506 | 40960 | BF16 |
DeepSeek | DeepSeek R1 Distill Llama 70B | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 131072 | FP16 |
DeepSeek | DeepSeek R1 Distill Qwen 1.5B | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B* | 131072 | FP16 |
DeepSeek | DeepSeek R1 Distill Qwen 14B | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B | 131072 | FP16 |
Marin Community | Marin 8B Instruct | marin-community/marin-8b-instruct | 4096 | FP16 |
Mistral AI | Mistral Small 3 Instruct (24B) | mistralai/Mistral-Small-24B-Instruct-2501 | 32768 | FP16 |
Meta | Llama 3.1 8B Instruct Turbo | meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo | 131072 | FP8 |
Meta | Llama 3.3 70B Instruct Turbo (Free)** | meta-llama/Llama-3.3-70B-Instruct-Turbo-Free | 131072 | FP8 |
Nvidia | Llama 3.1 Nemotron 70B | nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | 32768 | FP16 |
Qwen | Qwen 2.5 7B Instruct Turbo | Qwen/Qwen2.5-7B-Instruct-Turbo | 32768 | FP8 |
Qwen | Qwen 2.5 72B Instruct Turbo | Qwen/Qwen2.5-72B-Instruct-Turbo | 32768 | FP8 |
Qwen | Qwen2.5 Vision Language 72B Instruct | Qwen/Qwen2.5-VL-72B-Instruct | 32768 | FP8 |
Qwen | Qwen 2.5 Coder 32B Instruct | Qwen/Qwen2.5-Coder-32B-Instruct | 32768 | FP16 |
Qwen | QwQ-32B | Qwen/QwQ-32B | 32768 | FP16 |
Qwen | Qwen 2 Instruct (72B) | Qwen/Qwen2-72B-Instruct | 32768 | FP16 |
Qwen | Qwen2 VL 72B Instruct | Qwen/Qwen2-VL-72B-Instruct | 32768 | FP16 |
Qwen | Qwen3 235B A22B Throughput | Qwen/Qwen3-235B-A22B-fp8-tput | 40960 | FP8 |
Arcee | Arcee AI Virtuoso Medium | arcee-ai/virtuoso-medium-v2 | 128000 | - |
Arcee | Arcee AI Coder-Large | arcee-ai/coder-large | 32768 | - |
Arcee | Arcee AI Virtuoso-Large | arcee-ai/virtuoso-large | 128000 | - |
Arcee | Arcee AI Maestro | arcee-ai/maestro-reasoning | 128000 | - |
Arcee | Arcee AI Caller | arcee-ai/caller | 32768 | - |
Arcee | Arcee AI Blitz | arcee-ai/arcee-blitz | 32768 | - |
Meta | Llama 3.1 405B Instruct Turbo | meta-llama/Meta-Llama-3.1-405B-Instruct-Turbo | 130815 | FP8 |
Meta | Llama 3.2 3B Instruct Turbo | meta-llama/Llama-3.2-3B-Instruct-Turbo | 131072 | FP16 |
Meta | Llama 3 8B Instruct Lite | meta-llama/Meta-Llama-3-8B-Instruct-Lite | 8192 | INT4 |
Meta | Llama 3 8B Instruct Reference | meta-llama/Llama-3-8b-chat-hf* | 8192 | FP16 |
Meta | Llama 3 70B Instruct Reference | meta-llama/Llama-3-70b-chat-hf | 8192 | FP16 |
Google | Gemma 2 27B | google/gemma-2-27b-it | 8192 | FP16 |
Google | Gemma Instruct (2B) | google/gemma-2b-it* | 8192 | FP16 |
Google | Gemma 3N E4B Instruct | google/gemma-3n-E4B-it | 32768 | FP8 |
Gryphe | MythoMax-L2 (13B) | Gryphe/MythoMax-L2-13b* | 4096 | FP16 |
Mistral AI | Mistral (7B) Instruct | mistralai/Mistral-7B-Instruct-v0.1 | 8192 | FP16 |
Mistral AI | Mistral (7B) Instruct v0.2 | mistralai/Mistral-7B-Instruct-v0.2 | 32768 | FP16 |
Mistral AI | Mistral (7B) Instruct v0.3 | mistralai/Mistral-7B-Instruct-v0.3 | 32768 | FP16 |
NousResearch | Nous Hermes 2 - Mixtral 8x7B-DPO (46.7B) | NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO* | 32768 | FP16 |
Organization | Model Name | Model String for API | Default steps |
---|---|---|---|
Black Forest Labs | Flux.1 [schnell] (free)* | black-forest-labs/FLUX.1-schnell-Free | N/A |
Black Forest Labs | Flux.1 [schnell] (Turbo) | black-forest-labs/FLUX.1-schnell | 4 |
Black Forest Labs | Flux.1 Dev | black-forest-labs/FLUX.1-dev | 28 |
Black Forest Labs | Flux.1 Canny | black-forest-labs/FLUX.1-canny* | 28 |
Black Forest Labs | Flux.1 Depth | black-forest-labs/FLUX.1-depth* | 28 |
Black Forest Labs | Flux.1 Redux | black-forest-labs/FLUX.1-redux* | 28 |
Black Forest Labs | Flux1.1 [pro] | black-forest-labs/FLUX.1.1-pro | - |
Black Forest Labs | Flux.1 [pro] | black-forest-labs/FLUX.1-pro | 28 |
Black Forest Labs | Flux.1 Kontext [pro] | black-forest-labs/FLUX.1-kontext-pro | 28 |
Black Forest Labs | Flux.1 Kontext [max] | black-forest-labs/FLUX.1-kontext-max | 28 |
Black Forest Labs | Flux.1 Kontext [dev] | black-forest-labs/FLUX.1-kontext-dev | 28 |
Black Forest Labs | Flux.1 Krea [dev] | black-forest-labs/FLUX.1-krea-dev | 28 |
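The “Default steps” column above is the sampler step count used when none is supplied. A minimal sketch of an image generation request body, assuming a Together-style `/v1/images/generations` endpoint (the endpoint path and the size parameters are assumptions):

```python
# Image generation request body; FLUX.1 [schnell] defaults to 4 steps per the table.
payload = {
    "model": "black-forest-labs/FLUX.1-schnell",
    "prompt": "A watercolor painting of a lighthouse at dawn",
    "steps": 4,       # matches the model's default steps in the table
    "width": 1024,    # illustrative size, not taken from the table
    "height": 1024,
    "n": 1,           # number of images to generate
}
```

Omitting `steps` should fall back to the default listed for each model.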
If you’re not sure which vision model to use, we currently recommend Llama 4 Scout (`meta-llama/Llama-4-Scout-17B-16E-Instruct`) to get started. For model-specific rate limits, navigate here.
Organization | Model Name | API Model String | Context length |
---|---|---|---|
Meta | Llama 4 Maverick (17Bx128E) | meta-llama/Llama-4-Maverick-17B-128E-Instruct-FP8 | 524288 |
Meta | Llama 4 Scout (17Bx16E) | meta-llama/Llama-4-Scout-17B-16E-Instruct | 327680 |
Meta | (Free) Llama 3.2 11B Vision Instruct Turbo* | meta-llama/Llama-Vision-Free | 131072 |
Meta | Llama 3.2 11B Vision Instruct Turbo | meta-llama/Llama-3.2-11B-Vision-Instruct-Turbo* | 131072 |
Meta | Llama 3.2 90B Vision Instruct Turbo | meta-llama/Llama-3.2-90B-Vision-Instruct-Turbo* | 131072 |
Qwen | Qwen2 Vision Language 72B Instruct | Qwen/Qwen2-VL-72B-Instruct | 32768 |
Qwen | Qwen2.5 Vision Language 72B Instruct | Qwen/Qwen2.5-VL-72B-Instruct | 32768 |
Arcee | Arcee AI Spotlight | arcee_ai/arcee-spotlight | 128000 |
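Vision models take an image alongside text in the chat request. A minimal sketch using the OpenAI-compatible `image_url` content-part convention (the URL is a placeholder, and the content-part shape is an assumption rather than something stated in the table):

```python
# Vision chat request: a user message carrying both text and an image by URL.
payload = {
    "model": "meta-llama/Llama-4-Scout-17B-16E-Instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},  # placeholder
                },
            ],
        }
    ],
}
```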
Organization | Model Name | Model String for API |
---|---|---|
Cartesia | Cartesia Sonic 2 | cartesia/sonic-2 |
Cartesia | Cartesia Sonic | cartesia/sonic |
Organization | Model Name | Model String for API | Context length |
---|---|---|---|
Qwen | Qwen 2.5 Coder 32B Instruct | Qwen/Qwen2.5-Coder-32B-Instruct | 32768 |
Model Name | Model String for API | Model Size | Embedding Dimension | Context Window |
---|---|---|---|---|
M2-BERT-80M-32K-Retrieval | togethercomputer/m2-bert-80M-32k-retrieval | 80M | 768 | 32768 |
BGE-Large-EN-v1.5 | BAAI/bge-large-en-v1.5 | 326M | 1024 | 512 |
BGE-Base-EN-v1.5 | BAAI/bge-base-en-v1.5 | 102M | 768 | 512 |
GTE-Modernbert-base | Alibaba-NLP/gte-modernbert-base | 149M | 768 | 8192 |
Multilingual-e5-large-instruct | intfloat/multilingual-e5-large-instruct | 560M | 1024 | 514 |
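When choosing an embedding model, note the context windows above: `m2-bert-80M-32k-retrieval` accepts up to 32768 tokens per input, while the BGE models cap at 512. A minimal sketch of an embeddings request body (endpoint and field names follow the common OpenAI-compatible convention, which is an assumption here):

```python
# Embeddings request: a batch of input strings for one model.
payload = {
    "model": "togethercomputer/m2-bert-80M-32k-retrieval",
    "input": [
        "Our solar system orbits the Milky Way galaxy.",
        "The Great Barrier Reef is visible from space.",
    ],
}
```

The response returns one vector per input, with the dimensionality listed in the “Embedding Dimension” column (768 for this model).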
Organization | Model Name | Model Size | Model String for API | Max Doc Size (tokens) | Max Docs |
---|---|---|---|---|---|
Salesforce | LlamaRank | 8B | Salesforce/Llama-Rank-v1 | 8192 | 1024 |
MixedBread | Rerank Large | 1.6B | mixedbread-ai/Mxbai-Rerank-Large-V2 | 32768 | - |
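A rerank request scores a list of candidate documents against a query; per the table, LlamaRank accepts up to 1024 documents of up to 8192 tokens each. A minimal sketch (the `top_n` parameter name is an assumption):

```python
# Rerank request: score each document against the query, keep the best top_n.
payload = {
    "model": "Salesforce/Llama-Rank-v1",
    "query": "What is the capital of France?",
    "documents": [
        "Paris is the capital and largest city of France.",
        "Berlin is the capital of Germany.",
        "France is known for its wine and cheese.",
    ],
    "top_n": 2,  # assumed parameter: return only the 2 highest-scoring documents
}
```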
Organization | Model Name | Model String for API | Context length |
---|---|---|---|
Meta | LLaMA-2 (70B) | meta-llama/Llama-2-70b-hf | 4096 |
mistralai | Mixtral-8x7B (46.7B) | mistralai/Mixtral-8x7B-v0.1 | 32768 |
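Base (non-chat) models like these are typically queried through a plain completions endpoint with a raw `prompt` rather than a `messages` list. A minimal sketch, assuming a `/v1/completions`-style endpoint:

```python
# Plain completion request for a base model: raw prompt, no chat template.
payload = {
    "model": "mistralai/Mixtral-8x7B-v0.1",
    "prompt": "The three laws of thermodynamics are",
    "max_tokens": 128,
    "stop": ["\n\n"],  # stop at the first blank line
}
```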
"safety_model": "MODEL_API_STRING"
Organization | Model Name | Model String for API | Context length |
---|---|---|---|
Meta | Llama Guard (8B) | meta-llama/Meta-Llama-Guard-3-8B | 8192 |
Meta | Llama Guard 4 (12B) | meta-llama/Llama-Guard-4-12B | 1048576 |
Virtue AI | Virtue Guard | VirtueAI/VirtueGuard-Text-Lite | 32768 |
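Putting it together, a minimal sketch of a chat request with a safety model attached via the `safety_model` parameter (the rest of the payload shape follows the OpenAI-compatible convention, which is an assumption here):

```python
# Chat request with Llama Guard 3 attached as the safety model.
payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    "messages": [{"role": "user", "content": "Hello!"}],
    "safety_model": "meta-llama/Meta-Llama-Guard-3-8B",  # from the table above
}
```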