Documentation Index

Fetch the complete documentation index at: https://docs.together.ai/llms.txt

Use this file to discover all available pages before exploring further.

Together AI Skills are instruction files that give AI coding agents domain-specific knowledge about the Together AI platform. When your agent detects a relevant task, it automatically loads the right skill and uses it to write correct code with proper model IDs, SDK patterns, and best practices. Together AI offers 12 skills covering the full platform. They work with Claude Code, Cursor, Codex, Gemini CLI, and any other coding agent you use.

Installation

npx skills add togethercomputer/skills

Verify installation

ls your-project/.claude/skills/together-*/SKILL.md

You should see one SKILL.md per installed skill.

Available skills

Once installed, skills activate automatically when the agent detects a relevant task. You can also call a skill explicitly with /<skill-name>, but most of the time the agent picks the right one on its own.
together-chat-completions: Serverless chat inference, streaming, multi-turn conversations, function calling (6 patterns), structured JSON outputs, and reasoning models.
together-images: Text-to-image generation, image editing with Kontext, FLUX model selection, LoRA-based styling, and reference-image guidance.
together-video: Text-to-video and image-to-video generation, keyframe control, model and dimension selection, and async job polling.
together-audio: Text-to-speech (REST, streaming, realtime WebSocket) and speech-to-text (transcription, translation, diarization, timestamps).
together-embeddings: Dense vector generation, semantic search, RAG pipelines, and reranking with dedicated endpoints.
together-fine-tuning: LoRA, full, DPO preference, VLM, function-calling, and reasoning fine-tuning, plus BYOM uploads.
together-batch-inference: Async batch jobs with JSONL input, polling, result downloads, and up to 50% cost savings.
together-evaluations: LLM-as-a-judge workflows: classify, score, and compare evaluations with external provider support.
together-sandboxes: Remote sandboxed Python execution with session reuse, file uploads, and chart outputs.
together-dedicated-endpoints: Single-tenant GPU endpoints with hardware sizing, autoscaling, and fine-tuned model deployment.
together-dedicated-containers: Custom Dockerized inference workers using the Jig CLI, Sprocket SDK, and queue API.
together-gpu-clusters: On-demand and reserved GPU clusters (H100, H200, B200) with Kubernetes, Slurm, and shared storage.

How skills are structured

Each skill is a self-contained directory:
skills/together-<product>/
├── SKILL.md           # Core instructions (loaded when the skill triggers)
├── references/        # Detailed docs: model lists, API parameters, CLI commands
└── scripts/           # Runnable Python and TypeScript examples
When a skill triggers, the agent first loads SKILL.md for high-level routing and rules. If it needs deeper detail (model tables, full API specs, or data format docs), it pulls from references/. For complete working code, it uses the scripts/ directory.

Use a single skill

Each skill works on its own for focused tasks. Describe what you want and the right skill activates, or invoke a specific skill with /<skill-name>. For example, /together-fine-tuning.

Chat with streaming and tool use

Build a multi-turn chatbot using Together AI with Kimi-K2.5
that can call a weather API and return structured JSON.
The agent uses together-chat-completions to generate correct v2 SDK code with the right model ID, streaming setup, tool definitions, and the complete tool-call loop.
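The tool-call loop the skill generates can be sketched as follows. This is a hand-written illustration, not the skill's output: it assumes the v2 SDK keeps the OpenAI-compatible `client.chat.completions.create` interface (as the v1 SDK does), and `get_weather` is a hypothetical stand-in for a real weather API. The skill supplies the exact model ID.

```python
import json

# Hypothetical local stand-in for a real weather API.
def get_weather(city: str) -> dict:
    return {"city": city, "temp_c": 21, "conditions": "partly cloudy"}

TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

def dispatch_tool_call(name: str, arguments: str) -> str:
    """Route a model tool call to the matching local function."""
    if name == "get_weather":
        return json.dumps(get_weather(**json.loads(arguments)))
    raise ValueError(f"unknown tool: {name}")

def run_chat(client, model: str, user_msg: str) -> str:
    """Sketch of the complete tool-call loop.

    `client` is assumed to be a together.Together() instance whose
    chat.completions.create mirrors the OpenAI-style signature.
    """
    messages = [{"role": "user", "content": user_msg}]
    while True:
        resp = client.chat.completions.create(
            model=model, messages=messages, tools=TOOLS
        )
        msg = resp.choices[0].message
        if not getattr(msg, "tool_calls", None):
            return msg.content  # model answered directly; loop ends
        messages.append(msg)
        for call in msg.tool_calls:
            # Execute the tool and feed the result back to the model.
            messages.append({
                "role": "tool",
                "tool_call_id": call.id,
                "content": dispatch_tool_call(
                    call.function.name, call.function.arguments
                ),
            })
```

The loop ends only when the model returns a message with no tool calls, which is what lets it chain several tool invocations in one conversation.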

Generate and edit images

Generate a product hero image with FLUX.2, then use Kontext
to change the background to a rainy cyberpunk alley.
The agent uses together-images for both the initial generation and the Kontext editing call, handling base64 decoding and file saving.
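The base64-to-file step mentioned above is plain Python and looks roughly like this, assuming the image endpoint returns a base64-encoded payload (as the v1-style SDK's `b64_json` response field does):

```python
import base64
from pathlib import Path

def save_b64_image(b64_data: str, path: str) -> int:
    """Decode a base64 image payload and write it to disk.

    Returns the number of bytes written.
    """
    raw = base64.b64decode(b64_data)
    Path(path).write_bytes(raw)
    return len(raw)
```

The same helper works for both the initial FLUX.2 generation and the Kontext edit, since both return image bytes.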

Fine-tune a model

Fine-tune Llama 3.3 70B on my support conversations using LoRA,
then deploy the result to a dedicated endpoint.
The agent uses together-fine-tuning for data preparation, upload, training configuration, and monitoring, then hands off to together-dedicated-endpoints for deployment.
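The data-preparation step can be sketched as below: converting raw support conversations into the chat-format JSONL commonly used for conversational fine-tuning. The system prompt and helper names are illustrative; the skill's references/ directory documents the exact format Together expects.

```python
import json

def to_training_row(question: str, answer: str,
                    system: str = "You are a helpful support agent.") -> str:
    """Serialize one support exchange as a chat-format JSONL line."""
    return json.dumps({
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": question},
            {"role": "assistant", "content": answer},
        ]
    })

def write_jsonl(pairs, path: str) -> None:
    """Write (question, answer) pairs as one JSON object per line."""
    with open(path, "w") as f:
        for q, a in pairs:
            f.write(to_training_row(q, a) + "\n")
```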

Multi-step workflow examples

Skills define hand-off boundaries between products so the agent can chain them together for multi-step workflows. Here are four examples that span multiple skills.

Build a RAG pipeline with evaluation

Embed my document corpus with Together AI, build a retrieval pipeline
with reranking, then evaluate the answer quality with an LLM judge.
The agent chains three skills:
  1. together-embeddings: generates dense vectors for your documents and builds a cosine-similarity retriever with reranking.
  2. together-chat-completions: generates answers from the retrieved context using a chat model.
  3. together-evaluations: sets up a score evaluation to grade answer quality with an LLM judge, polls for results, and downloads the per-row scores.
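The retrieval core of step 1 is simple enough to show inline. This is a minimal sketch of a cosine-similarity retriever over vectors returned by an embeddings endpoint; in practice the reranking pass would reorder the top-k candidates before they reach the chat model:

```python
import math

def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def top_k(query_vec, doc_vecs, k: int = 3):
    """Return indices of the k documents most similar to the query."""
    scored = sorted(
        enumerate(doc_vecs),
        key=lambda iv: cosine(query_vec, iv[1]),
        reverse=True,
    )
    return [i for i, _ in scored[:k]]
```

A production pipeline would use a vector index instead of a linear scan, but the ranking logic is the same.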

Fine-tune, deploy, and benchmark

Fine-tune Qwen on my preference data with DPO, deploy the result,
then compare it against the base model using Together AI evaluations.
The agent chains three skills:
  1. together-fine-tuning: prepares preference pairs, runs SFT first, then DPO training, and monitors the job.
  2. together-dedicated-endpoints: deploys the fine-tuned checkpoint to a dedicated endpoint with hardware sizing and autoscaling.
  3. together-evaluations: runs a compare evaluation between the base model and your fine-tuned model, then downloads the results.
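Step 1's preference pairs are JSONL rows pairing a preferred and a non-preferred completion for the same prompt. The field names below follow the common OpenAI-style DPO schema and are illustrative only; the together-fine-tuning skill's references/ carry the exact schema Together's API expects.

```python
import json

def preference_row(prompt: str, chosen: str, rejected: str) -> str:
    """One DPO preference pair as a JSONL line.

    Field names are illustrative; check the skill's references/
    for the exact schema.
    """
    return json.dumps({
        "input": {"messages": [{"role": "user", "content": prompt}]},
        "preferred_output": [{"role": "assistant", "content": chosen}],
        "non_preferred_output": [{"role": "assistant", "content": rejected}],
    })
```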

Generate product media from a single prompt

Generate a product photo with FLUX.2, edit it with Kontext to add
studio lighting, then animate the final image into a 5-second video.
The agent chains two skills:
  1. together-images: generates the initial image, then edits it with Kontext for studio lighting.
  2. together-video: takes the edited image as a first-frame keyframe, submits an image-to-video job, polls until completion, and downloads the MP4.
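The polling step in image-to-video generation follows the standard async-job pattern. Here is a generic sketch with the status fetcher injected as a callable, so it works regardless of the exact SDK method the skill generates (the `"completed"`/`"failed"` state names are illustrative):

```python
import time

def poll_job(get_status, interval_s: float = 5.0, timeout_s: float = 600.0):
    """Poll get_status() until the job reaches a terminal state.

    get_status is any zero-argument callable returning a status dict,
    e.g. a closure over the SDK's job-retrieval call.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_status()
        if status.get("state") in ("completed", "failed"):
            return status
        time.sleep(interval_s)
    raise TimeoutError("video job did not finish in time")
```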

Batch-process and analyze results

Classify 50,000 support tickets overnight with the batch API,
then run the results through Together Sandboxes to generate
a breakdown chart by category.
The agent chains two skills:
  1. together-batch-inference: prepares the JSONL input, uploads it, creates the batch job, and polls until the results are ready.
  2. together-sandboxes: uploads the results file to a sandboxed Python session, runs pandas analysis, and generates a matplotlib chart.
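The JSONL preparation in step 1 pairs each ticket with a classification request. The request-line shape below is sketched from the OpenAI-style batch format (`custom_id` plus a request `body`); confirm the exact fields in the together-batch-inference references/.

```python
import json

def batch_line(ticket_id: str, text: str, model: str) -> str:
    """One batch-inference request as a JSONL line (fields illustrative)."""
    return json.dumps({
        "custom_id": ticket_id,
        "body": {
            "model": model,
            "messages": [
                {"role": "system",
                 "content": "Classify this support ticket into one category."},
                {"role": "user", "content": text},
            ],
        },
    })

def write_batch_input(tickets, path: str, model: str) -> None:
    """tickets is an iterable of (ticket_id, text) pairs."""
    with open(path, "w") as f:
        for ticket_id, text in tickets:
            f.write(batch_line(ticket_id, text, model) + "\n")
```

The `custom_id` is what lets you join each result row back to its source ticket after the batch completes.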

SDK compatibility

All code generated by these skills targets the Together Python v2 SDK (together>=2.0.0) and the Together TypeScript SDK (together-ai). If you are upgrading from v1, see the migration guide for breaking changes in method names, argument styles, and response shapes.

Resources