Evals

Create

Create a new model evaluation job. For the full list of supported models, see Supported Models.

tg evals create

Parameters

Flag	Description
`--type [classify\|score\|compare]`	Type of evaluation to create. required
`--judge-model [string]`	Name or URL of the judge model to use for evaluation. required
`--judge-model-source [serverless\|dedicated\|external]`	Source of the judge model. required
`--judge-system-template [string]`	System template for the judge model. required
`--input-data-file-path [string]`	Path to the input data file. required
`--judge-external-api-token [string]`	API token for an external judge model. Pass an empty string (`""`) when `--judge-model-source` is `serverless` or `dedicated`. required
`--judge-external-base-url [string]`	Base URL for an external judge model. Pass an empty string (`""`) when `--judge-model-source` is `serverless` or `dedicated`. required
`--model-field [string]`	Name of the field in the input file containing text generated by the model. Mutually exclusive with `--model-to-evaluate` and the other detailed-config flags below.
`--model-to-evaluate [string]`	Model name when using the detailed config.
`--model-to-evaluate-source [serverless\|dedicated\|external]`	Source of the model to evaluate.
`--model-to-evaluate-external-api-token [string]`	Optional external API token for the model to evaluate.
`--model-to-evaluate-external-base-url [string]`	Optional external base URL for the model to evaluate.
`--model-to-evaluate-max-tokens [integer]`	Max tokens for the model to evaluate.
`--model-to-evaluate-temperature [float]`	Temperature for the model to evaluate.
`--model-to-evaluate-system-template [string]`	System template for the model to evaluate.
`--model-to-evaluate-input-template [string]`	Input template for the model to evaluate.
`--labels [string]`	Comma-separated list of classification labels.
`--pass-labels [string]`	Comma-separated list of labels considered as passing. Required for the `classify` type.
`--min-score [float]`	Minimum score value. Required for the `score` type.
`--max-score [float]`	Maximum score value. Required for the `score` type.
`--pass-threshold [float]`	Threshold score for passing. Required for the `score` type.
`--model-a-field [string]`	Name of the field in the input file containing text generated by model A. Mutually exclusive with `--model-a` and the other model-A flags below.
`--model-a [string]`	Model name or URL for model A when using the detailed config.
`--model-a-source [serverless\|dedicated\|external]`	Source of model A.
`--model-a-external-api-token [string]`	Optional external API token for model A.
`--model-a-external-base-url [string]`	Optional external base URL for model A.
`--model-a-max-tokens [integer]`	Max tokens for model A.
`--model-a-temperature [float]`	Temperature for model A.
`--model-a-system-template [string]`	System template for model A.
`--model-a-input-template [string]`	Input template for model A.
`--model-b-field [string]`	Name of the field in the input file containing text generated by model B. Mutually exclusive with `--model-b` and the other model-B flags below.
`--model-b [string]`	Model name or URL for model B when using the detailed config.
`--model-b-source [serverless\|dedicated\|external]`	Source of model B.
`--model-b-external-api-token [string]`	Optional external API token for model B.
`--model-b-external-base-url [string]`	Optional external base URL for model B.
`--model-b-max-tokens [integer]`	Max tokens for model B.
`--model-b-temperature [float]`	Temperature for model B.
`--model-b-system-template [string]`	System template for model B.
`--model-b-input-template [string]`	Input template for model B.
`--disable-position-bias-correction`	Skip the flipped-order judge pass and run only a single judge pass (original order). Halves judge cost and latency at the expense of position-bias correction. Default: off (two-pass mode).

List

List all eval jobs.

tg evals list

Parameters

Flag	Description
`--status [pending\|queued\|running\|completed\|error\|user_error]`	Filter by job status.
`--limit [integer]`	Limit number of results (max 100).
`--after [string]`	Pagination cursor.

Retrieve

Get the details for a specific evaluation job.

tg evals retrieve [EVALUATION_ID]

Status

Get the status and results of a specific evaluation job.

tg evals status [EVALUATION_ID]

TOGETHER CLI

COMMANDS

Create

Parameters

List

Parameters

Retrieve

Status

​Create

​Parameters

​List

​Parameters

​Retrieve

​Status

Create

Parameters

List

Parameters

Retrieve

Status