Evaluations API — create and run evals
Create evaluation definitions, retrieve them with test cases and runs, and start new runs against dev, staging, or production environments.

The Evaluations API is the core of Evalgate’s quality loop. Use it to define what you want to measure, add test cases, and trigger runs that compute a pass/fail score you can gate on in CI or track over time in the dashboard.
List evaluations
Returns all evaluations for the authenticated organization. Use the status query parameter to filter by lifecycle stage.
Query parameters
Filter by evaluation status. Accepted values: draft, active, archived.
Maximum number of results to return. Defaults to 50, maximum 100.
Number of results to skip for pagination. Defaults to 0.
Response
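A minimal sketch of calling this endpoint from TypeScript. The https://evalgate.com/api base URL, the bearer-token auth scheme, and the limit and offset parameter names are assumptions; only the status parameter is confirmed above.

```typescript
// Hypothetical sketch: base URL, auth scheme, and the limit/offset
// parameter names are assumptions, not confirmed by this page.
const API_BASE = "https://evalgate.com/api";

async function listActiveEvaluations(apiKey: string) {
  const params = new URLSearchParams({
    status: "active", // draft | active | archived
    limit: "50",      // assumed name; maximum 100 per the docs
    offset: "0",      // assumed name; results to skip for pagination
  });
  const res = await fetch(`${API_BASE}/evaluations?${params}`, {
    headers: { Authorization: `Bearer ${apiKey}` }, // assumed auth scheme
  });
  if (!res.ok) throw new Error(`List failed: ${res.status}`);
  return res.json();
}
```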
Create an evaluation
Creates a new evaluation definition. The evaluation starts in draft status.
Request body
Display name for the evaluation.
Evaluation type. One of unit_test, human_eval, model_eval, ab_test.
Optional description explaining the purpose of this evaluation.
Optional settings controlling how the evaluation is executed (parallelism, timeout, etc.).
Optional model configuration applied when the evaluation runner makes LLM calls.
Optional array of custom metric definitions to compute alongside built-in scoring.
Optional inline test cases to attach at creation time.
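As a sketch, a create request might look like the following. The JSON field names (name, type, description, testCases) are inferred from the descriptions above rather than confirmed by a published schema:

```typescript
// Hypothetical field names inferred from the parameter descriptions;
// verify against the actual request schema before relying on them.
async function createEvaluation(apiKey: string) {
  const res = await fetch("https://evalgate.com/api/evaluations", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`, // assumed auth scheme
    },
    body: JSON.stringify({
      name: "Summary accuracy",   // assumed field name
      type: "model_eval",         // unit_test | human_eval | model_eval | ab_test
      description: "Checks that generated summaries preserve key facts.",
      testCases: [
        // Inline test cases attached at creation time; shape is an assumption.
        { input: "Long source text...", expected: "Concise summary" },
      ],
    }),
  });
  if (!res.ok) throw new Error(`Create failed: ${res.status}`);
  return res.json(); // the created evaluation, in draft status
}
```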
Response
Returns the created evaluation object.

Get a single evaluation
Retrieve one evaluation by its numeric ID. The response includes the evaluation’s testCases and recent runs arrays, which the list endpoint omits.
Path parameters
Numeric ID of the evaluation to retrieve.
Response
Test cases attached to this evaluation.
Recent evaluation runs. Ordered by creation time, most recent first.
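A sketch of fetching a single evaluation and reading the two arrays described above. The base URL and auth scheme are assumptions, and the runs field name mirrors the description rather than a confirmed schema:

```typescript
// Sketch: retrieve one evaluation by numeric ID. The testCases and runs
// arrays are documented above; runs[0] is the most recent run per the
// stated ordering.
async function getEvaluation(id: number, apiKey: string) {
  const res = await fetch(`https://evalgate.com/api/evaluations/${id}`, {
    headers: { Authorization: `Bearer ${apiKey}` }, // assumed auth scheme
  });
  if (!res.ok) throw new Error(`Get failed: ${res.status}`);
  const evaluation = await res.json();
  console.log(`${evaluation.testCases.length} test cases`);
  console.log("latest run:", evaluation.runs[0]);
  return evaluation;
}
```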
Start an evaluation run
Triggers a new run for an existing evaluation. Pass anenvironment value to tag the run for filtering in the dashboard and in CI comparisons.
Path parameters
Numeric ID of the evaluation to run.
Request body
Target environment for this run. Accepted values: dev, staging, prod. You can also pass the environment via the x-evalai-env request header instead of the body.

Response
Unique ID of the new run.
ID of the parent evaluation.
Initial status: running. Poll or use webhooks to detect completion.
The environment value this run was tagged with.
ISO 8601 timestamp when the run was created.
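Putting this together, a hedged sketch of triggering a run. The POST /api/evaluations/{id}/runs path is an assumption (only the import variant below is spelled out on this page), as are the response field names in the final log line:

```typescript
// Sketch: trigger a managed run. The .../runs path is assumed; only
// .../runs/import appears verbatim in these docs.
async function startRun(evaluationId: number, apiKey: string) {
  const res = await fetch(
    `https://evalgate.com/api/evaluations/${evaluationId}/runs`,
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`, // assumed auth scheme
        // Alternatively, tag the run via the header instead of the body:
        // "x-evalai-env": "staging",
      },
      body: JSON.stringify({ environment: "staging" }), // dev | staging | prod
    },
  );
  if (!res.ok) throw new Error(`Run failed: ${res.status}`);
  const run = await res.json();
  // Field names (id, status, environment, createdAt) mirror the response
  // descriptions above but are assumptions, not a confirmed schema.
  console.log(run.id, run.status, run.environment, run.createdAt);
  return run;
}
```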
To import results from your own test runner instead of triggering a managed run, use
POST /api/evaluations/{id}/runs/import with an optional Idempotency-Key header to prevent duplicate runs on CI retry.
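For example, a CI job might import results like this. The endpoint path and the Idempotency-Key header come from the note above, while the payload shape is an assumption to check against the import schema:

```typescript
// Sketch: import results from an external test runner. Reusing the same
// Idempotency-Key on a CI retry prevents a duplicate run.
async function importRun(evaluationId: number, apiKey: string) {
  const res = await fetch(
    `https://evalgate.com/api/evaluations/${evaluationId}/runs/import`,
    {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`, // assumed auth scheme
        "Idempotency-Key": `ci-${process.env.CI_COMMIT_SHA ?? "local"}`,
      },
      body: JSON.stringify({
        environment: "prod",
        // Hypothetical result shape; verify against the import schema.
        results: [{ testCaseId: 1, passed: true }],
      }),
    },
  );
  if (!res.ok) throw new Error(`Import failed: ${res.status}`);
  return res.json();
}
```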