Evaluations API — create and run evals

Create evaluation definitions, retrieve them with test cases and runs, and start new runs against dev, staging, or production environments.
The Evaluations API is the core of Evalgate’s quality loop. Use it to define what you want to measure, add test cases, and trigger runs that compute a pass/fail score you can gate on in CI or track over time in the dashboard.

List evaluations

Returns all evaluations for the authenticated organization. Use the status query parameter to filter by lifecycle stage.
curl "https://evalgate.com/api/evaluations" \
  -H "Authorization: Bearer YOUR_API_KEY"

Query parameters

status (string): Filter by evaluation status. Accepted values: draft, active, archived.
limit (integer): Maximum number of results to return. Defaults to 50; maximum 100.
offset (integer): Number of results to skip for pagination. Defaults to 0.

Response

{
  "evaluations": [
    {
      "id": 42,
      "name": "Chatbot regression",
      "description": null,
      "type": "unit_test",
      "status": "draft",
      "organizationId": 1,
      "createdAt": "2026-03-15T10:30:00.000Z",
      "updatedAt": "2026-03-15T10:30:00.000Z"
    }
  ]
}
evaluations (array): Evaluation objects matching the query.
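
The limit and offset query parameters above page through large result sets. A minimal paging loop is sketched below, assuming jq is installed; the field names come from the response shown above:

OFFSET=0
while :; do
  PAGE=$(curl -s "https://evalgate.com/api/evaluations?limit=50&offset=$OFFSET" \
    -H "Authorization: Bearer YOUR_API_KEY")
  # Print id, name, and status for each evaluation on this page.
  echo "$PAGE" | jq -r '.evaluations[] | "\(.id)\t\(.name)\t\(.status)"'
  # A short page means we have reached the end.
  COUNT=$(echo "$PAGE" | jq '.evaluations | length')
  [ "$COUNT" -lt 50 ] && break
  OFFSET=$((OFFSET + 50))
done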

Create an evaluation

Creates a new evaluation definition. The evaluation starts in draft status.
curl https://evalgate.com/api/evaluations \
  -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Evaluation",
    "type": "unit_test"
  }'

Request body

name (string, required): Display name for the evaluation.
type (string, required): Evaluation type. One of unit_test, human_eval, model_eval, ab_test.
description (string): Optional description explaining the purpose of this evaluation.
executionSettings (object): Optional settings controlling how the evaluation is executed (parallelism, timeout, etc.).
modelSettings (object): Optional model configuration applied when the evaluation runner makes LLM calls.
customMetrics (array): Optional array of custom metric definitions to compute alongside built-in scoring.
testCases (array): Optional inline test cases to attach at creation time.

Response

Returns the created evaluation object:
{
  "id": 43,
  "name": "My Evaluation",
  "type": "unit_test",
  "status": "draft",
  "organizationId": 1,
  "createdAt": "2026-03-15T10:35:00.000Z",
  "updatedAt": "2026-03-15T10:35:00.000Z"
}
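
A fuller create request is sketched below. The description field is documented above; the inline test case shape (name, input) is an assumption based on the fields the single-evaluation response returns later on this page:

curl https://evalgate.com/api/evaluations \
  -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Chatbot regression",
    "type": "unit_test",
    "description": "Guards refund answers against regressions",
    "testCases": [
      {"name": "Case A", "input": "What is your refund policy?"}
    ]
  }'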

Get a single evaluation

Retrieve one evaluation by its numeric ID. The response includes the evaluation’s testCases and recent runs arrays, which the list endpoint omits.
curl "https://evalgate.com/api/evaluations?id=42" \
  -H "Authorization: Bearer YOUR_API_KEY"

Query parameters

id (integer, required): Numeric ID of the evaluation to retrieve.

Response

{
  "id": 42,
  "name": "Chatbot regression",
  "type": "unit_test",
  "status": "active",
  "organizationId": 1,
  "createdAt": "2026-03-15T10:30:00.000Z",
  "updatedAt": "2026-03-15T10:30:00.000Z",
  "testCases": [
    {
      "id": 1,
      "name": "Case A",
      "input": "What is your refund policy?"
    }
  ],
  "runs": [
    {
      "id": 9001,
      "status": "completed",
      "evaluationId": 42
    }
  ]
}
testCases (array): Test cases attached to this evaluation.
runs (array): Recent evaluation runs, ordered by creation time, most recent first.
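
Because runs are ordered most recent first, the latest run's status can be read straight off this response. A small sketch, assuming jq is installed:

curl -s "https://evalgate.com/api/evaluations?id=42" \
  -H "Authorization: Bearer YOUR_API_KEY" | jq -r '.runs[0].status'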

Start an evaluation run

Triggers a new run for an existing evaluation. Pass an environment value to tag the run for filtering in the dashboard and in CI comparisons.
curl https://evalgate.com/api/evaluations/42/runs \
  -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"environment": "dev"}'

Path parameters

id (integer, required): Numeric ID of the evaluation to run.

Request body

environment (string): Target environment for this run. Accepted values: dev, staging, prod. You can also pass the environment via the x-evalai-env request header instead of the body.
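
For example, the same run tagged for staging using the header instead of a body:

curl https://evalgate.com/api/evaluations/42/runs \
  -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "x-evalai-env: staging"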

Response

{
  "id": 9002,
  "evaluationId": 42,
  "status": "running",
  "environment": "dev",
  "createdAt": "2026-03-15T10:40:00.000Z"
}
id (integer): Unique ID of the new run.
evaluationId (integer): ID of the parent evaluation.
status (string): The run's initial status (running). Poll or use webhooks to detect completion.
environment (string): The environment value this run was tagged with.
createdAt (string): ISO 8601 timestamp of when the run was created.
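
Because a new run comes back in running status, a CI gate has to wait for it to finish. A minimal polling sketch, assuming jq is installed and rereading run status from the single-evaluation endpoint above; terminal failure statuses are not documented on this page and would need their own handling:

# Start a run and capture its ID.
RUN_ID=$(curl -s https://evalgate.com/api/evaluations/42/runs \
  -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"environment": "dev"}' | jq -r '.id')
# Poll until that specific run reports completed.
while :; do
  STATUS=$(curl -s "https://evalgate.com/api/evaluations?id=42" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    | jq -r --argjson run "$RUN_ID" '.runs[] | select(.id == $run) | .status')
  [ "$STATUS" = "completed" ] && break
  sleep 10
done
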
To import results from your own test runner instead of triggering a managed run, use POST /api/evaluations/{id}/runs/import with an optional Idempotency-Key header to prevent duplicate runs on CI retry.
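
For example, keyed on a CI build identifier so a retried job reuses the same run. The results.json payload and $CI_BUILD_ID are placeholders; the import body schema is not specified on this page:

# Idempotency-Key makes a retried CI job a no-op instead of a duplicate run.
curl https://evalgate.com/api/evaluations/42/runs/import \
  -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: $CI_BUILD_ID" \
  --data-binary @results.json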