Evaluations API — create and run evals

Create evaluation definitions, retrieve them with test cases and runs, and start new runs against dev, staging, or production environments.
The Evaluations API is the core of Evalgate’s quality loop. Use it to define what you want to measure, add test cases, and trigger runs that compute a pass/fail score you can gate on in CI or track over time in the dashboard.

List evaluations

Returns all evaluations for the authenticated organization. Use the status query parameter to filter by lifecycle stage.
curl "https://evalgate.com/api/evaluations" \
  -H "Authorization: Bearer YOUR_API_KEY"

Query parameters

status (string): Filter by evaluation status. Accepted values: draft, active, archived.
limit (integer): Maximum number of results to return. Defaults to 50; maximum 100.
offset (integer): Number of results to skip for pagination. Defaults to 0.

Response

{
  "evaluations": [
    {
      "id": 42,
      "name": "Chatbot regression",
      "description": null,
      "type": "unit_test",
      "status": "draft",
      "organizationId": 1,
      "createdAt": "2026-03-15T10:30:00.000Z",
      "updatedAt": "2026-03-15T10:30:00.000Z"
    }
  ]
}
evaluations (array): Evaluation objects matching the query.
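
The limit and offset query parameters above page through large result sets. A minimal paging loop is sketched below, assuming jq is installed; the field names come from the response shown above:

OFFSET=0
while :; do
  PAGE=$(curl -s "https://evalgate.com/api/evaluations?limit=50&offset=$OFFSET" \
    -H "Authorization: Bearer YOUR_API_KEY")
  # Print id, name, and status for each evaluation on this page.
  echo "$PAGE" | jq -r '.evaluations[] | "\(.id)\t\(.name)\t\(.status)"'
  # A short page means we have reached the end.
  COUNT=$(echo "$PAGE" | jq '.evaluations | length')
  [ "$COUNT" -lt 50 ] && break
  OFFSET=$((OFFSET + 50))
done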

Create an evaluation

Creates a new evaluation definition. The evaluation starts in draft status.
curl https://evalgate.com/api/evaluations \
  -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "My Evaluation",
    "type": "unit_test"
  }'

Request body

name (string, required): Display name for the evaluation.
type (string, required): Evaluation type. One of unit_test, human_eval, model_eval, ab_test.
description (string): Optional description explaining the purpose of this evaluation.
executionSettings (object): Optional settings controlling how the evaluation is executed (parallelism, timeout, etc.).
modelSettings (object): Optional model configuration applied when the evaluation runner makes LLM calls.
customMetrics (array): Optional array of custom metric definitions to compute alongside built-in scoring.
testCases (array): Optional inline test cases to attach at creation time.

Response

Returns the created evaluation object:
{
  "id": 43,
  "name": "My Evaluation",
  "type": "unit_test",
  "status": "draft",
  "organizationId": 1,
  "createdAt": "2026-03-15T10:35:00.000Z",
  "updatedAt": "2026-03-15T10:35:00.000Z"
}
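
A fuller create request is sketched below. The description field is documented above; the inline test case shape (name, input) is an assumption based on the fields the single-evaluation response returns later on this page:

curl https://evalgate.com/api/evaluations \
  -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Chatbot regression",
    "type": "unit_test",
    "description": "Guards refund answers against regressions",
    "testCases": [
      {"name": "Case A", "input": "What is your refund policy?"}
    ]
  }'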

Get a single evaluation

Retrieve one evaluation by its numeric ID. The response includes the evaluation’s testCases and recent runs arrays, which the list endpoint omits.
curl "https://evalgate.com/api/evaluations?id=42" \
  -H "Authorization: Bearer YOUR_API_KEY"

Query parameters

id (integer, required): Numeric ID of the evaluation to retrieve.

Response

{
  "id": 42,
  "name": "Chatbot regression",
  "type": "unit_test",
  "status": "active",
  "organizationId": 1,
  "createdAt": "2026-03-15T10:30:00.000Z",
  "updatedAt": "2026-03-15T10:30:00.000Z",
  "testCases": [
    {
      "id": 1,
      "name": "Case A",
      "input": "What is your refund policy?"
    }
  ],
  "runs": [
    {
      "id": 9001,
      "status": "completed",
      "evaluationId": 42
    }
  ]
}
testCases (array): Test cases attached to this evaluation.
runs (array): Recent evaluation runs, ordered by creation time, most recent first.
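
Because runs are ordered most recent first, the latest run's status can be read straight off this response. A small sketch, assuming jq is installed:

curl -s "https://evalgate.com/api/evaluations?id=42" \
  -H "Authorization: Bearer YOUR_API_KEY" | jq -r '.runs[0].status'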

Start an evaluation run

Triggers a new run for an existing evaluation. Pass an environment value to tag the run for filtering in the dashboard and in CI comparisons.
curl https://evalgate.com/api/evaluations/42/runs \
  -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"environment": "dev"}'

Path parameters

id (integer, required): Numeric ID of the evaluation to run.

Request body

environment (string): Target environment for this run. Accepted values: dev, staging, prod. You can also pass the environment via the x-evalai-env request header instead of the body.
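
For example, the same run tagged for staging using the header instead of a body:

curl https://evalgate.com/api/evaluations/42/runs \
  -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "x-evalai-env: staging"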

Response

{
  "id": 9002,
  "evaluationId": 42,
  "status": "running",
  "environment": "dev",
  "createdAt": "2026-03-15T10:40:00.000Z"
}
id (integer): Unique ID of the new run.
evaluationId (integer): ID of the parent evaluation.
status (string): The run's initial status (running). Poll or use webhooks to detect completion.
environment (string): The environment value this run was tagged with.
createdAt (string): ISO 8601 timestamp of when the run was created.
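
Because a new run comes back in running status, a CI gate has to wait for it to finish. A minimal polling sketch, assuming jq is installed and rereading run status from the single-evaluation endpoint above; terminal failure statuses are not documented on this page and would need their own handling:

# Start a run and capture its ID.
RUN_ID=$(curl -s https://evalgate.com/api/evaluations/42/runs \
  -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"environment": "dev"}' | jq -r '.id')
# Poll until that specific run reports completed.
while :; do
  STATUS=$(curl -s "https://evalgate.com/api/evaluations?id=42" \
    -H "Authorization: Bearer YOUR_API_KEY" \
    | jq -r --argjson run "$RUN_ID" '.runs[] | select(.id == $run) | .status')
  [ "$STATUS" = "completed" ] && break
  sleep 10
done
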
To import results from your own test runner instead of triggering a managed run, use POST /api/evaluations/{id}/runs/import with an optional Idempotency-Key header to prevent duplicate runs on CI retry.
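
For example, keyed on a CI build identifier so a retried job reuses the same run. The results.json payload and $CI_BUILD_ID are placeholders; the import body schema is not specified on this page:

# Idempotency-Key makes a retried CI job a no-op instead of a duplicate run.
curl https://evalgate.com/api/evaluations/42/runs/import \
  -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: $CI_BUILD_ID" \
  --data-binary @results.json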