Documentation Index

Fetch the complete documentation index at: https://evalgate.com/docs/llms.txt

Use this file to discover all available pages before exploring further.

Annotations API — human labeling and review

Create annotation tasks, assign traces for human review, and submit labels to build the golden dataset used for measuring LLM judge credibility.
Human annotations are the foundation of judge credibility in Evalgate. When you label a set of traces as pass or fail, those labels become the ground truth that the LLM Judge alignment endpoint compares against automated judge scores. A judge with high alignment against a well-labeled dataset is one you can trust to gate your CI pipeline.

GET /api/annotations/tasks — list annotation tasks

Returns annotation tasks for the authenticated organization.
curl https://evalgate.com/api/annotations/tasks \
  -H "Authorization: Bearer YOUR_API_KEY"

Response

{
  "tasks": [
    {
      "id": 12,
      "name": "Support quality review — March",
      "status": "in_progress",
      "itemCount": 120,
      "completedCount": 87,
      "organizationId": 1,
      "createdAt": "2026-03-01T09:00:00.000Z"
    }
  ]
}
tasks (array)
Array of annotation task objects for the authenticated organization.
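As a client-side illustration, the progress fields in the response above can be turned into a completion ratio per task. This is a minimal sketch, not part of the Evalgate API; the field names (`itemCount`, `completedCount`) come from the example payload, and the HTTP call itself is omitted.

```python
import json

# Example response body copied from the GET /api/annotations/tasks docs above.
response_body = """
{
  "tasks": [
    {
      "id": 12,
      "name": "Support quality review — March",
      "status": "in_progress",
      "itemCount": 120,
      "completedCount": 87,
      "organizationId": 1,
      "createdAt": "2026-03-01T09:00:00.000Z"
    }
  ]
}
"""

def task_progress(tasks: list[dict]) -> dict[int, float]:
    """Map each task id to its completion ratio (0.0 to 1.0)."""
    return {
        t["id"]: (t["completedCount"] / t["itemCount"] if t["itemCount"] else 0.0)
        for t in tasks
    }

tasks = json.loads(response_body)["tasks"]
print(task_progress(tasks))  # {12: 0.725}
```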

POST /api/annotations/tasks — create an annotation task

Creates a new task and assigns a set of traces for labeling.
curl https://evalgate.com/api/annotations/tasks \
  -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Support quality review — March",
    "traceIds": [42, 43, 44, 45]
  }'

Request body

name (string, required)
Display name for this annotation task.

traceIds (array, required)
Array of numeric trace IDs to include in this task. Each trace will become one annotation item.

instructions (string, optional)
Guidance text shown to annotators when they open the task.

labelOptions (array, optional)
Array of label strings annotators can choose from. Defaults to ["pass", "fail"] when not specified.
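Before POSTing, a client can check the request body against the field rules listed above. The sketch below is illustrative, not an official SDK: the required/optional rules mirror this page, but details such as rejecting an empty `traceIds` array are assumptions, not documented server behavior.

```python
def validate_create_task(payload: dict) -> list[str]:
    """Return a list of validation problems; an empty list means the payload looks ok."""
    errors = []
    # name: required, non-empty string
    if not isinstance(payload.get("name"), str) or not payload["name"]:
        errors.append("name is required and must be a non-empty string")
    # traceIds: required array of numeric IDs (non-empty is an assumption)
    trace_ids = payload.get("traceIds")
    if not isinstance(trace_ids, list) or not trace_ids:
        errors.append("traceIds is required and must be a non-empty array")
    elif not all(isinstance(t, int) for t in trace_ids):
        errors.append("traceIds must contain only numeric trace IDs")
    # labelOptions: optional array of label strings
    if "labelOptions" in payload and not all(
        isinstance(o, str) for o in payload["labelOptions"]
    ):
        errors.append("labelOptions must be an array of strings")
    return errors

payload = {"name": "Support quality review — March", "traceIds": [42, 43, 44, 45]}
print(validate_create_task(payload))  # []
```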

Response (201)

{
  "id": 13,
  "name": "Support quality review — March",
  "status": "draft",
  "itemCount": 4,
  "completedCount": 0,
  "organizationId": 1,
  "createdAt": "2026-03-15T11:30:00.000Z"
}

GET /api/annotations/tasks/{id} — get task details

Returns a single annotation task with its items.
curl https://evalgate.com/api/annotations/tasks/12 \
  -H "Authorization: Bearer YOUR_API_KEY"

Path parameters

id (integer, required)
Numeric ID of the annotation task.

Response

{
  "id": 12,
  "name": "Support quality review — March",
  "status": "in_progress",
  "items": [
    {
      "id": 201,
      "traceId": 42,
      "label": "pass",
      "notes": "Clear and complete response",
      "labeledAt": "2026-03-10T14:22:00.000Z",
      "labeledBy": "user@example.com"
    },
    {
      "id": 202,
      "traceId": 43,
      "label": null,
      "notes": null,
      "labeledAt": null,
      "labeledBy": null
    }
  ]
}
items (array)
Array of annotation items in this task. Unlabeled items have null label, notes, labeledAt, and labeledBy fields.
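A common client-side step is to pull the unlabeled items out of this response to build an annotator queue. The following is a minimal sketch against the example payload above, with the items trimmed to the fields it reads:

```python
import json

# Trimmed task-detail response; field names match the example above.
task_detail = json.loads("""
{
  "id": 12,
  "items": [
    {"id": 201, "traceId": 42, "label": "pass"},
    {"id": 202, "traceId": 43, "label": null}
  ]
}
""")

def unlabeled_items(task: dict) -> list[int]:
    """Return the item IDs that still need a label."""
    return [item["id"] for item in task["items"] if item["label"] is None]

print(unlabeled_items(task_detail))  # [202]
```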

POST /api/annotations/tasks/{id}/items — submit an annotation

Submits a label for a single annotation item within a task.
curl https://evalgate.com/api/annotations/tasks/12/items \
  -X POST \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "itemId": 202,
    "label": "fail",
    "notes": "Response did not address the user'\''s core question"
  }'

Path parameters

id (integer, required)
Numeric ID of the annotation task.

Request body

itemId (integer, required)
Numeric ID of the annotation item to label.

label (string, required)
The label to assign. Must be one of the task’s configured labelOptions, or pass / fail by default.

notes (string, optional)
Free-text notes explaining the label decision. These are stored alongside the label for audit and inter-rater review.

Response

{
  "id": 202,
  "traceId": 43,
  "label": "fail",
  "notes": "Response did not address the user's core question",
  "labeledAt": "2026-03-15T12:05:00.000Z",
  "labeledBy": "user@example.com"
}
Once a task has enough labels, run the LLM Judge alignment check to measure how well your automated judge agrees with your team’s ground truth. A high-alignment judge is safe to use as an automated CI gate.
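To build intuition for what the alignment check measures, here is a simple agreement rate between human labels and automated judge verdicts, keyed by trace ID. This is only an illustration of the idea; Evalgate's alignment endpoint and its actual metric may differ, and the sample labels below are invented.

```python
def agreement_rate(human: dict[int, str], judge: dict[int, str]) -> float:
    """Fraction of commonly-labeled traces where the judge agrees with the human label."""
    shared = set(human) & set(judge)
    if not shared:
        return 0.0
    return sum(human[t] == judge[t] for t in shared) / len(shared)

# Hypothetical labels for traces 42-45: the judge disagrees on trace 44.
human_labels = {42: "pass", 43: "fail", 44: "pass", 45: "pass"}
judge_labels = {42: "pass", 43: "fail", 44: "fail", 45: "pass"}
print(agreement_rate(human_labels, judge_labels))  # 0.75
```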