Documentation Index

Fetch the complete documentation index at: https://evalgate.com/docs/llms.txt

Use this file to discover all available pages before exploring further.

Evalgate: AI Quality Infrastructure for LLM Apps

Evalgate traces real AI failures, turns them into eval cases, and blocks regressions in CI — no separate observability stack required.
Evalgate is an evaluation control plane for AI teams shipping LLM applications to production. It closes the loop between what breaks in the real world and what gets tested before the next release: trace real agent behavior, promote failures into reusable evaluation coverage, and enforce quality gates in CI so the same issue never ships twice.

Quick Start

Set up your first eval gate in under 5 minutes — no account required for local gating.

Authentication

Get your API key and start making authenticated requests to the Evalgate platform.

SDK Reference

TypeScript and Python SDKs with full type safety, built-in assertions, and CLI tools.

API Reference

Complete REST API reference with request/response examples and interactive endpoints.

How Evalgate works

Evalgate is built around one operating loop: trace → eval → gate. Every feature feeds into this path.
1. Collect traces from real AI behavior

Instrument your LLM application with the SDK to capture production and staging behavior — inputs, outputs, tool calls, latency, and token usage — with full structured context.
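To make the shape of a captured trace concrete, here is a minimal, self-contained sketch. This is not the Evalgate SDK; the `Trace`, `ToolCall`, and `capture` names are hypothetical, illustrating the kind of structured context the step above describes (inputs, outputs, tool calls, latency, token usage).

```python
import time
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    """One tool invocation made by the agent during a run."""
    name: str
    arguments: dict
    result: str

@dataclass
class Trace:
    """One captured LLM interaction: what went in, what came out, and how."""
    input: str
    output: str
    tool_calls: list = field(default_factory=list)
    latency_ms: float = 0.0
    prompt_tokens: int = 0
    completion_tokens: int = 0

def capture(llm_call, prompt: str) -> Trace:
    """Wrap any LLM call and record its output plus wall-clock latency."""
    start = time.perf_counter()
    output = llm_call(prompt)
    latency = (time.perf_counter() - start) * 1000
    return Trace(input=prompt, output=output, latency_ms=latency)
```

In a real integration the SDK would do this instrumentation for you; the point is that every production and staging call becomes a structured record you can later query and replay.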
2. Turn failures into reusable eval coverage

Promote failing patterns into test cases and suites. Label traces interactively, cluster failures by behavior, and synthesize golden datasets from real production gaps.
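The promotion step above can be sketched in a few lines. This is a generic illustration, not Evalgate's implementation: `promote_failures` is a hypothetical helper that clusters failing traces by a behavior label and emits each cluster as a suite of reusable eval cases, assuming a human (or judge) has supplied a corrected output for each failure.

```python
from collections import defaultdict

def promote_failures(traces, is_failure, label):
    """Cluster failing traces by behavior and emit eval suites.

    traces     -- list of dicts with "input" and (optionally) "corrected_output"
    is_failure -- predicate deciding whether a trace represents a failure
    label      -- function mapping a trace to a behavior-cluster name
    """
    clusters = defaultdict(list)
    for t in traces:
        if is_failure(t):
            clusters[label(t)].append(t)

    # Each cluster becomes a suite; only labeled failures (those with a
    # corrected output) are promoted into golden test cases.
    return {
        behavior: [
            {"input": t["input"], "expected": t["corrected_output"]}
            for t in members
            if "corrected_output" in t
        ]
        for behavior, members in clusters.items()
    }
```

The design choice worth noting: cases are derived from real production gaps rather than hand-written examples, so the suite's coverage tracks what actually breaks.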
3. Block regressions before release

Run the same assertions in CI so bad behavior never merges unnoticed. One command sets up a complete regression gate with PR annotations and baseline comparison.
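A regression gate of this kind boils down to: run the suite, compare against a baseline, and return a nonzero exit code on regression so CI blocks the merge. The sketch below is illustrative only; `run_gate` and its parameters are hypothetical, not the Evalgate CLI.

```python
def run_gate(cases, model, baseline_pass_rate, tolerance=0.0):
    """Run eval cases against the model; fail if the pass rate regresses.

    Returns 0 (gate passes) or 1 (gate fails) -- the exit code a CI job
    would use to block or allow the merge.
    """
    passed = sum(1 for c in cases if model(c["input"]) == c["expected"])
    pass_rate = passed / len(cases)
    ok = pass_rate + tolerance >= baseline_pass_rate
    print(f"pass rate {pass_rate:.0%} vs baseline {baseline_pass_rate:.0%}: "
          f"{'PASS' if ok else 'FAIL'}")
    return 0 if ok else 1
```

Real gates typically compare richer assertions than exact-match equality, but the contract is the same: the baseline comparison runs on every PR, and a failure is a blocked merge, not a dashboard alert.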

Explore by topic

Core Concepts

Understand the trace → eval → gate loop and Evalgate’s data model.

CI/CD Integration

Wire Evalgate into GitHub Actions or GitLab CI to gate every PR.

LLM Judge

Orchestrate multi-judge evaluation with disagreement handling and provider flexibility.

MCP Integration

Use Evalgate tools directly from Cursor, Claude Desktop, or ChatGPT.