Set up CI/CD regression gates with Evalgate
Gate LLM quality in CI by detecting regressions automatically before they reach production. Block merges when prompts or models degrade scores.
Integrating Evalgate into your CI/CD pipeline means every pull request is evaluated against a committed quality baseline, so you catch prompt regressions, model degradation, and safety failures before they ship. This guide walks you through GitHub Actions setup, GitLab CI configuration, threshold tuning, and the full CLI reference.
One-command CI setup
EvalGate 3.5.0 ships a single ci command that handles discovery, caching, comparison, and reporting in one step. Add the following workflow file to your repository:
.github/workflows/evalgate.yml
The npx evalgate ci command does everything:
- Discovers all evaluation specs automatically
- Runs only impacted specs using smart caching
- Compares results against the base branch
- Posts a rich summary in the pull request with any regressions highlighted
- Exits with the appropriate code (see exit codes below)
The --write-results flag saves run artifacts to .evalgate/ so you can inspect them or feed them to downstream jobs. The --base main flag controls which branch is used as the regression baseline.
Manual / traditional setup
If you prefer to scaffold everything from scratch or need to integrate into an existing workflow, run the init command once. It creates evals/baseline.json, generates the CI workflow file, and writes evalgate.config.json. After running it, commit the generated files:
- Local gate (no API key)
- Platform gate (API key required)
.github/workflows/evalgate-gate.yml
GitLab CI configuration
For GitLab, add a dedicated stage to your .gitlab-ci.yml:
.gitlab-ci.yml
Setting quality thresholds
Define minimum acceptable scores in evalgate.config.json. The gate fails if any metric falls below its threshold:
evalgate.config.json
When failOnViolation is true, a threshold breach blocks the merge just like a failing test. Set safety to 1.0 to enforce a zero-tolerance policy on harmful outputs.
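The threshold rule can be pictured as a simple comparison of each score against its minimum. The following is only a sketch of that logic, not Evalgate's implementation: the `GateConfig` shape, the `thresholds` key, and the helper names `findViolations` and `shouldBlockMerge` are assumptions made for illustration. Only `failOnViolation` and the `safety` metric are named in this guide.

```typescript
// Hypothetical shapes: only `failOnViolation` and the `safety` metric
// appear in this guide; `thresholds` is an assumed key name.
type Scores = Record<string, number>;

interface GateConfig {
  thresholds: Scores; // minimum acceptable score per metric (assumed layout)
  failOnViolation: boolean;
}

interface Violation {
  metric: string;
  score: number; // NaN when the metric was not reported at all
  minimum: number;
}

// Collect every metric whose score falls below its configured minimum.
function findViolations(scores: Scores, config: GateConfig): Violation[] {
  const violations: Violation[] = [];
  for (const [metric, minimum] of Object.entries(config.thresholds)) {
    const score: number | undefined = scores[metric];
    // Assumption: a metric with no reported score counts as a violation,
    // so a missing measurement cannot pass silently.
    if (score === undefined || score < minimum) {
      violations.push({ metric, score: score ?? Number.NaN, minimum });
    }
  }
  return violations;
}

// A breach only blocks the merge when failOnViolation is true.
function shouldBlockMerge(scores: Scores, config: GateConfig): boolean {
  return config.failOnViolation && findViolations(scores, config).length > 0;
}
```

With `safety` set to 1.0, any score below a perfect 1.0 produces a violation, which is what makes the policy zero-tolerance.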
Exit codes
The gate exits with a numeric code your CI system can act on:
| Code | Meaning |
|---|---|
| 0 | Clean: no regressions detected |
| 1 | Regressions: tests failed or scores dropped below baseline |
| 2 | Config error: missing artifacts, invalid API key, or baseline file not found |
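Because the codes listed above are meant to be acted on by the CI system, a wrapper script can translate them into distinct outcomes. The sketch below is illustrative only: `GateOutcome`, `classifyExitCode`, and `shouldFailBuild` are hypothetical names, and treating any unlisted code as a failure is an assumption rather than documented behavior.

```typescript
// Hypothetical helper names; only the codes 0, 1, and 2 come from this guide.
type GateOutcome = "clean" | "regressions" | "config-error" | "unknown";

// Map a process exit status to the meanings listed for the gate.
// `null` covers a process that was terminated by a signal and has no code.
function classifyExitCode(code: number | null): GateOutcome {
  switch (code) {
    case 0:
      return "clean";
    case 1:
      return "regressions";
    case 2:
      return "config-error";
    default:
      return "unknown"; // assumption: unlisted codes are not interpreted
  }
}

// Assumption: anything other than a clean run should fail the pipeline.
function shouldFailBuild(outcome: GateOutcome): boolean {
  return outcome !== "clean";
}
```

In a Node script, the `status` field returned by `child_process.spawnSync` could be passed straight to `classifyExitCode`. Separating the outcomes this way lets a pipeline report a configuration error differently from a genuine regression.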
CLI commands reference
Best practices
Keep CI tests fast
Run a representative subset of test cases in CI and reserve the full suite for nightly or scheduled runs.
Use parallel execution
Split independent evaluation suites into parallel jobs to keep wall-clock time low.
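One way to split suites across parallel jobs is to give each job a deterministic slice of the spec files. This is a generic sharding sketch, not an Evalgate feature: `shardSpecs` and its parameters are hypothetical, and it assumes every job sees the same list of spec paths.

```typescript
// Hypothetical helper: assign spec files to one of `shardCount` parallel jobs.
// Sorting first makes the split independent of file-system listing order,
// so every job computes the same partition.
function shardSpecs(
  specPaths: string[],
  shardIndex: number,
  shardCount: number,
): string[] {
  if (!Number.isInteger(shardCount) || shardCount < 1) {
    throw new RangeError("shardCount must be a positive integer");
  }
  if (!Number.isInteger(shardIndex) || shardIndex < 0 || shardIndex >= shardCount) {
    throw new RangeError("shardIndex must be in the range 0..shardCount-1");
  }
  // Round-robin over the sorted list: item i goes to shard i % shardCount.
  return [...specPaths].sort().filter((_, i) => i % shardCount === shardIndex);
}
```

Each CI job would call this with its own index, so that the shards together cover every spec exactly once.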
Version control everything
Commit test cases, thresholds, and baselines alongside your application code so changes are reviewable.
Monitor API costs
Track LLM API spend in CI runs to avoid unexpectedly expensive pipelines — use Evalgate’s cost tracking in the dashboard.