
Set up CI/CD regression gates with EvalGate

Gate LLM quality in CI by detecting regressions automatically before they reach production. Block merges when prompts or models degrade scores.
Integrating EvalGate into your CI/CD pipeline means every pull request is evaluated against a committed quality baseline — so you catch prompt regressions, model degradation, and safety failures before they ship. This guide walks you through GitHub Actions setup, GitLab CI configuration, threshold tuning, and the full CLI reference.

One-command CI setup

EvalGate 3.5.0 ships a single ci command that handles discovery, caching, comparison, and reporting in one step. Add the following workflow file to your repository:
.github/workflows/evalgate.yml
name: EvalGate CI
on: [push, pull_request]
jobs:
  evalgate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npx evalgate ci --format github --write-results --base main
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: evalgate-results
          path: .evalgate/
That single npx evalgate ci command does everything:
  • Discovers all evaluation specs automatically
  • Runs only impacted specs using smart caching
  • Compares results against the base branch
  • Posts a rich summary in the pull request with any regressions highlighted
  • Exits with the appropriate code (see exit codes below)
The --write-results flag saves run artifacts to .evalgate/ so you can inspect them or feed them to downstream jobs. The --base main flag controls which branch is used as the regression baseline.
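As a minimal sketch, the same invocation can be parameterized so one wrapper script serves pull requests and release branches alike. The flag names come from the workflow above; the BASE and FORMAT variables are illustrative conventions of this sketch, not anything evalgate defines:

```shell
# Build the ci invocation from environment variables. BASE and FORMAT
# fall back to the defaults used in the workflow above.
BASE="${BASE:-main}"
FORMAT="${FORMAT:-github}"

cmd="npx evalgate ci --format $FORMAT --write-results --base $BASE"
echo "$cmd"
```

On a release branch you would export BASE=release/1.x before calling the wrapper, and the comparison baseline follows automatically.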

Manual / traditional setup

If you prefer to scaffold everything from scratch or need to integrate into an existing workflow, run the init command once:
npx @evalgate/sdk init
This automatically detects your repository, creates evals/baseline.json, generates the CI workflow file, and writes evalgate.config.json. After running it, commit the generated files:
git add evals/ .github/workflows/evalgate-gate.yml evalgate.config.json
git commit -m "chore: add EvalGate regression gate"
git push
The traditional workflow uses a dedicated gate job that runs on every pull request:
.github/workflows/evalgate-gate.yml
name: EvalGate CI Gate
on:
  pull_request:
    branches: [main]
jobs:
  eval-gate:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - name: EvalGate regression gate
        run: npx evalgate gate --format github

GitLab CI configuration

For GitLab, add a dedicated stage to your .gitlab-ci.yml:
.gitlab-ci.yml
eval-gate:
  stage: test
  image: node:20
  script:
    - npm ci
    - npx evalgate gate --format json
  only:
    - merge_requests
    - main
Use --format json in GitLab to get machine-readable output you can parse in downstream jobs or send to an external dashboard.
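For example, a later script step might pull individual fields out of the JSON report. The file name results.json and the field names below are assumptions for illustration only — inspect the actual report from a real run before relying on its schema:

```shell
# Simulate a saved report (shape assumed for illustration, not the
# documented schema).
cat > results.json <<'EOF'
{"status": "fail", "regressions": 2}
EOF

# Extract the "status" field with sed, so the step also works in
# minimal CI images that do not ship jq.
status=$(sed -n 's/.*"status": *"\([^"]*\)".*/\1/p' results.json)
echo "status=$status"
```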

Setting quality thresholds

Define minimum acceptable scores in evalgate.config.json. The gate fails if any metric falls below its threshold:
evalgate.config.json
{
  "thresholds": {
    "accuracy": 0.85,
    "relevance": 0.80,
    "safety": 1.0,
    "latency_p95": 2000
  },
  "failOnViolation": true
}
When failOnViolation is true, a threshold breach blocks the merge just like a failing test. Set safety to 1.0 to enforce a zero-tolerance policy on harmful outputs.
Store evalgate.config.json and evals/baseline.json in version control. Changing thresholds without a review is equivalent to deleting a test — treat both files as production code.
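The check the gate performs for each score metric amounts to a simple comparison. Here is a sketch, with below_threshold as a hypothetical stand-in for evalgate's internal logic:

```shell
# below_threshold SCORE MIN — succeeds when the score violates the
# configured minimum. awk handles the floating-point comparison portably.
below_threshold() {
  awk -v s="$1" -v t="$2" 'BEGIN { exit !(s < t) }'
}

if below_threshold 0.82 0.85; then
  echo "accuracy 0.82 is below 0.85: gate fails"
fi
```

Note that latency_p95 is a ceiling rather than a floor (2000 ms is the maximum acceptable value), so the real gate presumably inverts the comparison for latency-style metrics.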

Exit codes

The gate exits with a numeric code your CI system can act on:
Code  Meaning
0     Clean — no regressions detected
1     Regressions — tests failed or scores dropped below baseline
2     Config error — missing artifacts, invalid API key, or baseline file not found
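In a script step, the codes above can be mapped to readable log lines; describe_exit is a hypothetical helper for illustration, not part of the CLI:

```shell
# Translate an evalgate exit code into a human-readable CI log line.
describe_exit() {
  case "$1" in
    0) echo "clean: no regressions detected" ;;
    1) echo "regressions: scores dropped below baseline" ;;
    2) echo "config error: check artifacts, API key, and baseline" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}

describe_exit 2
```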

CLI commands reference

# Scaffold everything: baseline, CI workflow, config (run once)
npx @evalgate/sdk init

# Pre-flight validation: checks config, API key, scopes,
# project key, baseline.json, and CI workflow file
npx evalgate verify

# Discover, run, compare against the base branch, and report — one step
npx evalgate ci --format github --write-results --base main

# Run the regression gate on its own; exits nonzero on regressions
npx evalgate gate --format github

Best practices

Keep CI tests fast

Run a representative subset of test cases in CI and reserve the full suite for nightly or scheduled runs.
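One way to keep the split, reusing the documented ci command: trigger the heavier run on a schedule rather than on every pull request. The cron expression and workflow name below are illustrative; how you scope the PR run to a representative subset depends on how your evaluation specs are organized.

```yaml
# .github/workflows/evalgate-nightly.yml — full-suite run off the PR path
name: EvalGate nightly
on:
  schedule:
    - cron: '0 3 * * *'   # 03:00 UTC nightly, illustrative
jobs:
  evalgate-full:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npx evalgate ci --format github --write-results --base main
```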

Use parallel execution

Split independent evaluation suites into parallel jobs to keep wall-clock time low.
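A sketch of one job per suite using a GitHub Actions matrix, assuming each independent suite lives in its own directory so evalgate's spec discovery picks up only that suite — the suite names and directory layout here are placeholders, not an EvalGate convention:

```yaml
jobs:
  evalgate:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        suite: [summarization, extraction, safety]   # placeholder names
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npx evalgate ci --format github --base main
        working-directory: evals/${{ matrix.suite }}
```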

Version control everything

Commit test cases, thresholds, and baselines alongside your application code so changes are reviewable.

Monitor API costs

Track LLM API spend in CI runs to avoid unexpectedly expensive pipelines — use EvalGate’s cost tracking in the dashboard.