
Set up CI/CD regression gates with EvalGate

Gate LLM quality in CI by detecting regressions automatically before they reach production. Block merges when prompts or models degrade scores.
Integrating EvalGate into your CI/CD pipeline means every pull request is evaluated against a committed quality baseline — so you catch prompt regressions, model degradation, and safety failures before they ship. This guide walks you through GitHub Actions setup, GitLab CI configuration, threshold tuning, and the full CLI reference.

One-command CI setup

EvalGate 3.5.0 ships a single ci command that handles discovery, caching, comparison, and reporting in one step. Add the following workflow file to your repository:
.github/workflows/evalgate.yml
name: EvalGate CI
on: [push, pull_request]
jobs:
  evalgate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npx evalgate ci --format github --write-results --base main
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: evalgate-results
          path: .evalgate/
That single npx evalgate ci command does everything:
  • Discovers all evaluation specs automatically
  • Runs only impacted specs using smart caching
  • Compares results against the base branch
  • Posts a rich summary in the pull request with any regressions highlighted
  • Exits with the appropriate code (see exit codes below)
The --write-results flag saves run artifacts to .evalgate/ so you can inspect them or feed them to downstream jobs. The --base main flag controls which branch is used as the regression baseline.
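As a minimal sketch, the same invocation can be parameterized so one wrapper script serves pull requests and release branches alike. The flag names come from the workflow above; the BASE and FORMAT variables are illustrative conventions of this sketch, not anything evalgate defines:

```shell
# Build the ci invocation from environment variables. BASE and FORMAT
# fall back to the defaults used in the workflow above.
BASE="${BASE:-main}"
FORMAT="${FORMAT:-github}"

cmd="npx evalgate ci --format $FORMAT --write-results --base $BASE"
echo "$cmd"
```

On a release branch you would export BASE=release/1.x before calling the wrapper, and the comparison baseline follows automatically.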

Manual / traditional setup

If you prefer to scaffold everything from scratch or need to integrate into an existing workflow, run the init command once:
npx @evalgate/sdk init
This automatically detects your repository, creates evals/baseline.json, generates the CI workflow file, and writes evalgate.config.json. After running it, commit the generated files:
git add evals/ .github/workflows/evalgate-gate.yml evalgate.config.json
git commit -m "chore: add EvalGate regression gate"
git push
The traditional workflow uses a dedicated gate job that runs on every pull request:
.github/workflows/evalgate-gate.yml
name: EvalGate CI Gate
on:
  pull_request:
    branches: [main]
jobs:
  eval-gate:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - name: EvalGate regression gate
        run: npx evalgate gate --format github

GitLab CI configuration

For GitLab, add a dedicated stage to your .gitlab-ci.yml:
.gitlab-ci.yml
eval-gate:
  stage: test
  image: node:20
  script:
    - npm ci
    - npx evalgate gate --format json
  only:
    - merge_requests
    - main
Use --format json in GitLab to get machine-readable output you can parse in downstream jobs or send to an external dashboard.
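For example, a later script step might pull individual fields out of the JSON report. The file name results.json and the field names below are assumptions for illustration only — inspect the actual report from a real run before relying on its schema:

```shell
# Simulate a saved report (shape assumed for illustration, not the
# documented schema).
cat > results.json <<'EOF'
{"status": "fail", "regressions": 2}
EOF

# Extract the "status" field with sed, so the step also works in
# minimal CI images that do not ship jq.
status=$(sed -n 's/.*"status": *"\([^"]*\)".*/\1/p' results.json)
echo "status=$status"
```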

Setting quality thresholds

Define minimum acceptable scores in evalgate.config.json. The gate fails if any metric falls below its threshold:
evalgate.config.json
{
  "thresholds": {
    "accuracy": 0.85,
    "relevance": 0.80,
    "safety": 1.0,
    "latency_p95": 2000
  },
  "failOnViolation": true
}
When failOnViolation is true, a threshold breach blocks the merge just like a failing test. Set safety to 1.0 to enforce a zero-tolerance policy on harmful outputs.
Store evalgate.config.json and evals/baseline.json in version control. Changing thresholds without a review is equivalent to deleting a test — treat both files as production code.
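The check the gate performs for each score metric amounts to a simple comparison. Here is a sketch, with below_threshold as a hypothetical stand-in for evalgate's internal logic:

```shell
# below_threshold SCORE MIN — succeeds when the score violates the
# configured minimum. awk handles the floating-point comparison portably.
below_threshold() {
  awk -v s="$1" -v t="$2" 'BEGIN { exit !(s < t) }'
}

if below_threshold 0.82 0.85; then
  echo "accuracy 0.82 is below 0.85: gate fails"
fi
```

Note that latency_p95 is a ceiling rather than a floor (2000 ms is the maximum acceptable value), so the real gate presumably inverts the comparison for latency-style metrics.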

Exit codes

The gate exits with a numeric code your CI system can act on:
Code  Meaning
0     Clean — no regressions detected
1     Regressions — tests failed or scores dropped below baseline
2     Config error — missing artifacts, invalid API key, or baseline file not found
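In a script step, the codes above can be mapped to readable log lines; describe_exit is a hypothetical helper for illustration, not part of the CLI:

```shell
# Translate an evalgate exit code into a human-readable CI log line.
describe_exit() {
  case "$1" in
    0) echo "clean: no regressions detected" ;;
    1) echo "regressions: scores dropped below baseline" ;;
    2) echo "config error: check artifacts, API key, and baseline" ;;
    *) echo "unexpected exit code: $1" ;;
  esac
}

describe_exit 2
```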

CLI commands reference

# Scaffold everything: baseline, CI workflow, config (run once)
npx @evalgate/sdk init

# Pre-flight validation: checks config, API key, scopes,
# project key, baseline.json, and CI workflow file
npx evalgate verify

# Discover, run, compare against the base branch, and report — one step
npx evalgate ci --format github --write-results --base main

# Run the regression gate on its own; exits nonzero on regressions
npx evalgate gate --format github

Best practices

Keep CI tests fast

Run a representative subset of test cases in CI and reserve the full suite for nightly or scheduled runs.
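One way to keep the split, reusing the documented ci command: trigger the heavier run on a schedule rather than on every pull request. The cron expression and workflow name below are illustrative; how you scope the PR run to a representative subset depends on how your evaluation specs are organized.

```yaml
# .github/workflows/evalgate-nightly.yml — full-suite run off the PR path
name: EvalGate nightly
on:
  schedule:
    - cron: '0 3 * * *'   # 03:00 UTC nightly, illustrative
jobs:
  evalgate-full:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npx evalgate ci --format github --write-results --base main
```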

Use parallel execution

Split independent evaluation suites into parallel jobs to keep wall-clock time low.
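A sketch of one job per suite using a GitHub Actions matrix, assuming each independent suite lives in its own directory so evalgate's spec discovery picks up only that suite — the suite names and directory layout here are placeholders, not an EvalGate convention:

```yaml
jobs:
  evalgate:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        suite: [summarization, extraction, safety]   # placeholder names
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npx evalgate ci --format github --base main
        working-directory: evals/${{ matrix.suite }}
```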

Version control everything

Commit test cases, thresholds, and baselines alongside your application code so changes are reviewable.

Monitor API costs

Track LLM API spend in CI runs to avoid unexpectedly expensive pipelines — use EvalGate’s cost tracking in the dashboard.