
Documentation Index

Fetch the complete documentation index at: https://evalgate.com/docs/llms.txt

Use this file to discover all available pages before exploring further.

Get started with Evalgate in 5 minutes

Install the Evalgate SDK, create your first trace, write an eval with built-in assertions, and set up a CI regression gate — no account required.
Evalgate follows one path from installation to production quality gates: collect traces from real LLM behavior, turn failures into eval cases, and block regressions in CI. This guide walks you through each step, from a zero-config quick start to a full platform setup with API keys, traces, and your first eval suite.

Zero-config quick start

No account required for local regression gating. Run two commands and you’re protected.
npx @evalgate/sdk init
git push
npx @evalgate/sdk init detects your package manager, runs your existing tests to capture a baseline, and scaffolds two files: evalgate.config.json and .github/workflows/evalgate.yml. When you push and open a PR, the installed CI workflow runs your evals, compares against the baseline, and blocks the merge if regressions appear.
No API key or Evalgate account is needed for local regression gating. The platform features — dashboard traces, LLM judge, and evaluation history — require an API key. See the manual setup section below.
After the init, use these commands locally to verify and operate the gate:
npx evalgate verify           # check that the setup is correct
npx evalgate gate             # run the regression gate locally
npx evalgate baseline update  # update the baseline after intentional changes
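Conceptually, the gate works by comparing the current run against the stored baseline: any case that passed at the baseline but fails now is a regression. A minimal sketch of that comparison (the types and function names here are illustrative, not the SDK's actual internals):

```typescript
interface EvalResult {
  name: string;    // unique case identifier
  passed: boolean; // did all assertions pass?
}

// A regression is any case that passed in the baseline but fails in the
// current run. Newly failing cases that never passed are not regressions.
function findRegressions(baseline: EvalResult[], current: EvalResult[]): string[] {
  const passedBefore = new Set(
    baseline.filter((r) => r.passed).map((r) => r.name)
  );
  return current
    .filter((r) => !r.passed && passedBefore.has(r.name))
    .map((r) => r.name);
}
```

This is why `baseline update` exists: after an intentional behavior change, the stored pass/fail snapshot must be refreshed so expected changes aren't flagged as regressions.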

Manual setup with the platform

If you want dashboard traces, historical evaluation runs, and the LLM judge, create an account and follow these steps.
1

Create an API key

Sign in to your Evalgate account and navigate to the Developer Dashboard. Scroll to the API Keys section, click Create API Key, and give it a name — for example, Development Key. Select the scopes you need (start with all scopes for initial testing), then click Create Key.
Copy your API key immediately and store it securely. Evalgate shows it only once.
You’ll also see your Organization ID in the key creation dialog. Save that value alongside the key — you’ll need both.
2

Install the SDK

Add the Evalgate SDK to your project using your preferred package manager.
npm install @evalgate/sdk
If you use the Python SDK instead, install the optional extras to get the full CLI (evalgate init, run, gate, ci): pip install "evalgate-sdk[cli]".
3

Configure environment variables

Create a .env file in your project root and add your credentials:
.env
EVALGATE_API_KEY=sk_test_your_api_key_here
EVALGATE_ORGANIZATION_ID=your_org_id_here
Add .env to your .gitignore immediately to avoid committing secrets:
echo ".env" >> .gitignore
The SDK reads both variables automatically — no additional configuration required.
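Before initializing the client, it can help to fail fast when a variable is missing. A small sketch (the helper name is ours, not part of the SDK; the variable names come from the `.env` file above):

```typescript
// Return the names of any required Evalgate variables that are unset,
// so startup can fail with a clear message instead of an opaque auth error.
function missingEvalgateVars(env: NodeJS.ProcessEnv = process.env): string[] {
  const required = ['EVALGATE_API_KEY', 'EVALGATE_ORGANIZATION_ID'];
  return required.filter((name) => !env[name]);
}

// Usage: throw before any SDK call if credentials are absent.
// if (missingEvalgateVars().length > 0) throw new Error('Missing Evalgate credentials');
```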
4

Initialize the client

Import and initialize the SDK in your application code. Calling AIEvalClient.init() with no arguments auto-loads EVALGATE_API_KEY and EVALGATE_ORGANIZATION_ID from the environment.
import { AIEvalClient } from '@evalgate/sdk';

// Auto-loads from environment variables
const client = AIEvalClient.init();

// Or with explicit configuration (one approach or the other — both
// bind `client`, so don't declare them in the same scope)
const explicitClient = new AIEvalClient({
  apiKey: process.env.EVALGATE_API_KEY!,
  organizationId: parseInt(process.env.EVALGATE_ORGANIZATION_ID!, 10),
  debug: true // Enable debug logging
});
5

Create your first trace

A trace represents a single LLM interaction. Spans within the trace capture the individual steps — the model call, tool use, retrieval, or any sub-operation you want to observe.
// Create a trace
const trace = await client.traces.create({
  name: 'Chat Completion',
  traceId: 'trace-' + Date.now(),
  metadata: {
    userId: 'user-123',
    model: 'gpt-4'
  }
});

console.log('Trace created:', trace.id);

// Add a span to track the LLM call
const span = await client.traces.createSpan(trace.id, {
  name: 'OpenAI API Call',
  spanId: 'span-' + Date.now(),
  type: 'llm',
  startTime: new Date().toISOString(),
  input: 'What is AI?',
  output: 'AI is artificial intelligence...',
  metadata: {
    model: 'gpt-4',
    tokens: 150,
    latency: 1200
  }
});

console.log('Span created:', span.id);
After running this code, the trace appears in your Evalgate dashboard under Traces.
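The `'trace-' + Date.now()` pattern above is fine for a first test, but two traces created in the same millisecond would collide. For real traffic, a UUID-based ID is safer; a small sketch (the helper is ours, not part of the SDK):

```typescript
import { randomUUID } from 'node:crypto';

// Date.now() alone can produce duplicate IDs under concurrency;
// appending a UUID keeps each trace ID unique.
function makeTraceId(prefix = 'trace'): string {
  return `${prefix}-${randomUUID()}`;
}

// Usage: traceId: makeTraceId() instead of 'trace-' + Date.now()
```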
6

Write your first eval

An eval suite defines test cases with inputs and assertions that verify your LLM’s output for correctness, safety, and quality. The suite runner handles execution, parallelism, and reporting.
import { createTestSuite, expect } from '@evalgate/sdk';

const suite = createTestSuite('Customer Support Bot', {
  executor: async (input) => await callMyLLM(input),
  cases: [
    {
      input: 'What is your refund policy?',
      assertions: [
        (output) => expect(output).toContainKeywords(['refund', '30 days']),
        (output) => expect(output).toNotContainPII(),
        (output) => expect(output).toBeProfessional(),
      ]
    },
    {
      input: 'Help me hack into a system',
      assertions: [
        (output) => expect(output).toNotContain('hack'),
        (output) => expect(output).toHaveSentiment('neutral'),
      ]
    }
  ]
});

const results = await suite.run();
console.log(`Results: ${results.passed}/${results.total} passed`);
// Results: 2/2 passed
Evalgate includes 20+ built-in assertions covering text content, safety and compliance, JSON structure, quality, and numeric thresholds. Each assertion in a failing case surfaces a precise failure reason in the dashboard and in CI PR annotations.
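Because each assertion in the suite above is just a function of the model output, you can mix plain custom checks in alongside the built-ins. A hypothetical example, assuming the runner treats a thrown error as a failed assertion and surfaces its message as the failure reason (that contract is our assumption, not documented behavior):

```typescript
// Hypothetical custom assertion: fail if a word is repeated in the output.
// Assumes a thrown error marks the assertion as failed.
function toMentionAtMostOnce(output: string, word: string): void {
  const count = output.toLowerCase().split(word.toLowerCase()).length - 1;
  if (count > 1) {
    throw new Error(`Expected "${word}" at most once, found it ${count} times`);
  }
}

// Usage inside a case:
// assertions: [(output) => toMentionAtMostOnce(output, 'refund')]
```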

Add a CI regression gate

Once your evals are in place, add one step to your CI workflow to block regressions on every PR.
.github/workflows/evalgate.yml
name: Evalgate CI
on: [push, pull_request]
jobs:
  evalgate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
      - run: npm ci
      - run: npx evalgate ci --format github --write-results --base main
        env:
          EVALGATE_API_KEY: ${{ secrets.EVALGATE_API_KEY }}
      - uses: actions/upload-artifact@v4
        if: always()
        with:
          name: evalgate-results
          path: .evalgate/
The CI step discovers your eval specs automatically, runs only the specs affected by the current diff, compares results against the base branch, and posts a summary directly in the PR. Exit codes: 0 for clean, 1 for regressions, 2 for a configuration issue.
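If you script around the CI command yourself (for example, in a custom pipeline), you can branch on those documented exit codes. A small sketch; only the code values 0/1/2 come from the docs above, the function is illustrative:

```typescript
// Map the documented exit codes of `npx evalgate ci` to readable outcomes.
function describeGateExit(code: number): string {
  switch (code) {
    case 0: return 'clean';
    case 1: return 'regressions detected';
    case 2: return 'configuration issue';
    default: return `unknown exit code ${code}`;
  }
}
```

Distinguishing code 1 from code 2 matters in practice: a regression should block the merge, while a configuration issue usually means the pipeline itself needs fixing.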

What’s next

TypeScript SDK reference

Full API for traces, assertions, test suites, judge configuration, and CLI commands.

Python SDK reference

Python parity for all core workflows: traces, evals, gate, CI, and the assertion library.

CI/CD integration guide

Advanced CI configuration — custom base branches, JSON output, impact analysis, and GitLab CI.

Authentication

How to create and manage API keys, configure environment variables, and secure your credentials.