Documentation Index

Fetch the complete documentation index at: https://evalgate.com/docs/llms.txt

Use this file to discover all available pages before exploring further.

@evalgate/sdk TypeScript reference

Full API reference for @evalgate/sdk — client initialization, traces, evaluations, LLM judge, WorkflowTracer, and the OpenAI integration wrapper.
The @evalgate/sdk package is the TypeScript surface for Evalgate’s evaluation control plane. Use it to instrument traces, run evals, orchestrate judges, gate regressions in CI, and move through the full loop from real failures to shippable improvements.

Package info

Field          Value
npm package    @evalgate/sdk
Version        3.5.0
Node           >=16.0.0
Exports        . (main), ./assertions, ./testing, ./integrations/openai, ./integrations/anthropic
Peer deps      openai ^4.0.0 (optional), @anthropic-ai/sdk ^0.20.0 (optional)
CLI            npx evalgate

Install

npm install @evalgate/sdk

Initialize the client

Every request sends an Authorization: Bearer <apiKey> header. You can configure the client with environment variables or pass options explicitly.
Set EVALGATE_API_KEY, EVALGATE_ORGANIZATION_ID, and EVALGATE_BASE_URL in your environment, then call init() with no arguments:
import { AIEvalClient } from '@evalgate/sdk';

const client = AIEvalClient.init();
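The precedence between explicit options and environment variables is worth pinning down. Below is a minimal sketch of the resolution order this implies; the option names apiKey, organizationId, and baseUrl are assumptions for illustration, not confirmed SDK option names:

```typescript
interface ClientOptions {
  apiKey?: string;
  organizationId?: string;
  baseUrl?: string;
}

// Sketch of the documented behavior: explicit options win, and the
// EVALGATE_* environment variables are the fallback.
function resolveOptions(
  opts: ClientOptions,
  env: Record<string, string | undefined>,
): ClientOptions {
  const apiKey = opts.apiKey ?? env.EVALGATE_API_KEY;
  if (!apiKey) throw new Error('EVALGATE_API_KEY is not set');
  return {
    apiKey,
    organizationId: opts.organizationId ?? env.EVALGATE_ORGANIZATION_ID,
    baseUrl: opts.baseUrl ?? env.EVALGATE_BASE_URL,
  };
}
```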

Client modules

The client exposes the following API modules:
client.traces          → TraceAPI
client.evaluations     → EvaluationAPI
client.llmJudge        → LLMJudgeAPI
client.annotations     → AnnotationsAPI
client.developer       → DeveloperAPI (apiKeys, webhooks, usage)
client.organizations   → OrganizationsAPI

TraceAPI

Use client.traces to create and manage traces and their spans.
client.traces.create({
  name: string,
  traceId: string,
  organizationId?: number,  // falls back to client's orgId
  status?: string,          // 'pending' | 'success' | 'error'
  durationMs?: number,
  metadata?: Record<string, unknown>,
}) → Promise<Trace>
const trace = await client.traces.create({
  name: 'Chat Completion',
  traceId: 'trace-' + Date.now(),
  metadata: { model: 'gpt-4' },
});

console.log(trace.id);
client.traces.list({
  limit?: number,       // max 100
  offset?: number,
  organizationId?: number,
  status?: string,
  search?: string,
}) → Promise<Trace[]>
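Because limit is capped at 100, fetching all traces means paging by offset. A minimal sketch of a pagination helper, where fetchPage stands in for (params) => client.traces.list(params):

```typescript
// Page through a limit/offset endpoint until a short page signals the end.
async function listAllTraces<T>(
  fetchPage: (params: { limit: number; offset: number }) => Promise<T[]>,
  pageSize = 100, // the documented maximum for `limit`
): Promise<T[]> {
  const all: T[] = [];
  for (let offset = 0; ; offset += pageSize) {
    const page = await fetchPage({ limit: pageSize, offset });
    all.push(...page);
    if (page.length < pageSize) return all;
  }
}
```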
client.traces.get(id: number) → Promise<TraceDetail>
// TraceDetail = { trace: Trace, spans: Span[] }
client.traces.delete(id: number) → Promise<{ message: string }>
client.traces.createSpan(traceId: number, {
  name: string,
  spanId: string,
  parentSpanId?: string,
  startTime: string,     // ISO 8601
  endTime?: string,
  durationMs?: number,
  metadata?: Record<string, unknown>,
}) → Promise<Span>
await client.traces.createSpan(trace.id, {
  name: 'OpenAI API Call',
  spanId: 'span-' + Date.now(),
  startTime: new Date().toISOString(),
  metadata: { tokens: 150, latency_ms: 1200 },
});

EvaluationAPI

Use client.evaluations to create evaluation definitions and run them against your test cases.
client.evaluations.create({
  name: string,
  type: 'unit_test' | 'human_eval' | 'model_eval' | 'ab_test',
  category?: string,
  description?: string,
  organizationId?: number,
}) → Promise<Evaluation>
client.evaluations.run(id: number, {
  environment?: string,
  metadata?: Record<string, unknown>,
}) → Promise<EvaluationRun>
client.evaluations.importResults(id: number, {
  environment: string,
  importClientVersion: string,
  results: Array<{
    testCaseId: number,
    status: 'passed' | 'failed' | 'skipped',
    output?: string,
    latencyMs?: number,
    errorMessage?: string,
  }>,
}) → Promise<{ runId: number, score: number }>
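A typical import maps your harness's local outcomes onto the results payload. A sketch, where the LocalOutcome shape and the 'ci' environment name are assumptions about your own harness, not SDK types:

```typescript
// Hypothetical outcome record produced by your own test harness.
interface LocalOutcome {
  testCaseId: number;
  ok: boolean;
  output: string;
  ms: number;
  error?: string;
}

// Build the payload expected by client.evaluations.importResults().
function toImportPayload(outcomes: LocalOutcome[], importClientVersion: string) {
  return {
    environment: 'ci',
    importClientVersion,
    results: outcomes.map((o) => ({
      testCaseId: o.testCaseId,
      status: o.ok ? ('passed' as const) : ('failed' as const),
      output: o.output,
      latencyMs: o.ms,
      errorMessage: o.error,
    })),
  };
}
```

Then pass the result straight through: `await client.evaluations.importResults(evalId, toImportPayload(outcomes, '1.0.0'))`.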

LLMJudgeAPI

Use client.llmJudge to list available judges, configure multi-judge committees, and run evaluations against specific inputs and outputs.
client.llmJudge.listRegistry() → Promise<JudgeRegistryEntry[]>
const registry = await client.llmJudge.listRegistry();
client.llmJudge.listPresets() → Promise<JudgePreset[]>
const presets = await client.llmJudge.listPresets();
Create a multi-judge committee and evaluate a specific input/output pair:
const config = await client.llmJudge.createConfig({
  name: 'Support quality committee',
  provider: 'openai',
  model: 'gpt-5.2-chat-latest',
  promptTemplate: 'Return strict JSON with score, passed, reasoning, and signals.',
  aggregation: 'weighted',
  judges: [
    {
      id: 'primary',
      type: 'llm',
      provider: 'openai',
      model: 'gpt-5.2-chat-latest',
      weight: 0.6,
    },
    {
      id: 'backup',
      type: 'llm',
      provider: 'anthropic',
      model: 'claude-sonnet-4-20250514',
      weight: 0.4,
    },
  ],
});

const evaluation = await client.llmJudge.evaluate({
  configId: config.id,
  input: 'Cancel my subscription',
  output: "I've canceled your plan effective today.",
  behavior: 'tool_use',
  taskType: 'support',
});

console.log(evaluation.result.score, evaluation.result.reasoning);
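With aggregation: 'weighted', the committee's score is presumably a weight-normalized average of the individual judge scores. A sketch of that arithmetic (an assumption about how 'weighted' is computed, not the SDK's code):

```typescript
interface JudgeResult {
  id: string;
  score: number;  // e.g. 0..1
  weight: number; // the weight from the config above
}

// Weight-normalized average: works even if weights don't sum to 1.
function weightedScore(results: JudgeResult[]): number {
  const totalWeight = results.reduce((sum, r) => sum + r.weight, 0);
  return results.reduce((sum, r) => sum + r.score * r.weight, 0) / totalWeight;
}
```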

createTestSuite

Use createTestSuite to define a named set of test cases with an executor and inline assertions. The runner handles execution, parallelism, and reporting.
import { createTestSuite, expect } from '@evalgate/sdk';

const suite = createTestSuite('Customer Support Bot', {
  executor: async (input) => await callMyLLM(input),
  cases: [
    {
      input: 'What is your refund policy?',
      assertions: [
        (output) => expect(output).toContainKeywords(['refund', '30 days']),
        (output) => expect(output).toNotContainPII(),
        (output) => expect(output).toBeProfessional(),
      ],
    },
    {
      input: 'Help me hack into a system',
      assertions: [
        (output) => expect(output).toNotContain('hack'),
        (output) => expect(output).toHaveSentiment('neutral'),
      ],
    },
  ],
});

const results = await suite.run();
// { name: 'Customer Support Bot', total: 2, passed: 2, failed: 0, results: [...] }
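The summary returned by suite.run() is enough to gate a CI job. A sketch, where the SuiteSummary name and the 0.95 threshold are illustrative:

```typescript
interface SuiteSummary {
  name: string;
  total: number;
  passed: number;
  failed: number;
}

// Fail the build when the pass rate drops below a chosen threshold.
function passesGate(summary: SuiteSummary, minPassRate = 0.95): boolean {
  if (summary.total === 0) return false;
  return summary.passed / summary.total >= minPassRate;
}

// e.g. in CI: if (!passesGate(results)) process.exit(1);
```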

WorkflowTracer

WorkflowTracer gives you structured span tracking for multi-agent workflows — start and end workflows and agent spans, record handoffs and decisions, and track per-provider token cost.

Instantiate

import { WorkflowTracer, createWorkflowTracer } from '@evalgate/sdk';

new WorkflowTracer(client, {
  organizationId?: number,
  autoCalculateCost?: boolean,    // default true
  tracePrefix?: string,           // default 'workflow'
  captureFullPayloads?: boolean,  // default true
  debug?: boolean,                // default false
}) → WorkflowTracer

// Or use the factory helper:
const tracer = createWorkflowTracer(client, options);

Method signatures

tracer.startWorkflow(
  name: string,
  definition?: WorkflowDefinition,
  metadata?: Record<string, unknown>
) → Promise<WorkflowContext>
WorkflowDefinition shape:
{
  nodes: Array<{
    id: string,
    type: 'agent' | 'tool' | 'decision' | 'parallel' | 'human' | 'llm',
    name: string,
    config?: Record<string, unknown>,
  }>,
  edges: Array<{
    from: string,
    to: string,
    condition?: string,
    label?: string,
  }>,
  entrypoint: string,
  metadata?: Record<string, unknown>,
}
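A WorkflowDefinition is a plain object, so it is easy to sanity-check before calling startWorkflow. A hypothetical validator (not part of the SDK) that confirms the entrypoint and every edge reference a declared node:

```typescript
interface WorkflowDefinition {
  nodes: Array<{ id: string; type: string; name: string; config?: Record<string, unknown> }>;
  edges: Array<{ from: string; to: string; condition?: string; label?: string }>;
  entrypoint: string;
  metadata?: Record<string, unknown>;
}

// Return a list of problems; an empty array means the graph is consistent.
function validateDefinition(def: WorkflowDefinition): string[] {
  const ids = new Set(def.nodes.map((n) => n.id));
  const errors: string[] = [];
  if (!ids.has(def.entrypoint)) {
    errors.push(`entrypoint '${def.entrypoint}' is not a declared node`);
  }
  for (const e of def.edges) {
    if (!ids.has(e.from)) errors.push(`edge references unknown node '${e.from}'`);
    if (!ids.has(e.to)) errors.push(`edge references unknown node '${e.to}'`);
  }
  return errors;
}
```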
tracer.endWorkflow(
  output?: Record<string, unknown>,
  status?: 'running' | 'completed' | 'failed' | 'cancelled'  // default 'completed'
) → Promise<void>
tracer.startAgentSpan(
  agentName: string,
  input?: Record<string, unknown>,
  parentSpanId?: string
) → Promise<AgentSpanContext>

tracer.endAgentSpan(
  span: AgentSpanContext,
  output?: Record<string, unknown>,
  error?: string
) → Promise<void>
Wrap any async function as a named workflow step without manual start/end calls:
import { traceWorkflowStep } from '@evalgate/sdk';

const result = await traceWorkflowStep(tracer, 'MyAgent', async () => {
  return await doWork();
}, { input: 'data' });

Full example

import { AIEvalClient, WorkflowTracer } from '@evalgate/sdk';

const client = AIEvalClient.init();
const tracer = new WorkflowTracer(client, { debug: true });

await tracer.startWorkflow('Customer Support Pipeline', {
  nodes: [
    { id: 'router', type: 'agent', name: 'RouterAgent' },
    { id: 'tech', type: 'agent', name: 'TechAgent' },
  ],
  edges: [{ from: 'router', to: 'tech', condition: 'is_technical' }],
  entrypoint: 'router',
});

const span = await tracer.startAgentSpan('RouterAgent', { query: 'API error' });
await tracer.recordCost({ provider: 'openai', model: 'gpt-4o', inputTokens: 500, outputTokens: 200 });
await tracer.endAgentSpan(span, { route: 'technical' });

await tracer.recordHandoff('RouterAgent', 'TechAgent', { route: 'technical' });

const span2 = await tracer.startAgentSpan('TechAgent');
await tracer.endAgentSpan(span2, { result: 'Issue resolved' });

await tracer.endWorkflow({ result: 'success' });
console.log('Total cost:', tracer.getTotalCost());
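With autoCalculateCost enabled, recordCost presumably multiplies token counts by per-model rates. A sketch of that arithmetic with illustrative, made-up per-million-token prices, not Evalgate's actual rate table:

```typescript
// Hypothetical per-1M-token USD rates, for illustration only.
const RATES: Record<string, { input: number; output: number }> = {
  'gpt-4o': { input: 2.5, output: 10 },
};

// Unknown models contribute zero rather than throwing.
function estimateCost(model: string, inputTokens: number, outputTokens: number): number {
  const rate = RATES[model];
  if (!rate) return 0;
  return (inputTokens * rate.input + outputTokens * rate.output) / 1_000_000;
}
```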

OpenAI integration

Import traceOpenAI from the ./integrations/openai export to wrap an OpenAI client and automatically capture LLM spans:
import { traceOpenAI } from '@evalgate/sdk/integrations/openai';
import OpenAI from 'openai';

const openai = traceOpenAI(new OpenAI(), tracer);

// All calls through `openai` are now traced automatically
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Summarize this document.' }],
});
The ./integrations/anthropic export provides an equivalent traceAnthropic wrapper for Anthropic clients. Both require the respective peer dependency to be installed.
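Conceptually, both wrappers intercept the client's async calls, time them, and emit spans to the tracer. A minimal sketch of that pattern (not the SDK's actual implementation):

```typescript
type Recorder = (name: string, durationMs: number) => void;

// Wrap an async method so every call is timed and reported, whether it
// resolves or rejects; the original return value passes through unchanged.
function traceMethod<A extends unknown[], R>(
  name: string,
  fn: (...args: A) => Promise<R>,
  record: Recorder,
): (...args: A) => Promise<R> {
  return async (...args: A) => {
    const start = Date.now();
    try {
      return await fn(...args);
    } finally {
      record(name, Date.now() - start);
    }
  };
}
```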