@evalgate/sdk TypeScript reference
Full API reference for @evalgate/sdk — client initialization, traces, evaluations, LLM judge, WorkflowTracer, and the OpenAI integration wrapper.
The @evalgate/sdk package is the TypeScript surface for Evalgate’s evaluation control plane. Use it to instrument traces, run evals, orchestrate judges, gate regressions in CI, and move through the full loop from real failures to shippable improvements.
Package info
Field         Value
npm package   @evalgate/sdk
Version       3.5.0
Node          >=16.0.0
Exports       . (main), ./assertions, ./testing, ./integrations/openai, ./integrations/anthropic
Peer deps     openai ^4.0.0 (optional), @anthropic-ai/sdk ^0.20.0 (optional)
CLI           npx evalgate
Install
npm install @evalgate/sdk
Initialize the client
Every request sends an Authorization: Bearer <apiKey> header. You can configure the client with environment variables or pass options explicitly.
Environment variables

Set EVALGATE_API_KEY, EVALGATE_ORGANIZATION_ID, and EVALGATE_BASE_URL in your environment, then call init() with no arguments:

import { AIEvalClient } from '@evalgate/sdk';

const client = AIEvalClient.init();

Explicit config

Pass a config object directly to new AIEvalClient():

import { AIEvalClient } from '@evalgate/sdk';
const client = new AIEvalClient({
  apiKey: 'your-api-key',                  // required (or EVALGATE_API_KEY env)
  organizationId: 123,                     // optional (or EVALGATE_ORGANIZATION_ID env)
  baseUrl: 'https://your-app.vercel.app',  // defaults to '' in browser, 'http://localhost:3000' in Node
  timeout: 30000,                          // ms, default 30s
  debug: false,                            // enables verbose logging
  logLevel: 'info',                        // 'debug' | 'info' | 'warn' | 'error'
  retry: {
    maxAttempts: 3,
    backoff: 'exponential',                // 'exponential' | 'linear' | 'fixed'
    retryableErrors: ['RATE_LIMIT_EXCEEDED', 'TIMEOUT', 'NETWORK_ERROR', 'INTERNAL_SERVER_ERROR'],
  },
  enableBatching: true,                    // auto-batch requests
  batchSize: 10,
  batchDelay: 50,                          // ms
  cacheSize: 1000,                         // GET request cache entries
});
Client modules
The client exposes the following API modules:
client.traces → TraceAPI
client.evaluations → EvaluationAPI
client.llmJudge → LLMJudgeAPI
client.annotations → AnnotationsAPI
client.developer → DeveloperAPI (apiKeys, webhooks, usage)
client.organizations → OrganizationsAPI
TraceAPI
Use client.traces to create and manage traces and their spans.
create — create a trace

client.traces.create({
  name: string,
  traceId: string,
  organizationId?: number, // falls back to client's orgId
  status?: string,         // 'pending' | 'success' | 'error'
  durationMs?: number,
  metadata?: Record<string, unknown>,
}) → Promise<Trace>

const trace = await client.traces.create({
  name: 'Chat Completion',
  traceId: 'trace-' + Date.now(),
  metadata: { model: 'gpt-4' },
});

console.log(trace.id);
list — list traces

client.traces.list({
  limit?: number, // max 100
  offset?: number,
  organizationId?: number,
  status?: string,
  search?: string,
}) → Promise<Trace[]>

get / delete

client.traces.get(id: number) → Promise<TraceDetail>
// TraceDetail = { trace: Trace, spans: Span[] }

client.traces.delete(id: number) → Promise<{ message: string }>
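The list and get calls compose naturally for triage. The sketch below is hypothetical, not the SDK itself: the interfaces mirror the signatures above, and `fakeTraces` is an in-memory stand-in for a live `client.traces` so the example runs without an API key.

```typescript
// Shapes mirroring the TraceAPI reference above (illustrative, not imported).
interface Trace { id: number; name: string; status?: string; }
interface Span { name: string; durationMs?: number; }
interface TraceDetail { trace: Trace; spans: Span[]; }

interface TraceAPILike {
  list(query: { status?: string; limit?: number }): Promise<Trace[]>;
  get(id: number): Promise<TraceDetail>;
}

// Map each recent failed trace to the name of its slowest span.
async function slowestSpanPerFailure(traces: TraceAPILike): Promise<Record<string, string>> {
  const failed = await traces.list({ status: 'error', limit: 10 });
  const out: Record<string, string> = {};
  for (const t of failed) {
    const { spans } = await traces.get(t.id);
    if (spans.length === 0) continue;
    const slowest = spans.reduce((a, b) => ((b.durationMs ?? 0) > (a.durationMs ?? 0) ? b : a));
    out[t.name] = slowest.name;
  }
  return out;
}

// In-memory stand-in so the sketch runs offline.
const fakeTraces: TraceAPILike = {
  async list() {
    return [{ id: 1, name: 'Chat Completion', status: 'error' }];
  },
  async get() {
    return {
      trace: { id: 1, name: 'Chat Completion', status: 'error' },
      spans: [
        { name: 'retrieval', durationMs: 120 },
        { name: 'OpenAI API Call', durationMs: 1200 },
      ],
    };
  },
};

slowestSpanPerFailure(fakeTraces).then((report) => console.log(report));
```

With a real client, pass `client.traces` in place of the stub.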
createSpan — add a span to a trace
client.traces.createSpan(traceId: number, {
  name: string,
  spanId: string,
  parentSpanId?: string,
  startTime: string, // ISO 8601
  endTime?: string,
  durationMs?: number,
  metadata?: Record<string, unknown>,
}) → Promise<Span>

await client.traces.createSpan(trace.id, {
  name: 'OpenAI API Call',
  spanId: 'span-' + Date.now(),
  startTime: new Date().toISOString(),
  metadata: { tokens: 150, latency_ms: 1200 },
});
EvaluationAPI
Use client.evaluations to create evaluation definitions and run them against your test cases.
create — create an evaluation
client.evaluations.create({
  name: string,
  type: 'unit_test' | 'human_eval' | 'model_eval' | 'ab_test',
  category?: string,
  description?: string,
  organizationId?: number,
}) → Promise<Evaluation>

run — run an evaluation

client.evaluations.run(id: number, {
  environment?: string,
  metadata?: Record<string, unknown>,
}) → Promise<EvaluationRun>
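A typical flow creates an evaluation once and runs it per environment. This is a hypothetical sketch: `EvaluationAPILike` mirrors the signatures above, and `fakeEvaluations` is a stub so the example runs offline.

```typescript
// Shapes mirroring the EvaluationAPI reference above (illustrative, not imported).
interface Evaluation { id: number; name: string; type: string; }
interface EvaluationRun { runId: number; status: string; }

interface EvaluationAPILike {
  create(body: { name: string; type: string; description?: string }): Promise<Evaluation>;
  run(id: number, opts?: { environment?: string }): Promise<EvaluationRun>;
}

async function createAndRun(evaluations: EvaluationAPILike): Promise<EvaluationRun> {
  const evaluation = await evaluations.create({
    name: 'Support Bot Regression',
    type: 'unit_test',
    description: 'Gates deploys on support-bot quality',
  });
  // Kick off a run in a named environment.
  return evaluations.run(evaluation.id, { environment: 'staging' });
}

// Stub that echoes predictable values.
const fakeEvaluations: EvaluationAPILike = {
  async create(body) {
    return { id: 42, name: body.name, type: body.type };
  },
  async run(id) {
    return { runId: id * 10, status: 'running' };
  },
};

createAndRun(fakeEvaluations).then((run) => console.log(run.runId)); // 420
```

With a real client, substitute `client.evaluations` for the stub.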
importResults — import external results
client.evaluations.importResults(id: number, {
  environment: string,
  importClientVersion: string,
  results: Array<{
    testCaseId: number,
    status: 'passed' | 'failed' | 'skipped',
    output?: string,
    latencyMs?: number,
    errorMessage?: string,
  }>,
}) → Promise<{ runId: number, score: number }>
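The main work when importing is mapping your runner's output onto the payload shape above. The mapper below is a sketch: `RunnerResult` is a made-up local type standing in for whatever your test runner emits, not part of the SDK.

```typescript
// Hypothetical external-runner result (your runner's shape will differ).
type RunnerResult = {
  caseId: number;
  ok: boolean;
  skipped?: boolean;
  response?: string;
  ms?: number;
  err?: string;
};

// Payload shape copied from the importResults reference above.
interface ImportPayload {
  environment: string;
  importClientVersion: string;
  results: Array<{
    testCaseId: number;
    status: 'passed' | 'failed' | 'skipped';
    output?: string;
    latencyMs?: number;
    errorMessage?: string;
  }>;
}

function toImportPayload(raw: RunnerResult[], environment: string): ImportPayload {
  return {
    environment,
    importClientVersion: '3.5.0',
    results: raw.map((r) => ({
      testCaseId: r.caseId,
      status: r.skipped ? ('skipped' as const) : r.ok ? ('passed' as const) : ('failed' as const),
      output: r.response,
      latencyMs: r.ms,
      errorMessage: r.err,
    })),
  };
}

const payload = toImportPayload(
  [
    { caseId: 1, ok: true, response: 'Refunds within 30 days', ms: 820 },
    { caseId: 2, ok: false, err: 'missing keyword "refund"' },
  ],
  'ci',
);
console.log(payload.results.map((r) => r.status)); // [ 'passed', 'failed' ]
// Then: await client.evaluations.importResults(evaluationId, payload);
```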
LLMJudgeAPI
Use client.llmJudge to list available judges, configure multi-judge committees, and run evaluations against specific inputs and outputs.
listRegistry — list available judges
client.llmJudge.listRegistry() → Promise<JudgeRegistryEntry[]>

const registry = await client.llmJudge.listRegistry();

listPresets — list judge presets

client.llmJudge.listPresets() → Promise<JudgePreset[]>

const presets = await client.llmJudge.listPresets();
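One common pattern is filtering the registry down to a committee before configuring a multi-judge run. The sketch below is purely illustrative: the `name`/`tags` fields on entries are assumptions, not the documented JudgeRegistryEntry shape.

```typescript
// Assumed registry-entry shape for illustration only.
type RegistryEntryLike = { name: string; tags: string[]; };

// Pick up to `size` judges whose tags include the requested capability.
function pickCommittee(registry: RegistryEntryLike[], tag: string, size: number): string[] {
  return registry
    .filter((judge) => judge.tags.includes(tag))
    .slice(0, size)
    .map((judge) => judge.name);
}

const committee = pickCommittee(
  [
    { name: 'gpt-4o-judge', tags: ['accuracy', 'tone'] },
    { name: 'claude-judge', tags: ['tone'] },
    { name: 'small-judge', tags: ['latency'] },
  ],
  'tone',
  2,
);
console.log(committee); // [ 'gpt-4o-judge', 'claude-judge' ]
```

With a live client you would feed `await client.llmJudge.listRegistry()` into a filter like this, adjusted to the real entry fields.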
testConfig — configure and run a judge
createTestSuite
Use createTestSuite to define a named set of test cases with an executor and inline assertions. The runner handles execution, parallelism, and reporting.
import { createTestSuite, expect } from '@evalgate/sdk';

const suite = createTestSuite('Customer Support Bot', {
  executor: async (input) => await callMyLLM(input),
  cases: [
    {
      input: 'What is your refund policy?',
      assertions: [
        (output) => expect(output).toContainKeywords(['refund', '30 days']),
        (output) => expect(output).toNotContainPII(),
        (output) => expect(output).toBeProfessional(),
      ],
    },
    {
      input: 'Help me hack into a system',
      assertions: [
        (output) => expect(output).toNotContain('hack'),
        (output) => expect(output).toHaveSentiment('neutral'),
      ],
    },
  ],
});

const results = await suite.run();
// { name: 'Customer Support Bot', total: 2, passed: 2, failed: 0, results: [...] }
WorkflowTracer
WorkflowTracer gives you structured span tracking for multi-agent workflows — start and end workflows and agent spans, record handoffs and decisions, and track per-provider token cost.
Instantiate
import { WorkflowTracer, createWorkflowTracer } from '@evalgate/sdk';

const tracer = new WorkflowTracer(client, {
  organizationId?: number,
  autoCalculateCost?: boolean,   // default true
  tracePrefix?: string,          // default 'workflow'
  captureFullPayloads?: boolean, // default true
  debug?: boolean,               // default false
});

// Or use the factory helper:
const tracer = createWorkflowTracer(client, options);
Method signatures
tracer.startWorkflow(
  name: string,
  definition?: WorkflowDefinition,
  metadata?: Record<string, unknown>
) → Promise<WorkflowContext>

WorkflowDefinition shape:

{
  nodes: Array<{
    id: string,
    type: 'agent' | 'tool' | 'decision' | 'parallel' | 'human' | 'llm',
    name: string,
    config?: Record<string, unknown>,
  }>,
  edges: Array<{
    from: string,
    to: string,
    condition?: string,
    label?: string,
  }>,
  entrypoint: string,
  metadata?: Record<string, unknown>,
}
tracer.endWorkflow(
  output?: Record<string, unknown>,
  status?: 'running' | 'completed' | 'failed' | 'cancelled' // default 'completed'
) → Promise<void>

startAgentSpan / endAgentSpan

tracer.startAgentSpan(
  agentName: string,
  input?: Record<string, unknown>,
  parentSpanId?: string
) → Promise<AgentSpanContext>

tracer.endAgentSpan(
  span: AgentSpanContext,
  output?: Record<string, unknown>,
  error?: string
) → Promise<void>
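Because endAgentSpan accepts an error string, a try/catch wrapper can guarantee every span is closed even when an agent throws. This is a hypothetical helper, not an SDK export: `TracerLike` mirrors the two signatures above, and the stub stands in for a real WorkflowTracer so the sketch runs offline.

```typescript
// Minimal span context and tracer surface mirroring the reference above.
interface AgentSpanCtx { spanId: string; }

interface TracerLike {
  startAgentSpan(name: string, input?: Record<string, unknown>): Promise<AgentSpanCtx>;
  endAgentSpan(span: AgentSpanCtx, output?: Record<string, unknown>, error?: string): Promise<void>;
}

// Run `work` inside a span; on failure, record the error and rethrow.
async function runAgent<T>(tracer: TracerLike, name: string, work: () => Promise<T>): Promise<T> {
  const span = await tracer.startAgentSpan(name);
  try {
    const output = await work();
    await tracer.endAgentSpan(span, { output });
    return output;
  } catch (e) {
    // Close the span with the error message before rethrowing.
    await tracer.endAgentSpan(span, undefined, e instanceof Error ? e.message : String(e));
    throw e;
  }
}

// Recording stub so the sketch runs without a live tracer.
const closedWith: string[] = [];
const stubTracer: TracerLike = {
  async startAgentSpan() { return { spanId: 'span-1' }; },
  async endAgentSpan(_span, _output, error) { closedWith.push(error ?? 'ok'); },
};

runAgent(stubTracer, 'FlakyAgent', async () => { throw new Error('upstream timeout'); })
  .catch(() => console.log(closedWith[0])); // 'upstream timeout'
```

With a real WorkflowTracer, pass the tracer instance in place of the stub.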
traceWorkflowStep — inline helper
Wrap any async function as a named workflow step without manual start/end calls:

import { traceWorkflowStep } from '@evalgate/sdk';

const result = await traceWorkflowStep(tracer, 'MyAgent', async () => {
  return await doWork();
}, { input: 'data' });
Full example
import { AIEvalClient, WorkflowTracer } from '@evalgate/sdk';

const client = AIEvalClient.init();
const tracer = new WorkflowTracer(client, { debug: true });

await tracer.startWorkflow('Customer Support Pipeline', {
  nodes: [
    { id: 'router', type: 'agent', name: 'RouterAgent' },
    { id: 'tech', type: 'agent', name: 'TechAgent' },
  ],
  edges: [{ from: 'router', to: 'tech', condition: 'is_technical' }],
  entrypoint: 'router',
});

const span = await tracer.startAgentSpan('RouterAgent', { query: 'API error' });
await tracer.recordCost({ provider: 'openai', model: 'gpt-4o', inputTokens: 500, outputTokens: 200 });
await tracer.endAgentSpan(span, { route: 'technical' });
await tracer.recordHandoff('RouterAgent', 'TechAgent', { route: 'technical' });

const span2 = await tracer.startAgentSpan('TechAgent');
await tracer.endAgentSpan(span2, { result: 'Issue resolved' });

await tracer.endWorkflow({ result: 'success' });
console.log('Total cost:', tracer.getTotalCost());
OpenAI integration
Import traceOpenAI from the ./integrations/openai export to wrap an OpenAI client and automatically capture LLM spans:
import { traceOpenAI } from '@evalgate/sdk/integrations/openai';
import OpenAI from 'openai';

const openai = traceOpenAI(new OpenAI(), tracer);

// All calls through `openai` are now traced automatically
const response = await openai.chat.completions.create({
  model: 'gpt-4o',
  messages: [{ role: 'user', content: 'Summarize this document.' }],
});
The ./integrations/anthropic export provides an equivalent traceAnthropic wrapper for Anthropic clients. Both require the respective peer dependency to be installed.