Trace OpenAI API calls with Evalgate
Wrap the OpenAI client to automatically capture prompts, responses, token usage, latency, and estimated cost for every API request you make.
The Evalgate SDK integrates with the OpenAI client through a thin wrapper that intercepts every API call and records it as a trace — no manual instrumentation required. Chat completions, streaming responses, function calls, and embeddings are all captured automatically. This guide shows you how to set up the wrapper and use it for multi-turn conversations, model A/B testing, and error tracking.
Install dependencies
npm install openai @evalgate/sdk
Wrap the OpenAI client
Replace your existing OpenAI client initialization with the traced version. All subsequent calls through this client are automatically recorded:
import OpenAI from 'openai'
import { AIEvalClient, WorkflowTracer, traceOpenAI, traceWorkflowStep } from '@evalgate/sdk'
const client = new AIEvalClient({
apiKey: process.env.EVALGATE_API_KEY
})
const tracer = new WorkflowTracer(client)
// Wrap the OpenAI client — all calls are now automatically traced
const openai = traceOpenAI(new OpenAI(), client)
Make sure EVALGATE_API_KEY and EVALGATE_ORGANIZATION_ID are set in your .env file before initializing the client.
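Under the hood, interception like this can be pictured as a Proxy around the client: each method call is forwarded unchanged while its name and duration are recorded. The sketch below is conceptual only, not the actual `traceOpenAI` implementation — `traceClient`, `TraceRecord`, and `fakeClient` are hypothetical names introduced for illustration:

```typescript
// Conceptual sketch of call interception — NOT the real traceOpenAI.
// A Proxy forwards every method call unchanged while recording which
// method ran and how long it took.

interface TraceRecord {
  method: string
  durationMs: number
}

const records: TraceRecord[] = []

function traceClient<T extends object>(client: T): T {
  return new Proxy(client, {
    get(target, prop, receiver) {
      const value = Reflect.get(target, prop, receiver)
      if (typeof value !== 'function') return value
      // For simplicity this sketch wraps every method as async.
      return async (...args: unknown[]) => {
        const start = Date.now()
        const result = await value.apply(target, args)
        records.push({ method: String(prop), durationMs: Date.now() - start })
        return result
      }
    }
  })
}

// Hypothetical stand-in for the OpenAI client
const fakeClient = {
  async complete(prompt: string) {
    return `echo: ${prompt}`
  }
}

const traced = traceClient(fakeClient)
```

Calling `traced.complete(...)` behaves exactly like the unwrapped method but leaves a record behind — the same call-site transparency the real wrapper provides.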
Chat completions
Call the OpenAI API exactly as you normally would. Evalgate captures the full interaction in the background:
// No extra instrumentation needed — the wrapped client handles it
const response = await openai.chat.completions.create({
model: 'gpt-5.2-chat-latest',
messages: [
{ role: 'system', content: 'You are a helpful assistant.' },
{ role: 'user', content: 'What is the capital of France?' }
],
temperature: 0.7
})
console.log(response.choices[0].message.content)
Every chat completion automatically records: full prompt and response, token usage (input and output), latency, model name and parameters, and cost estimation.
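To make the cost-estimation field concrete, here is a rough sketch of how an estimate can be derived from token usage. This is not the SDK's actual pricing logic, and the per-token prices and the `example-model` name are placeholders, not real OpenAI pricing:

```typescript
// Sketch only — illustrative pricing, not real OpenAI rates or the
// SDK's actual cost logic. Prices are USD per 1M tokens.

interface Usage {
  prompt_tokens: number
  completion_tokens: number
}

const PRICING: Record<string, { input: number; output: number }> = {
  'example-model': { input: 2.5, output: 10 } // placeholder values
}

function estimateCost(model: string, usage: Usage): number {
  const price = PRICING[model]
  if (!price) return 0 // unknown model: no estimate
  return (
    (usage.prompt_tokens / 1_000_000) * price.input +
    (usage.completion_tokens / 1_000_000) * price.output
  )
}
```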
Streaming responses
Streaming works exactly the same way. The SDK buffers the full response as it streams so the complete output appears in the trace:
const stream = await openai.chat.completions.create({
model: 'gpt-5.2-chat-latest',
messages: messages,
stream: true
})
let fullResponse = ''
for await (const chunk of stream) {
const content = chunk.choices[0]?.delta?.content || ''
fullResponse += content
process.stdout.write(content)
}
// The full assembled response is automatically captured in the trace
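One way to picture this buffering (a sketch, not the SDK's actual implementation): the stream is wrapped in a generator that passes each chunk through to the caller while concatenating the delta text for the trace. The `Chunk` interface and `captureStream` helper below are hypothetical:

```typescript
// Sketch of stream buffering — not the SDK's real implementation.
// Chunks pass through to the caller untouched; the concatenated text
// is handed to a callback once the stream is fully consumed.

interface Chunk {
  choices: { delta: { content?: string } }[]
}

async function* captureStream(
  source: AsyncIterable<Chunk>,
  onComplete: (full: string) => void
): AsyncGenerator<Chunk> {
  let full = ''
  for await (const chunk of source) {
    full += chunk.choices[0]?.delta?.content ?? ''
    yield chunk
  }
  // Only reached when the caller consumes the whole stream — which is
  // why an abandoned `for await` loop leaves the trace without output.
  onComplete(full)
}
```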
Function calling
Function calls are traced like any other completion. The tool definitions, selected function, and arguments all appear in the trace:
const tools = [
{
type: 'function',
function: {
name: 'get_weather',
description: 'Get current weather for a location',
parameters: {
type: 'object',
properties: {
location: { type: 'string' }
},
required: ['location']
}
}
}
]
const response = await openai.chat.completions.create({
model: 'gpt-5.2-chat-latest',
messages: messages,
tools: tools,
tool_choice: 'auto'
})
// Function calls are automatically tracked in the trace
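After a traced call returns, the selected function and its arguments can be read back from the response message. The helper below is illustrative and not part of the SDK; the `tool_calls` shape follows the OpenAI chat completions response format, in which `arguments` is a JSON-encoded string:

```typescript
// Illustrative helper (not part of the SDK) for reading the first
// tool call from a chat completion message. In the OpenAI response
// format, `function.arguments` is a JSON string that must be parsed.

interface ToolCall {
  function: { name: string; arguments: string }
}

function selectedTool(message: { tool_calls?: ToolCall[] }) {
  const call = message.tool_calls?.[0]
  if (!call) return null // the model answered in plain text instead
  return {
    name: call.function.name,
    args: JSON.parse(call.function.arguments)
  }
}
```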
Multi-turn conversations
Use WorkflowTracer to group multiple turns of a conversation into a single workflow trace with per-turn spans:
import { traceWorkflowStep } from '@evalgate/sdk'
await tracer.startWorkflow('multi-turn-conversation', undefined, {
sessionId: 'session_456'
})
const messages: OpenAI.ChatCompletionMessageParam[] = []
// Turn 1
messages.push({ role: 'user', content: 'Hello!' })
const response1 = await traceWorkflowStep(tracer, 'turn-1', () =>
openai.chat.completions.create({ model: 'gpt-5.2-chat-latest', messages })
)
messages.push(response1.choices[0].message)
// Turn 2
messages.push({ role: 'user', content: 'Tell me a joke' })
const response2 = await traceWorkflowStep(tracer, 'turn-2', () =>
openai.chat.completions.create({ model: 'gpt-5.2-chat-latest', messages })
)
messages.push(response2.choices[0].message)
await tracer.endWorkflow({ status: 'success' })
A/B testing models
Trace model variants with metadata so you can compare quality, latency, and cost across experiments in the dashboard:
async function abTestModels(prompt: string) {
const variant = Math.random() < 0.5 ? 'gpt-5.2-chat-latest' : 'gpt-4.1-mini'
await tracer.startWorkflow('model-ab-test', undefined, {
variant,
experimentId: 'gpt5-vs-gpt41mini'
})
const result = await traceWorkflowStep(tracer, 'llm-call', () =>
openai.chat.completions.create({
model: variant,
messages: [{ role: 'user', content: prompt }]
})
)
await tracer.endWorkflow({ status: 'success' })
return result
}
// Analyze results in the dashboard to compare:
// - Quality scores
// - Latency
// - Cost per completion
// - User satisfaction signals
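The dashboard performs this aggregation for you; as a rough illustration of the comparison, per-variant means can be computed from exported trace rows. The `TraceRow` shape and `summarizeByVariant` helper here are hypothetical, not an Evalgate export format:

```typescript
// Hypothetical aggregation sketch — not the Evalgate export format.
// Computes per-variant counts and running means over trace rows.

interface TraceRow {
  variant: string
  latencyMs: number
  costUsd: number
}

function summarizeByVariant(rows: TraceRow[]) {
  const out: Record<string, { count: number; meanLatencyMs: number; meanCostUsd: number }> = {}
  for (const row of rows) {
    const s = (out[row.variant] ??= { count: 0, meanLatencyMs: 0, meanCostUsd: 0 })
    s.count++
    // incremental mean update: mean += (x - mean) / n
    s.meanLatencyMs += (row.latencyMs - s.meanLatencyMs) / s.count
    s.meanCostUsd += (row.costUsd - s.meanCostUsd) / s.count
  }
  return out
}
```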
What gets tracked automatically
| Field | Description |
| --- | --- |
| Full prompt / response | Every message in the conversation, including system prompts |
| Token usage | Input tokens, output tokens, and total per request |
| Latency | Time to first token and total request duration |
| Model | Model name, temperature, and other parameters |
| Cost estimation | Calculated from current OpenAI pricing |
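Taken together, a single captured request might be pictured as a record like the following — an illustrative shape only, not the SDK's actual trace schema:

```typescript
// Illustrative trace shape — NOT the SDK's actual schema.
interface OpenAITrace {
  model: string
  parameters: { temperature?: number }
  messages: { role: string; content: string }[]
  response: string
  usage: { inputTokens: number; outputTokens: number; totalTokens: number }
  latency: { timeToFirstTokenMs?: number; totalMs: number }
  estimatedCostUsd: number
}

// Hypothetical example record (all values made up)
const example: OpenAITrace = {
  model: 'example-model',
  parameters: { temperature: 0.7 },
  messages: [{ role: 'user', content: 'What is the capital of France?' }],
  response: 'Paris.',
  usage: { inputTokens: 24, outputTokens: 3, totalTokens: 27 },
  latency: { totalMs: 420 },
  estimatedCostUsd: 0.0001
}
```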
Troubleshooting
Traces not capturing token usage?
Ensure you are on the latest version of @evalgate/sdk or evalgate-sdk. Older versions may not parse the usage fields correctly.
Noticing high latency in traces?
Make sure you are using async/await consistently. Synchronous operations block the event loop and inflate latency measurements.
Missing streaming response data?
The SDK automatically buffers streaming responses. If data appears missing, confirm the stream iterator is fully consumed (the for await loop completes) before the process exits.