
Trace OpenAI API calls with Evalgate

Wrap the OpenAI client to automatically capture prompts, responses, token usage, latency, and cost estimation for every API request you make.
The Evalgate SDK integrates with the OpenAI client through a thin wrapper that intercepts every API call and records it as a trace — no manual instrumentation required. Chat completions, streaming responses, function calls, and embeddings are all captured automatically. This guide shows you how to set up the wrapper and use it for multi-turn conversations and model A/B testing.

Install dependencies

npm install openai @evalgate/sdk

Wrap the OpenAI client

Replace your existing OpenAI client initialization with the traced version. All subsequent calls through this client are automatically recorded:
import OpenAI from 'openai'
import { AIEvalClient, WorkflowTracer, traceOpenAI } from '@evalgate/sdk'

const client = new AIEvalClient({
  apiKey: process.env.EVALGATE_API_KEY
})

const tracer = new WorkflowTracer(client)

// Wrap the OpenAI client — all calls are now automatically traced
const openai = traceOpenAI(new OpenAI(), client)
Make sure EVALGATE_API_KEY and EVALGATE_ORGANIZATION_ID are set in your .env file before initializing the client.
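
A minimal .env file might look like this (the values below are placeholders, not real credentials):

```shell
# .env — loaded before the client is initialized (placeholder values)
EVALGATE_API_KEY=your-api-key
EVALGATE_ORGANIZATION_ID=your-org-id
```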

Chat completions

Call the OpenAI API exactly as you normally would. Evalgate captures the full interaction in the background:
// No extra instrumentation needed — the wrapped client handles it
const response = await openai.chat.completions.create({
  model: 'gpt-5.2-chat-latest',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' }
  ],
  temperature: 0.7
})

console.log(response.choices[0].message.content)
Every chat completion automatically records: full prompt and response, token usage (input and output), latency, model name and parameters, and cost estimation.
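
For quick local inspection, the token counts are also present on the standard OpenAI response object under `usage`. A minimal sketch that formats them for logging, shown here against a mocked `usage` object rather than a live API call:

```typescript
// Shape of the `usage` field on an OpenAI chat completion response
interface Usage {
  prompt_tokens: number
  completion_tokens: number
  total_tokens: number
}

// Format token counts for a local log line; Evalgate records the same
// fields in the trace automatically
function summarizeUsage(usage: Usage): string {
  return `in=${usage.prompt_tokens} out=${usage.completion_tokens} total=${usage.total_tokens}`
}

// Example with a mocked usage object (a real one comes from `response.usage`)
const line = summarizeUsage({ prompt_tokens: 24, completion_tokens: 8, total_tokens: 32 })
```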

Streaming responses

Streaming works exactly the same way. The SDK buffers the full response as it streams so the complete output appears in the trace:
const stream = await openai.chat.completions.create({
  model: 'gpt-5.2-chat-latest',
  messages: messages,
  stream: true
})

let fullResponse = ''
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || ''
  fullResponse += content
  process.stdout.write(content)
}

// The full assembled response is automatically captured in the trace
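
Conceptually, the buffering the SDK performs is just concatenating the `delta.content` of each chunk, the same way the loop above builds `fullResponse`. A self-contained sketch using mocked chunks (the simplified chunk shape is an assumption for illustration):

```typescript
// Simplified shape of an OpenAI streaming chunk
interface StreamChunk {
  choices: { delta: { content?: string } }[]
}

// Assemble the full text from streamed deltas — mirrors what the SDK
// does internally when it buffers a stream for the trace
function assembleStream(chunks: StreamChunk[]): string {
  let full = ''
  for (const chunk of chunks) {
    full += chunk.choices[0]?.delta?.content ?? ''
  }
  return full
}

// Mocked chunks standing in for a real stream
const text = assembleStream([
  { choices: [{ delta: { content: 'Hel' } }] },
  { choices: [{ delta: { content: 'lo' } }] },
  { choices: [{ delta: {} }] } // final chunks may carry no content
])
```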

Function calling

Function calls are traced like any other completion. The tool definitions, selected function, and arguments all appear in the trace:
const tools = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get current weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: { type: 'string' }
        },
        required: ['location']
      }
    }
  }
]

const response = await openai.chat.completions.create({
  model: 'gpt-5.2-chat-latest',
  messages: messages,
  tools: tools,
  tool_choice: 'auto'
})

// Function calls are automatically tracked in the trace
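
When the model selects a tool, the arguments arrive as a JSON string on the assistant message's `tool_calls` entries and must be parsed before you dispatch to your own function. A minimal sketch against a mocked tool call (the simplified shape is an assumption for illustration):

```typescript
// Simplified shape of one entry in `response.choices[0].message.tool_calls`
interface ToolCall {
  function: { name: string; arguments: string }
}

// Parse the JSON-encoded arguments so the call can be dispatched
function parseToolCall(call: ToolCall): { name: string; args: Record<string, unknown> } {
  return { name: call.function.name, args: JSON.parse(call.function.arguments) }
}

// Mocked tool call matching the get_weather tool defined above
const parsed = parseToolCall({
  function: { name: 'get_weather', arguments: '{"location":"Paris"}' }
})
```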

Multi-turn conversations

Use WorkflowTracer to group multiple turns of a conversation into a single workflow trace with per-turn spans:
import { traceWorkflowStep } from '@evalgate/sdk'

await tracer.startWorkflow('multi-turn-conversation', undefined, {
  sessionId: 'session_456'
})

const messages = []

// Turn 1
messages.push({ role: 'user', content: 'Hello!' })
const response1 = await traceWorkflowStep(tracer, 'turn-1', () =>
  openai.chat.completions.create({ model: 'gpt-5.2-chat-latest', messages })
)
messages.push(response1.choices[0].message)

// Turn 2
messages.push({ role: 'user', content: 'Tell me a joke' })
const response2 = await traceWorkflowStep(tracer, 'turn-2', () =>
  openai.chat.completions.create({ model: 'gpt-5.2-chat-latest', messages })
)
messages.push(response2.choices[0].message)

await tracer.endWorkflow({ status: 'success' })

A/B testing models

Trace model variants with metadata so you can compare quality, latency, and cost across experiments in the dashboard:
async function abTestModels(prompt: string) {
  const variant = Math.random() < 0.5 ? 'gpt-5.2-chat-latest' : 'gpt-4.1-mini'

  await tracer.startWorkflow('model-ab-test', undefined, {
    variant,
    experimentId: 'gpt5-vs-gpt41mini'
  })

  const result = await traceWorkflowStep(tracer, 'llm-call', () =>
    openai.chat.completions.create({
      model: variant,
      messages: [{ role: 'user', content: prompt }]
    })
  )

  await tracer.endWorkflow({ status: 'success' })
  return result
}

// Analyze results in the dashboard to compare:
// - Quality scores
// - Latency
// - Cost per completion
// - User satisfaction signals
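
If you want a given user to always land in the same variant across sessions, you can replace the `Math.random()` split with a deterministic assignment keyed on a stable identifier. A minimal sketch (the hash scheme here is an illustrative choice, not part of the Evalgate SDK):

```typescript
// Deterministic A/B assignment: hash a stable key (e.g. a user ID) so the
// same key always maps to the same variant
function assignVariant(key: string, variants: readonly string[]): string {
  let hash = 0
  for (let i = 0; i < key.length; i++) {
    hash = (hash * 31 + key.charCodeAt(i)) >>> 0 // simple 32-bit rolling hash
  }
  return variants[hash % variants.length] ?? ''
}

const v = assignVariant('user_123', ['gpt-5.2-chat-latest', 'gpt-4.1-mini'])
```

Record the chosen variant in the workflow metadata exactly as in the example above so the dashboard can segment results.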

What gets tracked automatically

Full prompt / response: every message in the conversation, including system prompts
Token usage: input tokens, output tokens, and total per request
Latency: time to first token and total request duration
Model: model name, temperature, and other parameters
Cost estimation: calculated from current OpenAI pricing
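
The cost estimate is a function of token counts and per-model rates. A minimal sketch of that arithmetic with caller-supplied rates (the numbers in the example are placeholders, not real OpenAI pricing, and this is not the SDK's internal implementation):

```typescript
// Per-model rates in USD per 1M tokens — supplied by the caller, nothing hard-coded
interface Rates {
  inputPerMillion: number
  outputPerMillion: number
}

// Estimate the cost of one request from its token usage
function estimateCost(
  usage: { inputTokens: number; outputTokens: number },
  rates: Rates
): number {
  return (
    (usage.inputTokens / 1_000_000) * rates.inputPerMillion +
    (usage.outputTokens / 1_000_000) * rates.outputPerMillion
  )
}

// Placeholder rates: $1/M input, $2/M output
const cost = estimateCost(
  { inputTokens: 1_000_000, outputTokens: 1_000_000 },
  { inputPerMillion: 1, outputPerMillion: 2 }
)
```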

Troubleshooting

Traces not capturing token usage? Ensure you are on the latest version of @evalgate/sdk or evalgate-sdk. Older versions may not parse the usage fields correctly.

Noticing high latency in traces? Make sure you are using async/await consistently. Synchronous operations block the event loop and inflate latency measurements.

Missing streaming response data? The SDK automatically buffers streaming responses. If data appears missing, confirm the stream iterator is fully consumed (the for await loop completes) before the process exits.