Using with OpenAI API
Wrap OpenAI API calls with our tracing SDK for full observability.
Installation
TypeScript
npm install openai @evalgate/sdk
Python
pip install openai pauly4010-evalgate-sdk
Basic Setup
TypeScript
import OpenAI from 'openai'
import { AIEvalClient, WorkflowTracer, traceOpenAI, traceWorkflowStep } from '@evalgate/sdk'
const client = new AIEvalClient({ apiKey: process.env.EVALAI_API_KEY })
const tracer = new WorkflowTracer(client)
// Wrap OpenAI client for automatic tracing
const openai = traceOpenAI(new OpenAI(), client)
Python
import os

from openai import OpenAI
from evalgate_sdk import AIEvalClient, WorkflowTracer
from evalgate_sdk.integrations.openai import trace_openai
client = AIEvalClient(api_key=os.environ["EVALAI_API_KEY"])
tracer = WorkflowTracer(client)
# Wrap OpenAI client for automatic tracing
openai = trace_openai(OpenAI(), client)
Environment variables: Make sure you have EVALAI_API_KEY and EVALAI_ORGANIZATION_ID in your .env file.
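A typical `.env` file for this setup might look like the following (all values are placeholders; `OPENAI_API_KEY` is read by the OpenAI client itself, not by the EvalGate SDK):

```shell
# .env — placeholder values, replace with your own keys
EVALAI_API_KEY=your-evalai-api-key
EVALAI_ORGANIZATION_ID=your-organization-id
OPENAI_API_KEY=your-openai-api-key
```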
Tracing OpenAI Calls
Chat Completions
TypeScript
// All OpenAI calls are automatically traced!
const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' }
  ],
  temperature: 0.7
})
console.log(response.choices[0].message.content)
// Automatically tracked in EvalAI dashboard:
// ✓ Full prompt and response
// ✓ Token usage (input/output)
// ✓ Latency
// ✓ Model and parameters
// ✓ Cost estimation
Python
# All OpenAI calls are automatically traced!
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    temperature=0.7
)
print(response.choices[0].message.content)
# Automatically tracked in EvalAI dashboard:
# ✓ Full prompt and response
# ✓ Token usage (input/output)
# ✓ Latency
# ✓ Model and parameters
# ✓ Cost estimation
Streaming Responses
TypeScript
// Streaming is automatically traced too!
const stream = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: messages,
  stream: true
})

// Stream tokens to user
let fullResponse = ''
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || ''
  fullResponse += content
  process.stdout.write(content)
}
// Full response is automatically captured in trace
Python
# Streaming is automatically traced too!
stream = openai.chat.completions.create(
    model="gpt-4",
    messages=messages,
    stream=True
)

# Stream tokens to user
full_response = ""
for chunk in stream:
    content = chunk.choices[0].delta.content or ""
    full_response += content
    print(content, end="", flush=True)
# Full response is automatically captured in trace
Function Calling
const tools = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get current weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: { type: 'string' }
        },
        required: ['location']
      }
    }
  }
]

const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: messages,
  tools: tools,
  tool_choice: 'auto'
})
// Function calls are automatically tracked
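Tracing records the tool call, but executing it is still up to you: inspect `tool_calls` on the returned message and route each call to your own handler. A minimal dispatch sketch (the `handlers` map and its `get_weather` implementation are illustrative, not part of the SDK):

```typescript
// Hypothetical handlers keyed by the function names declared in `tools`.
const handlers: Record<string, (args: any) => string> = {
  get_weather: ({ location }) => `Sunny in ${location}`
};

// Dispatch one tool call object, shaped like
// response.choices[0].message.tool_calls[i].
function dispatchToolCall(call: { function: { name: string; arguments: string } }): string {
  const handler = handlers[call.function.name];
  if (!handler) throw new Error(`Unknown tool: ${call.function.name}`);
  // The model returns arguments as a JSON string, so parse before calling.
  return handler(JSON.parse(call.function.arguments));
}
```

The result would then be appended to `messages` as a `tool` role message before the follow-up completion call.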
Embeddings
// Embeddings are also automatically traced
const embedding = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Your text here'
})
const vector = embedding.data[0].embedding
Advanced Patterns
Multi-Turn Conversations
import { traceWorkflowStep } from '@evalgate/sdk'
await tracer.startWorkflow('multi-turn-conversation', undefined, { sessionId: 'session_456' });

const messages = [];

// Turn 1
messages.push({ role: 'user', content: 'Hello!' });
const response1 = await traceWorkflowStep(tracer, 'turn-1', () =>
  openai.chat.completions.create({ model: 'gpt-4', messages })
);
messages.push(response1.choices[0].message);

// Turn 2
messages.push({ role: 'user', content: 'Tell me a joke' });
const response2 = await traceWorkflowStep(tracer, 'turn-2', () =>
  openai.chat.completions.create({ model: 'gpt-4', messages })
);
messages.push(response2.choices[0].message);

await tracer.endWorkflow({ status: 'success' });
Retry Logic with Tracing
async function callOpenAIWithRetry(messages, maxRetries = 3) {
  await tracer.startWorkflow('openai-with-retry', undefined, { maxRetries });

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const result = await traceWorkflowStep(
        tracer,
        `attempt-${attempt}`,
        () => openai.chat.completions.create({ model: 'gpt-4', messages })
      );
      await tracer.endWorkflow({ status: 'success' });
      return result;
    } catch (error) {
      if (attempt === maxRetries) {
        await tracer.endWorkflow({ status: 'failed', error: error.message });
        throw error;
      }
      // Exponential backoff: 2s, 4s, 8s, ...
      await new Promise(r => setTimeout(r, Math.pow(2, attempt) * 1000));
    }
  }
}
Parallel Requests
await tracer.startWorkflow('generate-variations', undefined, { count: 3 });

const prompts = [
  'Write a formal email...',
  'Write a casual email...',
  'Write a brief email...'
];

const variations = await Promise.all(
  prompts.map((prompt, i) =>
    traceWorkflowStep(tracer, `variation-${i + 1}`, () =>
      openai.chat.completions.create({
        model: 'gpt-4',
        messages: [{ role: 'user', content: prompt }]
      })
    )
  )
);

await tracer.endWorkflow({ status: 'success' });
Evaluation Integration
Create Test Cases
const testCases = [
  {
    input: { prompt: 'Translate "hello" to Spanish' },
    expectedOutput: { contains: 'hola' },
    metadata: { category: 'translation' }
  },
  {
    input: { prompt: 'What is 2+2?' },
    expectedOutput: { exact: '4' },
    metadata: { category: 'math' }
  }
];

// Run evaluation
for (const testCase of testCases) {
  await tracer.startWorkflow('test-case', undefined, testCase.metadata);
  const result = await traceWorkflowStep(tracer, 'llm-call', () =>
    openai.chat.completions.create({
      model: 'gpt-4',
      messages: [{ role: 'user', content: testCase.input.prompt }]
    })
  );
  await tracer.endWorkflow({ status: 'success' });

  const output = result.choices[0].message.content;
  const passed = testCase.expectedOutput.contains
    ? output.toLowerCase().includes(testCase.expectedOutput.contains)
    : output === testCase.expectedOutput.exact;
  console.log(`Test ${testCase.metadata.category}: ${passed ? '✓' : '✗'}`);
}
A/B Testing Models
async function abTestModels(prompt) {
  const variant = Math.random() < 0.5 ? 'gpt-4' : 'gpt-3.5-turbo';
  await tracer.startWorkflow('model-ab-test', undefined, {
    variant,
    experimentId: 'gpt4-vs-gpt35'
  });
  const result = await traceWorkflowStep(tracer, 'llm-call', () =>
    openai.chat.completions.create({
      model: variant,
      messages: [{ role: 'user', content: prompt }]
    })
  );
  await tracer.endWorkflow({ status: 'success' });
  return result;
}
// Analyze results in dashboard to compare:
// - Quality scores
// - Latency
// - Cost
// - User satisfaction
Best Practices
1. Add Contextual Metadata
await tracer.startWorkflow('content-generation', undefined, {
  userId: user.id,
  contentType: 'blog-post',
  targetAudience: 'developers',
  tone: 'professional',
  model: 'gpt-4',
  temperature: 0.7,
  maxTokens: 2000
});

const span = await tracer.startAgentSpan('ContentAgent', { input: '...' });
// OpenAI call
await tracer.endAgentSpan(span, { result: '...' });

await tracer.endWorkflow({ status: 'success' });
2. Track Token Usage
Automatically tracked in every trace:
- Input tokens
- Output tokens
- Total cost (calculated from pricing)
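If you want a rough cost figure in your own logs as well, you can derive one from `response.usage` (standard on OpenAI chat completion responses). The per-1K-token prices below are illustrative assumptions only; check OpenAI's current pricing page rather than relying on these numbers:

```typescript
// Illustrative per-1K-token USD prices — NOT authoritative; verify against
// OpenAI's current pricing before using in production.
const PRICES: Record<string, { input: number; output: number }> = {
  'gpt-4': { input: 0.03, output: 0.06 },
  'gpt-3.5-turbo': { input: 0.0005, output: 0.0015 }
};

// `usage` has the shape of response.usage on a chat completion.
function estimateCostUSD(
  model: string,
  usage: { prompt_tokens: number; completion_tokens: number }
): number {
  const p = PRICES[model];
  if (!p) throw new Error(`No pricing entry for model: ${model}`);
  return (usage.prompt_tokens / 1000) * p.input +
         (usage.completion_tokens / 1000) * p.output;
}
```

For example, `estimateCostUSD('gpt-4', { prompt_tokens: 1000, completion_tokens: 500 })` comes to roughly $0.06 under the assumed prices.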
3. Monitor for Errors
try {
  await tracer.startWorkflow('api-call');
  const result = await traceWorkflowStep(tracer, 'llm-call', () =>
    openai.chat.completions.create({...})
  );
  await tracer.endWorkflow({ status: 'success' });
  return result;
} catch (error) {
  await tracer.endWorkflow({ status: 'failed', error: error.message });
  console.error('OpenAI error:', error.message);

  if (error.status === 429) {
    // Rate limit hit
  } else if (error.status === 500) {
    // OpenAI service error
  }
}
4. Set Timeouts
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  timeout: 30000, // 30 second timeout
  maxRetries: 2
});
// Timeout tracked in trace automatically
Troubleshooting
Traces not capturing token usage?
Ensure you're using the latest version of the SDK.
High latency in traces?
Check if you're using synchronous operations. Use async/await consistently.
Missing streaming response data?
The SDK automatically buffers streaming responses for complete trace capture.
Real-World Example
Content Moderation System
Setup: GPT-4 API for content safety classification
Tracing: All API calls traced with content metadata
Evaluation: 500 test cases with known safe/unsafe content
Results:
- 98.5% accuracy on test suite
- Average latency: 850ms
- Caught regression when model was updated
- Monthly cost: $420 (tracked via traces)