Using with OpenAI API
Wrap OpenAI API calls with our tracing SDK for full observability.
Installation
TypeScript
npm install openai @evalgate/sdk
Python
pip install openai pauly4010-evalgate-sdk
Basic Setup
TypeScript
import OpenAI from 'openai'
import { AIEvalClient, WorkflowTracer, traceOpenAI, traceWorkflowStep } from '@evalgate/sdk'
const client = new AIEvalClient({ apiKey: process.env.EVALAI_API_KEY })
const tracer = new WorkflowTracer(client)
// Wrap OpenAI client for automatic tracing
const openai = traceOpenAI(new OpenAI(), client)
Python
import os

from openai import OpenAI
from evalgate_sdk import AIEvalClient, WorkflowTracer
from evalgate_sdk.integrations.openai import trace_openai
client = AIEvalClient(api_key=os.environ["EVALAI_API_KEY"])
tracer = WorkflowTracer(client)
# Wrap OpenAI client for automatic tracing
openai = trace_openai(OpenAI(), client)
Environment variables: Make sure you have EVALAI_API_KEY and EVALAI_ORGANIZATION_ID in your .env file.
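A typical `.env` file for this setup might look like the following (all values are placeholders; `OPENAI_API_KEY` is read by the OpenAI client itself, not by the EvalGate SDK):

```shell
# .env — placeholder values, replace with your own keys
EVALAI_API_KEY=your-evalai-api-key
EVALAI_ORGANIZATION_ID=your-organization-id
OPENAI_API_KEY=your-openai-api-key
```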
Tracing OpenAI Calls
Chat Completions
TypeScript
// All OpenAI calls are automatically traced!
const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is the capital of France?' }
  ],
  temperature: 0.7
})
console.log(response.choices[0].message.content)
// Automatically tracked in EvalAI dashboard:
// ✓ Full prompt and response
// ✓ Token usage (input/output)
// ✓ Latency
// ✓ Model and parameters
// ✓ Cost estimation
Python
# All OpenAI calls are automatically traced!
response = openai.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"}
    ],
    temperature=0.7
)
print(response.choices[0].message.content)
# Automatically tracked in EvalAI dashboard:
# ✓ Full prompt and response
# ✓ Token usage (input/output)
# ✓ Latency
# ✓ Model and parameters
# ✓ Cost estimation
Streaming Responses
TypeScript
// Streaming is automatically traced too!
const stream = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: messages,
  stream: true
})

// Stream tokens to user
let fullResponse = ''
for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content || ''
  fullResponse += content
  process.stdout.write(content)
}
// Full response is automatically captured in trace
Python
# Streaming is automatically traced too!
stream = openai.chat.completions.create(
    model="gpt-4",
    messages=messages,
    stream=True
)

# Stream tokens to user
full_response = ""
for chunk in stream:
    content = chunk.choices[0].delta.content or ""
    full_response += content
    print(content, end="", flush=True)
# Full response is automatically captured in trace
Function Calling
const tools = [
  {
    type: 'function',
    function: {
      name: 'get_weather',
      description: 'Get current weather for a location',
      parameters: {
        type: 'object',
        properties: {
          location: { type: 'string' }
        },
        required: ['location']
      }
    }
  }
]

const response = await openai.chat.completions.create({
  model: 'gpt-4',
  messages: messages,
  tools: tools,
  tool_choice: 'auto'
})
// Function calls are automatically tracked
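Tracing records the tool call, but executing it is still up to you: inspect `tool_calls` on the returned message and route each call to your own handler. A minimal dispatch sketch (the `handlers` map and its `get_weather` implementation are illustrative, not part of the SDK):

```typescript
// Hypothetical handlers keyed by the function names declared in `tools`.
const handlers: Record<string, (args: any) => string> = {
  get_weather: ({ location }) => `Sunny in ${location}`
};

// Dispatch one tool call object, shaped like
// response.choices[0].message.tool_calls[i].
function dispatchToolCall(call: { function: { name: string; arguments: string } }): string {
  const handler = handlers[call.function.name];
  if (!handler) throw new Error(`Unknown tool: ${call.function.name}`);
  // The model returns arguments as a JSON string, so parse before calling.
  return handler(JSON.parse(call.function.arguments));
}
```

The result would then be appended to `messages` as a `tool` role message before the follow-up completion call.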
Embeddings
// Embeddings are also automatically traced
const embedding = await openai.embeddings.create({
  model: 'text-embedding-3-small',
  input: 'Your text here'
})
const vector = embedding.data[0].embedding
Advanced Patterns
Multi-Turn Conversations
import { traceWorkflowStep } from '@evalgate/sdk'
await tracer.startWorkflow('multi-turn-conversation', undefined, { sessionId: 'session_456' });

const messages = [];

// Turn 1
messages.push({ role: 'user', content: 'Hello!' });
const response1 = await traceWorkflowStep(tracer, 'turn-1', () =>
  openai.chat.completions.create({ model: 'gpt-4', messages })
);
messages.push(response1.choices[0].message);

// Turn 2
messages.push({ role: 'user', content: 'Tell me a joke' });
const response2 = await traceWorkflowStep(tracer, 'turn-2', () =>
  openai.chat.completions.create({ model: 'gpt-4', messages })
);
messages.push(response2.choices[0].message);

await tracer.endWorkflow({ status: 'success' });
Retry Logic with Tracing
async function callOpenAIWithRetry(messages, maxRetries = 3) {
  await tracer.startWorkflow('openai-with-retry', undefined, { maxRetries });

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      const result = await traceWorkflowStep(
        tracer,
        `attempt-${attempt}`,
        () => openai.chat.completions.create({ model: 'gpt-4', messages })
      );
      await tracer.endWorkflow({ status: 'success' });
      return result;
    } catch (error) {
      if (attempt === maxRetries) {
        await tracer.endWorkflow({ status: 'failed', error: error.message });
        throw error;
      }
      // Exponential backoff: 2s, 4s, 8s, ...
      await new Promise(r => setTimeout(r, Math.pow(2, attempt) * 1000));
    }
  }
}
Parallel Requests
await tracer.startWorkflow('generate-variations', undefined, { count: 3 });

const prompts = [
  'Write a formal email...',
  'Write a casual email...',
  'Write a brief email...'
];

const variations = await Promise.all(
  prompts.map((prompt, i) =>
    traceWorkflowStep(tracer, `variation-${i + 1}`, () =>
      openai.chat.completions.create({
        model: 'gpt-4',
        messages: [{ role: 'user', content: prompt }]
      })
    )
  )
);

await tracer.endWorkflow({ status: 'success' });
Evaluation Integration
Create Test Cases
const testCases = [
  {
    input: { prompt: 'Translate "hello" to Spanish' },
    expectedOutput: { contains: 'hola' },
    metadata: { category: 'translation' }
  },
  {
    input: { prompt: 'What is 2+2?' },
    expectedOutput: { exact: '4' },
    metadata: { category: 'math' }
  }
];

// Run evaluation
for (const testCase of testCases) {
  await tracer.startWorkflow('test-case', undefined, testCase.metadata);
  const result = await traceWorkflowStep(tracer, 'llm-call', () =>
    openai.chat.completions.create({
      model: 'gpt-4',
      messages: [{ role: 'user', content: testCase.input.prompt }]
    })
  );
  await tracer.endWorkflow({ status: 'success' });

  const output = result.choices[0].message.content;
  const passed = testCase.expectedOutput.contains
    ? output.toLowerCase().includes(testCase.expectedOutput.contains)
    : output === testCase.expectedOutput.exact;
  console.log(`Test ${testCase.metadata.category}: ${passed ? '✓' : '✗'}`);
}
A/B Testing Models
async function abTestModels(prompt) {
  const variant = Math.random() < 0.5 ? 'gpt-4' : 'gpt-3.5-turbo';
  await tracer.startWorkflow('model-ab-test', undefined, {
    variant,
    experimentId: 'gpt4-vs-gpt35'
  });
  const result = await traceWorkflowStep(tracer, 'llm-call', () =>
    openai.chat.completions.create({
      model: variant,
      messages: [{ role: 'user', content: prompt }]
    })
  );
  await tracer.endWorkflow({ status: 'success' });
  return result;
}
// Analyze results in dashboard to compare:
// - Quality scores
// - Latency
// - Cost
// - User satisfaction
Best Practices
1. Add Contextual Metadata
await tracer.startWorkflow('content-generation', undefined, {
  userId: user.id,
  contentType: 'blog-post',
  targetAudience: 'developers',
  tone: 'professional',
  model: 'gpt-4',
  temperature: 0.7,
  maxTokens: 2000
});

const span = await tracer.startAgentSpan('ContentAgent', { input: '...' });
// OpenAI call
await tracer.endAgentSpan(span, { result: '...' });

await tracer.endWorkflow({ status: 'success' });
2. Track Token Usage
Automatically tracked in every trace:
- Input tokens
- Output tokens
- Total cost (calculated from pricing)
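If you want a rough cost figure in your own logs as well, you can derive one from `response.usage` (standard on OpenAI chat completion responses). The per-1K-token prices below are illustrative assumptions only; check OpenAI's current pricing page rather than relying on these numbers:

```typescript
// Illustrative per-1K-token USD prices — NOT authoritative; verify against
// OpenAI's current pricing before using in production.
const PRICES: Record<string, { input: number; output: number }> = {
  'gpt-4': { input: 0.03, output: 0.06 },
  'gpt-3.5-turbo': { input: 0.0005, output: 0.0015 }
};

// `usage` has the shape of response.usage on a chat completion.
function estimateCostUSD(
  model: string,
  usage: { prompt_tokens: number; completion_tokens: number }
): number {
  const p = PRICES[model];
  if (!p) throw new Error(`No pricing entry for model: ${model}`);
  return (usage.prompt_tokens / 1000) * p.input +
         (usage.completion_tokens / 1000) * p.output;
}
```

For example, `estimateCostUSD('gpt-4', { prompt_tokens: 1000, completion_tokens: 500 })` comes to roughly $0.06 under the assumed prices.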
3. Monitor for Errors
try {
  await tracer.startWorkflow('api-call');
  const result = await traceWorkflowStep(tracer, 'llm-call', () =>
    openai.chat.completions.create({...})
  );
  await tracer.endWorkflow({ status: 'success' });
  return result;
} catch (error) {
  await tracer.endWorkflow({ status: 'failed', error: error.message });
  console.error('OpenAI error:', error.message);

  if (error.status === 429) {
    // Rate limit hit
  } else if (error.status === 500) {
    // OpenAI service error
  }
}
4. Set Timeouts
const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
  timeout: 30000, // 30 second timeout
  maxRetries: 2
});
// Timeout tracked in trace automatically
Troubleshooting
Traces not capturing token usage?
Ensure you're using the latest version of the SDK.
High latency in traces?
Check if you're using synchronous operations. Use async/await consistently.
Missing streaming response data?
The SDK automatically buffers streaming responses for complete trace capture.
Real-World Example
Content Moderation System
Setup: GPT-4 API for content safety classification
Tracing: All API calls traced with content metadata
Evaluation: 500 test cases with known safe/unsafe content
Results:
- 98.5% accuracy on test suite
- Average latency: 850ms
- Caught regression when model was updated
- Monthly cost: $420 (tracked via traces)