Why Tracing Matters
Distributed tracing gives you full visibility into your AI application's behavior in production. Track every LLM call, measure latency, monitor token usage, and debug issues before they impact users.
Installation
Install the EvalGate SDK in your project:
TypeScript
npm install @evalgate/sdk
Or with other package managers:
yarn add @evalgate/sdk
pnpm add @evalgate/sdk
Python
pip install pauly4010-evalgate-sdk
Environment Setup
Create a .env file in your project root:
EVALAI_API_KEY=sk_test_your_api_key_here
EVALAI_ORGANIZATION_ID=your_org_id_here
Get your API key from the Developer Dashboard.
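If either variable is unset, the client will typically fail later with an opaque authentication error, so it can help to validate credentials up front. A minimal sketch — `load_config` is illustrative, not part of the SDK:

```python
import os

def load_config(env=os.environ):
    """Fail fast with a clear message when EvalGate credentials are missing."""
    missing = [name for name in ("EVALAI_API_KEY", "EVALAI_ORGANIZATION_ID")
               if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {"api_key": env["EVALAI_API_KEY"],
            "organization_id": env["EVALAI_ORGANIZATION_ID"]}
```

Call this once at startup, before constructing the client.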
Basic Setup
Initialize the SDK client and tracer:
TypeScript
import { AIEvalClient, WorkflowTracer } from '@evalgate/sdk'
const client = new AIEvalClient({ apiKey: process.env.EVALAI_API_KEY })
const tracer = new WorkflowTracer(client)
Python
import os

from evalgate_sdk import AIEvalClient, WorkflowTracer
client = AIEvalClient(api_key=os.environ["EVALAI_API_KEY"])
tracer = WorkflowTracer(client)
Creating Traces
Create traces to track LLM calls and operations:
TypeScript
// Create a trace
const trace = await client.traces.create({
  name: 'Customer Support Query',
  traceId: 'trace-' + Date.now(),
  metadata: {
    userId: 'user_123',
    sessionId: 'session_456'
  }
})

// Add spans to track specific operations
const span = await client.traces.createSpan(trace.id, {
  name: 'LLM Call',
  spanId: 'span-' + Date.now(),
  type: 'llm',
  startTime: new Date().toISOString(),
  input: userQuery,
  output: response,
  metadata: { model: 'gpt-4', tokens: 150 }
})
Python
import time
from datetime import datetime, timezone

from evalgate_sdk.types import CreateTraceParams, CreateSpanParams

# Create a trace
trace = await client.traces.create(CreateTraceParams(
    name="Customer Support Query",
    trace_id=f"trace-{int(time.time() * 1000)}",
    metadata={"userId": "user_123", "sessionId": "session_456"}
))

# Add spans to track specific operations
span = await client.traces.create_span(trace.id, CreateSpanParams(
    name="LLM Call",
    span_id=f"span-{int(time.time() * 1000)}",
    type="llm",
    start_time=datetime.now(timezone.utc).isoformat(),  # UTC, matching toISOString()
    input=user_query,
    output=response,
    metadata={"model": "gpt-4", "tokens": 150}
))
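One caveat with the timestamp-based IDs above: two traces started in the same millisecond collide. A UUID-based alternative avoids this — `new_trace_id` is a hypothetical helper, not an SDK API:

```python
import uuid

def new_trace_id(prefix="trace"):
    # Timestamp-only IDs like "trace-<ms>" can collide when two requests
    # start in the same millisecond; a UUID suffix keeps them unique.
    return f"{prefix}-{uuid.uuid4().hex}"
```

Pass the result as `traceId` / `trace_id` in the calls above.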
What Gets Tracked
Each trace automatically captures:
- Input/Output: Full prompts and responses
- Timing: Start time, duration, latency
- Tokens: Input tokens, output tokens, total cost
- Model: Model name, version, parameters
- Metadata: User ID, session ID, custom tags
- Errors: Stack traces and error messages
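As a rough illustration of how cost can be derived from the captured token counts — the per-1K-token rates below are hypothetical; check your provider's current pricing:

```python
# Hypothetical per-1K-token rates; real pricing depends on model and provider.
RATES = {"gpt-4": {"input": 0.03, "output": 0.06}}

def estimate_cost(model, input_tokens, output_tokens, rates=RATES):
    """Estimate dollar cost of one LLM call from its token counts."""
    r = rates[model]
    return (input_tokens / 1000) * r["input"] + (output_tokens / 1000) * r["output"]
```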
Nested Traces (Spans)
For complex workflows with multiple LLM calls, use nested spans:
TypeScript
import { traceWorkflowStep } from '@evalgate/sdk'
await tracer.startWorkflow('RAG Pipeline');

const embedding = await traceWorkflowStep(tracer, 'embed-query', async () => {
  return await openai.embeddings.create({...});
});

const docs = await traceWorkflowStep(tracer, 'retrieve-docs', async () => {
  return await vectorDb.search(embedding);
});

const response = await traceWorkflowStep(tracer, 'generate-response', async () => {
  return await openai.chat.completions.create({...});
});

await tracer.endWorkflow({ status: 'success' });
Python
from evalgate_sdk.workflows import trace_workflow_step
await tracer.start_workflow("RAG Pipeline")
embedding = await trace_workflow_step(tracer, "embed-query",
    lambda: openai.embeddings.create(...)
)
docs = await trace_workflow_step(tracer, "retrieve-docs",
    lambda: vector_db.search(embedding)
)
response = await trace_workflow_step(tracer, "generate-response",
    lambda: openai.chat.completions.create(...)
)
await tracer.end_workflow({"status": "success"})
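Conceptually, a step wrapper like `trace_workflow_step` times the callable and records its outcome as a span. A simplified stand-in for illustration — this is not the SDK's actual implementation:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed_step(name, spans):
    """Record a named step's duration and outcome into a span list."""
    start = time.perf_counter()
    try:
        yield
    except Exception as exc:
        # Failed steps keep their timing and carry the error message.
        spans.append({"name": name, "status": "error", "error": str(exc),
                      "duration_ms": (time.perf_counter() - start) * 1000})
        raise
    else:
        spans.append({"name": name, "status": "success",
                      "duration_ms": (time.perf_counter() - start) * 1000})
```

Each step either succeeds (and records its latency) or re-raises after recording the failure, so no step is silently lost.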
Adding Custom Metadata
Enrich traces with business context:
TypeScript
await tracer.startWorkflow('content-generation', undefined, {
  userId: user.id,
  contentType: 'blog-post',
  targetAudience: 'developers',
  keywords: ['AI', 'evaluation', 'testing']
});
const span = await tracer.startAgentSpan('ContentAgent', { input: '...' });
// Your LLM call here
await tracer.endAgentSpan(span, { result: '...' });
await tracer.endWorkflow({ status: 'success' });
Python
await tracer.start_workflow("content-generation", metadata={
    "userId": user.id,
    "contentType": "blog-post",
    "targetAudience": "developers",
    "keywords": ["AI", "evaluation", "testing"]
})
span = await tracer.start_agent_span("ContentAgent", {"input": "..."})
# Your LLM call here
await tracer.end_agent_span(span, {"result": "..."})
await tracer.end_workflow({"status": "success"})
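Because metadata often carries user-supplied strings, it can be worth scrubbing obvious PII before attaching it. An illustrative sketch — a single regex is nowhere near a complete PII strategy:

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def scrub_metadata(metadata):
    """Redact email-like strings in metadata values before tracing them."""
    return {k: EMAIL_RE.sub("[REDACTED]", v) if isinstance(v, str) else v
            for k, v in metadata.items()}
```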
Viewing Traces
Once instrumented, visit the Traces page in your dashboard to:
- Search and filter traces by metadata, tags, or time range
- View detailed timelines showing nested spans
- Analyze token usage and costs
- Debug failures with full stack traces
- Identify performance bottlenecks
Integration with Evaluations
Traces can be used as test cases for evaluations. Convert real production traces into regression tests to ensure your AI system maintains quality over time.
Best Practices
- Use descriptive names for traces (e.g., "customer-support-query" not "llm-call")
- Add relevant metadata (user ID, session ID, feature flags)
- Tag traces with environment (production, staging, development)
- Set up sampling for high-volume applications (e.g., trace 10% of requests)
- Use nested spans for complex multi-step workflows
- Never log sensitive PII without proper anonymization
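For the sampling recommendation above, hashing a stable ID rather than calling a random number generator keeps all traces of a session together. A sketch under that assumption — `should_trace` is not an SDK API:

```python
import hashlib

def should_trace(session_id: str, sample_rate: float = 0.10) -> bool:
    """Deterministic sampling: a given session is either always traced or
    never traced, so its traces stay together."""
    digest = hashlib.sha256(session_id.encode()).digest()
    # Map the first 8 bytes of the hash to a uniform value in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate
```

Guard your trace-creation calls with this check so untraced requests skip the SDK entirely.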
Troubleshooting
Traces not appearing?
Check that your API key is correct and the SDK is properly initialized.
High latency overhead?
The SDK adds minimal overhead (~10ms), but ensure you're not blocking on trace uploads.
Missing span data?
Make sure async functions are properly awaited within trace callbacks.
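One common way to avoid blocking the request path is to buffer spans in memory and upload them from a background worker. A minimal sketch — the SDK may already batch internally; `upload_fn` here stands in for whatever performs the HTTP call:

```python
import queue
import threading

class BackgroundUploader:
    """Buffer spans and upload from a worker thread, off the request path."""

    def __init__(self, upload_fn, maxsize=1000):
        self._q = queue.Queue(maxsize=maxsize)
        self._upload = upload_fn
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, span):
        try:
            self._q.put_nowait(span)  # drop rather than block when the buffer is full
        except queue.Full:
            pass

    def _run(self):
        while True:
            span = self._q.get()
            if span is None:  # sentinel from close()
                break
            self._upload(span)

    def close(self):
        """Flush remaining spans, then stop the worker."""
        self._q.put(None)
        self._worker.join()
```

Dropping spans under backpressure is a deliberate trade-off here: losing a trace is usually preferable to stalling a user request.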