Why Tracing Matters
Distributed tracing gives you full visibility into your AI application's behavior in production. Track every LLM call, measure latency, monitor token usage, and debug issues before they impact users.
Installation
Install the EvalGate SDK in your project:
TypeScript
npm install @evalgate/sdk
Or with other package managers:
yarn add @evalgate/sdk
pnpm add @evalgate/sdk
Python
pip install pauly4010-evalgate-sdk
Environment Setup
Create a .env file in your project root:
EVALAI_API_KEY=sk_test_your_api_key_here
EVALAI_ORGANIZATION_ID=your_org_id_here
Get your API key from the Developer Dashboard.
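If either variable is unset, the client will typically fail later with an opaque authentication error, so it can help to validate credentials up front. A minimal sketch — `load_config` is illustrative, not part of the SDK:

```python
import os

def load_config(env=os.environ):
    """Fail fast with a clear message when EvalGate credentials are missing."""
    missing = [name for name in ("EVALAI_API_KEY", "EVALAI_ORGANIZATION_ID")
               if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {"api_key": env["EVALAI_API_KEY"],
            "organization_id": env["EVALAI_ORGANIZATION_ID"]}
```

Call this once at startup, before constructing the client.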
Basic Setup
Initialize the SDK client and tracer:
TypeScript
import { AIEvalClient, WorkflowTracer } from '@evalgate/sdk'
const client = new AIEvalClient({ apiKey: process.env.EVALAI_API_KEY })
const tracer = new WorkflowTracer(client)
Python
import os

from evalgate_sdk import AIEvalClient, WorkflowTracer
client = AIEvalClient(api_key=os.environ["EVALAI_API_KEY"])
tracer = WorkflowTracer(client)
Creating Traces
Create traces to track LLM calls and operations:
TypeScript
// Create a trace
const trace = await client.traces.create({
  name: 'Customer Support Query',
  traceId: 'trace-' + Date.now(),
  metadata: {
    userId: 'user_123',
    sessionId: 'session_456'
  }
})

// Add spans to track specific operations
const span = await client.traces.createSpan(trace.id, {
  name: 'LLM Call',
  spanId: 'span-' + Date.now(),
  type: 'llm',
  startTime: new Date().toISOString(),
  input: userQuery,
  output: response,
  metadata: { model: 'gpt-4', tokens: 150 }
})
Python
import time
from datetime import datetime, timezone

from evalgate_sdk.types import CreateTraceParams, CreateSpanParams

# Create a trace
trace = await client.traces.create(CreateTraceParams(
    name="Customer Support Query",
    trace_id=f"trace-{int(time.time() * 1000)}",
    metadata={"userId": "user_123", "sessionId": "session_456"}
))

# Add spans to track specific operations
span = await client.traces.create_span(trace.id, CreateSpanParams(
    name="LLM Call",
    span_id=f"span-{int(time.time() * 1000)}",
    type="llm",
    start_time=datetime.now(timezone.utc).isoformat(),  # UTC, matching toISOString()
    input=user_query,
    output=response,
    metadata={"model": "gpt-4", "tokens": 150}
))
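One caveat with the timestamp-based IDs above: two traces started in the same millisecond collide. A UUID-based alternative avoids this — `new_trace_id` is a hypothetical helper, not an SDK API:

```python
import uuid

def new_trace_id(prefix="trace"):
    # Timestamp-only IDs like "trace-<ms>" can collide when two requests
    # start in the same millisecond; a UUID suffix keeps them unique.
    return f"{prefix}-{uuid.uuid4().hex}"
```

Pass the result as `traceId` / `trace_id` in the calls above.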
What Gets Tracked
Each trace automatically captures:
- Input/Output: Full prompts and responses
- Timing: Start time, duration, latency
- Tokens: Input tokens, output tokens, total cost
- Model: Model name, version, parameters
- Metadata: User ID, session ID, custom tags
- Errors: Stack traces and error messages
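As a rough illustration of how cost can be derived from the captured token counts — the per-1K-token rates below are hypothetical; check your provider's current pricing:

```python
# Hypothetical per-1K-token rates; real pricing depends on model and provider.
RATES = {"gpt-4": {"input": 0.03, "output": 0.06}}

def estimate_cost(model, input_tokens, output_tokens, rates=RATES):
    """Estimate dollar cost of one LLM call from its token counts."""
    r = rates[model]
    return (input_tokens / 1000) * r["input"] + (output_tokens / 1000) * r["output"]
```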
Nested Traces (Spans)
For complex workflows with multiple LLM calls, use nested spans:
TypeScript
import { traceWorkflowStep } from '@evalgate/sdk'
await tracer.startWorkflow('RAG Pipeline');

const embedding = await traceWorkflowStep(tracer, 'embed-query', async () => {
  return await openai.embeddings.create({...});
});

const docs = await traceWorkflowStep(tracer, 'retrieve-docs', async () => {
  return await vectorDb.search(embedding);
});

const response = await traceWorkflowStep(tracer, 'generate-response', async () => {
  return await openai.chat.completions.create({...});
});

await tracer.endWorkflow({ status: 'success' });
Python
from evalgate_sdk.workflows import trace_workflow_step
await tracer.start_workflow("RAG Pipeline")
embedding = await trace_workflow_step(tracer, "embed-query",
    lambda: openai.embeddings.create(...)
)
docs = await trace_workflow_step(tracer, "retrieve-docs",
    lambda: vector_db.search(embedding)
)
response = await trace_workflow_step(tracer, "generate-response",
    lambda: openai.chat.completions.create(...)
)
await tracer.end_workflow({"status": "success"})
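Conceptually, a step wrapper like `trace_workflow_step` times the callable and records its outcome as a span. A simplified stand-in for illustration — this is not the SDK's actual implementation:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed_step(name, spans):
    """Record a named step's duration and outcome into a span list."""
    start = time.perf_counter()
    try:
        yield
    except Exception as exc:
        # Failed steps keep their timing and carry the error message.
        spans.append({"name": name, "status": "error", "error": str(exc),
                      "duration_ms": (time.perf_counter() - start) * 1000})
        raise
    else:
        spans.append({"name": name, "status": "success",
                      "duration_ms": (time.perf_counter() - start) * 1000})
```

Each step either succeeds (and records its latency) or re-raises after recording the failure, so no step is silently lost.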
Adding Custom Metadata
Enrich traces with business context:
TypeScript
await tracer.startWorkflow('content-generation', undefined, {
  userId: user.id,
  contentType: 'blog-post',
  targetAudience: 'developers',
  keywords: ['AI', 'evaluation', 'testing']
});
const span = await tracer.startAgentSpan('ContentAgent', { input: '...' });
// Your LLM call here
await tracer.endAgentSpan(span, { result: '...' });
await tracer.endWorkflow({ status: 'success' });
Python
await tracer.start_workflow("content-generation", metadata={
    "userId": user.id,
    "contentType": "blog-post",
    "targetAudience": "developers",
    "keywords": ["AI", "evaluation", "testing"]
})
span = await tracer.start_agent_span("ContentAgent", {"input": "..."})
# Your LLM call here
await tracer.end_agent_span(span, {"result": "..."})
await tracer.end_workflow({"status": "success"})
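Because metadata often carries user-supplied strings, it can be worth scrubbing obvious PII before attaching it. An illustrative sketch — a single regex is nowhere near a complete PII strategy:

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def scrub_metadata(metadata):
    """Redact email-like strings in metadata values before tracing them."""
    return {k: EMAIL_RE.sub("[REDACTED]", v) if isinstance(v, str) else v
            for k, v in metadata.items()}
```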
Viewing Traces
Once instrumented, visit the Traces page in your dashboard to:
- Search and filter traces by metadata, tags, or time range
- View detailed timelines showing nested spans
- Analyze token usage and costs
- Debug failures with full stack traces
- Identify performance bottlenecks
Integration with Evaluations
Traces can be used as test cases for evaluations. Convert real production traces into regression tests to ensure your AI system maintains quality over time.
Best Practices
- Use descriptive names for traces (e.g., "customer-support-query" not "llm-call")
- Add relevant metadata (user ID, session ID, feature flags)
- Tag traces with environment (production, staging, development)
- Set up sampling for high-volume applications (e.g., trace 10% of requests)
- Use nested spans for complex multi-step workflows
- Never log sensitive PII without proper anonymization
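For the sampling recommendation above, hashing a stable ID rather than calling a random number generator keeps all traces of a session together. A sketch under that assumption — `should_trace` is not an SDK API:

```python
import hashlib

def should_trace(session_id: str, sample_rate: float = 0.10) -> bool:
    """Deterministic sampling: a given session is either always traced or
    never traced, so its traces stay together."""
    digest = hashlib.sha256(session_id.encode()).digest()
    # Map the first 8 bytes of the hash to a uniform value in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < sample_rate
```

Guard your trace-creation calls with this check so untraced requests skip the SDK entirely.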
Troubleshooting
Traces not appearing?
Check that your API key is correct and the SDK is properly initialized.
High latency overhead?
The SDK adds minimal overhead (~10ms), but ensure you're not blocking on trace uploads.
Missing span data?
Make sure async functions are properly awaited within trace callbacks.
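One common way to avoid blocking the request path is to buffer spans in memory and upload them from a background worker. A minimal sketch — the SDK may already batch internally; `upload_fn` here stands in for whatever performs the HTTP call:

```python
import queue
import threading

class BackgroundUploader:
    """Buffer spans and upload from a worker thread, off the request path."""

    def __init__(self, upload_fn, maxsize=1000):
        self._q = queue.Queue(maxsize=maxsize)
        self._upload = upload_fn
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, span):
        try:
            self._q.put_nowait(span)  # drop rather than block when the buffer is full
        except queue.Full:
            pass

    def _run(self):
        while True:
            span = self._q.get()
            if span is None:  # sentinel from close()
                break
            self._upload(span)

    def close(self):
        """Flush remaining spans, then stop the worker."""
        self._q.put(None)
        self._worker.join()
```

Dropping spans under backpressure is a deliberate trade-off here: losing a trace is usually preferable to stalling a user request.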