Instrument your LLM app with Evalgate tracing

Add distributed tracing to your LLM app to track every call, measure latency, monitor token usage, and debug failures before they affect users.
Distributed tracing gives you full visibility into your AI application’s behavior in production. Every LLM call, retrieval step, and tool invocation becomes a searchable, filterable event with timing, token counts, and cost data attached. This guide walks you through installing the SDK, creating traces and spans, nesting spans for multi-step workflows, and following tracing best practices.

Install the SDK

npm install @evalgate/sdk
# or
yarn add @evalgate/sdk
# or
pnpm add @evalgate/sdk

Set environment variables

Create a .env file in your project root and add your credentials:
.env
EVALGATE_API_KEY=sk_test_your_api_key_here
EVALGATE_ORGANIZATION_ID=your_org_id_here
Get your API key from the Developer Dashboard.
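The SDK reads these variables at startup, so a missing or empty value usually surfaces later as a confusing authentication failure. A small guard that fails fast can save debugging time. This is a sketch: requireEnv is a hypothetical helper, not part of @evalgate/sdk.

```typescript
// Hypothetical helper (not part of @evalgate/sdk): fail fast when a
// required environment variable is missing or empty.
function requireEnv(name: string): string {
  const value = process.env[name]
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`)
  }
  return value
}

// Usage (sketch) — validate credentials once, before constructing the client:
// const apiKey = requireEnv('EVALGATE_API_KEY')
// const organizationId = requireEnv('EVALGATE_ORGANIZATION_ID')
```

Running this check before client construction means a misconfigured deployment fails at boot instead of silently dropping traces.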

Initialize the client and tracer

import { AIEvalClient, WorkflowTracer } from '@evalgate/sdk'

const client = new AIEvalClient({
  apiKey: process.env.EVALGATE_API_KEY
})

const tracer = new WorkflowTracer(client)

Create traces

A trace represents one logical operation — a user query, a support ticket, or a content generation request. Create a trace with a descriptive name and attach metadata that will help you filter it later:
const trace = await client.traces.create({
  name: 'Customer Support Query',
  traceId: 'trace-' + Date.now(),
  metadata: {
    userId: 'user_123',
    sessionId: 'session_456'
  }
})

Add spans

Spans represent individual steps within a trace — an LLM call, a vector search, or a function execution. Attach each span to its parent trace:
const span = await client.traces.createSpan(trace.id, {
  name: 'LLM Call',
  spanId: 'span-' + Date.now(),
  type: 'llm',
  startTime: new Date().toISOString(),
  input: userQuery,
  output: response,
  metadata: {
    model: 'gpt-5.2-chat-latest',
    tokens: 150
  }
})
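The span above records a startTime, but the end of the step matters just as much for latency analysis. One way to capture both is a small, SDK-agnostic timing wrapper. This is a sketch: timeStep is not an SDK function, and whether createSpan accepts an endTime field is an assumption to verify against your SDK version.

```typescript
// Hypothetical helper (not part of @evalgate/sdk): run an async step and
// capture its start time, end time, and latency for use in a span payload.
async function timeStep<T>(fn: () => Promise<T>): Promise<{
  result: T
  startTime: string
  endTime: string
  latencyMs: number
}> {
  const startedAt = Date.now()
  const result = await fn()        // the step being measured
  const endedAt = Date.now()
  return {
    result,
    startTime: new Date(startedAt).toISOString(),
    endTime: new Date(endedAt).toISOString(),
    latencyMs: endedAt - startedAt,
  }
}

// Usage with the span example above (sketch; callModel is hypothetical):
// const { result, startTime, endTime } = await timeStep(() => callModel(userQuery))
// await client.traces.createSpan(trace.id, { name: 'LLM Call', type: 'llm', startTime, /* ... */ })
```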

Nested spans for multi-step workflows

For pipelines with multiple sequential steps — like a RAG workflow with embedding, retrieval, and generation — use traceWorkflowStep to create properly nested spans automatically:
import { traceWorkflowStep } from '@evalgate/sdk'

await tracer.startWorkflow('RAG Pipeline')

const embedding = await traceWorkflowStep(tracer, 'embed-query', async () => {
  return await openai.embeddings.create({ /* ... */ })
})

const docs = await traceWorkflowStep(tracer, 'retrieve-docs', async () => {
  return await vectorDb.search(embedding)
})

const response = await traceWorkflowStep(tracer, 'generate-response', async () => {
  return await openai.chat.completions.create({ /* ... */ })
})

await tracer.endWorkflow({ status: 'success' })

Adding custom metadata

Attach business context to traces to make them filterable and useful for debugging:
await tracer.startWorkflow('content-generation', undefined, {
  userId: user.id,
  contentType: 'blog-post',
  targetAudience: 'developers',
  keywords: ['AI', 'evaluation', 'testing']
})

const span = await tracer.startAgentSpan('ContentAgent', { input: '...' })
// Your LLM call here
await tracer.endAgentSpan(span, { result: '...' })

await tracer.endWorkflow({ status: 'success' })

What gets tracked automatically

Every trace and span captures the following without any extra code:
  • Input / Output: full prompts and model responses
  • Timing: start time, end time, and total latency
  • Tokens: input tokens, output tokens, and estimated cost
  • Model: model name, version, and parameters
  • Metadata: user ID, session ID, and any custom tags you attach
  • Errors: stack traces and error messages on failure

Viewing traces

Once your application is instrumented, open the Traces page in your dashboard to:
  • Search and filter traces by metadata, tags, or time range
  • View detailed timelines showing nested spans
  • Analyze token usage and costs across requests
  • Debug failures with full stack traces
  • Identify latency bottlenecks across pipeline steps

Best practices

Use descriptive names

Name traces after the user action, not the implementation. customer-support-query is more useful than llm-call.

Attach relevant metadata

Include userId, sessionId, environment, and feature flags so you can slice and debug traces effectively.

Sample for high volume

For high-throughput applications, configure sampling to trace 10–20% of requests rather than every call.
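A minimal client-side sampling gate might look like the following. This is a sketch: shouldTrace is a hypothetical helper, not an SDK feature, so check whether your SDK version supports built-in or server-side sampling before rolling your own.

```typescript
// Hypothetical client-side sampler: trace roughly `rate` of all requests.
// rate is a fraction between 0 and 1, e.g. 0.15 for ~15% of traffic.
function shouldTrace(rate: number): boolean {
  if (rate <= 0) return false
  if (rate >= 1) return true
  return Math.random() < rate
}

// Usage (sketch): only start a workflow for a sampled subset of requests.
// if (shouldTrace(0.15)) {
//   await tracer.startWorkflow('customer-support-query')
// }
```

Random per-request sampling is the simplest approach; if you need all spans of a given session to be kept or dropped together, derive the decision from a hash of the session ID instead of Math.random().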

Never log PII

Anonymize or redact sensitive user data before it appears in trace inputs, outputs, or metadata fields.
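A simple regex-based redactor applied before data reaches trace fields could look like this. This is a sketch with loud caveats: redactPII is a hypothetical helper, and the two patterns here only cover email addresses and long digit runs, so real PII handling will need patterns tailored to your data.

```typescript
// Hypothetical redactor: strip obvious PII before it reaches trace fields.
// Only emails and long digit sequences are covered — extend for your data.
function redactPII(text: string): string {
  return text
    .replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, '[REDACTED_EMAIL]')
    .replace(/\b\d{9,}\b/g, '[REDACTED_NUMBER]')
}

// Usage (sketch): redact before passing input to a span.
// await client.traces.createSpan(trace.id, {
//   name: 'LLM Call',
//   input: redactPII(userQuery),
//   /* ... */
// })
```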

Troubleshooting

Traces not appearing in the dashboard?
Verify your EVALGATE_API_KEY is correct and that AIEvalClient is initialized before any traces are created.

Noticing added latency?
The SDK adds roughly 10ms of overhead. Make sure you are not await-ing trace upload calls in the critical path — they run asynchronously by default.

Spans missing data?
Ensure every async function inside a traceWorkflowStep callback is properly await-ed. Unawaited promises can resolve after the span closes.