Instrument your LLM app with Evalgate tracing

Add distributed tracing to your LLM app to track every call, measure latency, monitor token usage, and debug failures before they affect users.
Distributed tracing gives you full visibility into your AI application’s behavior in production. Every LLM call, retrieval step, and tool invocation becomes a searchable, filterable event with timing, token counts, and cost data attached. This guide walks you through installing the SDK, creating traces and spans, nesting spans for multi-step workflows, and following tracing best practices.

Install the SDK

npm install @evalgate/sdk
# or
yarn add @evalgate/sdk
# or
pnpm add @evalgate/sdk

Set environment variables

Create a .env file in your project root and add your credentials:
.env
EVALGATE_API_KEY=sk_test_your_api_key_here
EVALGATE_ORGANIZATION_ID=your_org_id_here
Get your API key from the Developer Dashboard.
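The SDK reads these variables at startup, so a missing or empty value usually surfaces later as a confusing authentication failure. A small guard that fails fast can save debugging time. This is a sketch: requireEnv is a hypothetical helper, not part of @evalgate/sdk.

```typescript
// Hypothetical helper (not part of @evalgate/sdk): fail fast when a
// required environment variable is missing or empty.
function requireEnv(name: string): string {
  const value = process.env[name]
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`)
  }
  return value
}

// Usage (sketch) — validate credentials once, before constructing the client:
// const apiKey = requireEnv('EVALGATE_API_KEY')
// const organizationId = requireEnv('EVALGATE_ORGANIZATION_ID')
```

Running this check before client construction means a misconfigured deployment fails at boot instead of silently dropping traces.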

Initialize the client and tracer

import { AIEvalClient, WorkflowTracer } from '@evalgate/sdk'

const client = new AIEvalClient({
  apiKey: process.env.EVALGATE_API_KEY
})

const tracer = new WorkflowTracer(client)

Create traces

A trace represents one logical operation — a user query, a support ticket, or a content generation request. Create a trace with a descriptive name and attach metadata that will help you filter it later:
const trace = await client.traces.create({
  name: 'Customer Support Query',
  traceId: 'trace-' + Date.now(),
  metadata: {
    userId: 'user_123',
    sessionId: 'session_456'
  }
})

Add spans

Spans represent individual steps within a trace — an LLM call, a vector search, or a function execution. Attach each span to its parent trace:
const span = await client.traces.createSpan(trace.id, {
  name: 'LLM Call',
  spanId: 'span-' + Date.now(),
  type: 'llm',
  startTime: new Date().toISOString(),
  input: userQuery,
  output: response,
  metadata: {
    model: 'gpt-5.2-chat-latest',
    tokens: 150
  }
})
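The span above records a startTime, but the end of the step matters just as much for latency analysis. One way to capture both is a small, SDK-agnostic timing wrapper. This is a sketch: timeStep is not an SDK function, and whether createSpan accepts an endTime field is an assumption to verify against your SDK version.

```typescript
// Hypothetical helper (not part of @evalgate/sdk): run an async step and
// capture its start time, end time, and latency for use in a span payload.
async function timeStep<T>(fn: () => Promise<T>): Promise<{
  result: T
  startTime: string
  endTime: string
  latencyMs: number
}> {
  const startedAt = Date.now()
  const result = await fn()        // the step being measured
  const endedAt = Date.now()
  return {
    result,
    startTime: new Date(startedAt).toISOString(),
    endTime: new Date(endedAt).toISOString(),
    latencyMs: endedAt - startedAt,
  }
}

// Usage with the span example above (sketch; callModel is hypothetical):
// const { result, startTime, endTime } = await timeStep(() => callModel(userQuery))
// await client.traces.createSpan(trace.id, { name: 'LLM Call', type: 'llm', startTime, /* ... */ })
```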

Nested spans for multi-step workflows

For pipelines with multiple sequential steps — like a RAG workflow with embedding, retrieval, and generation — use traceWorkflowStep to create properly nested spans automatically:
import { traceWorkflowStep } from '@evalgate/sdk'

await tracer.startWorkflow('RAG Pipeline')

const embedding = await traceWorkflowStep(tracer, 'embed-query', async () => {
  return await openai.embeddings.create({ /* ... */ })
})

const docs = await traceWorkflowStep(tracer, 'retrieve-docs', async () => {
  return await vectorDb.search(embedding)
})

const response = await traceWorkflowStep(tracer, 'generate-response', async () => {
  return await openai.chat.completions.create({ /* ... */ })
})

await tracer.endWorkflow({ status: 'success' })

Adding custom metadata

Attach business context to traces to make them filterable and useful for debugging:
await tracer.startWorkflow('content-generation', undefined, {
  userId: user.id,
  contentType: 'blog-post',
  targetAudience: 'developers',
  keywords: ['AI', 'evaluation', 'testing']
})

const span = await tracer.startAgentSpan('ContentAgent', { input: '...' })
// Your LLM call here
await tracer.endAgentSpan(span, { result: '...' })

await tracer.endWorkflow({ status: 'success' })

What gets tracked automatically

Every trace and span captures the following without any extra code:
  • Input / Output: full prompts and model responses
  • Timing: start time, end time, and total latency
  • Tokens: input tokens, output tokens, and estimated cost
  • Model: model name, version, and parameters
  • Metadata: user ID, session ID, and any custom tags you attach
  • Errors: stack traces and error messages on failure

Viewing traces

Once your application is instrumented, open the Traces page in your dashboard to:
  • Search and filter traces by metadata, tags, or time range
  • View detailed timelines showing nested spans
  • Analyze token usage and costs across requests
  • Debug failures with full stack traces
  • Identify latency bottlenecks across pipeline steps

Best practices

Use descriptive names

Name traces after the user action, not the implementation. customer-support-query is more useful than llm-call.

Attach relevant metadata

Include userId, sessionId, environment, and feature flags so you can slice and debug traces effectively.

Sample for high volume

For high-throughput applications, configure sampling to trace 10–20% of requests rather than every call.
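A minimal client-side sampling gate might look like the following. This is a sketch: shouldTrace is a hypothetical helper, not an SDK feature, so check whether your SDK version supports built-in or server-side sampling before rolling your own.

```typescript
// Hypothetical client-side sampler: trace roughly `rate` of all requests.
// rate is a fraction between 0 and 1, e.g. 0.15 for ~15% of traffic.
function shouldTrace(rate: number): boolean {
  if (rate <= 0) return false
  if (rate >= 1) return true
  return Math.random() < rate
}

// Usage (sketch): only start a workflow for a sampled subset of requests.
// if (shouldTrace(0.15)) {
//   await tracer.startWorkflow('customer-support-query')
// }
```

Random per-request sampling is the simplest approach; if you need all spans of a given session to be kept or dropped together, derive the decision from a hash of the session ID instead of Math.random().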

Never log PII

Anonymize or redact sensitive user data before it appears in trace inputs, outputs, or metadata fields.
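A simple regex-based redactor applied before data reaches trace fields could look like this. This is a sketch with loud caveats: redactPII is a hypothetical helper, and the two patterns here only cover email addresses and long digit runs, so real PII handling will need patterns tailored to your data.

```typescript
// Hypothetical redactor: strip obvious PII before it reaches trace fields.
// Only emails and long digit sequences are covered — extend for your data.
function redactPII(text: string): string {
  return text
    .replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, '[REDACTED_EMAIL]')
    .replace(/\b\d{9,}\b/g, '[REDACTED_NUMBER]')
}

// Usage (sketch): redact before passing input to a span.
// await client.traces.createSpan(trace.id, {
//   name: 'LLM Call',
//   input: redactPII(userQuery),
//   /* ... */
// })
```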

Troubleshooting

Traces not appearing in the dashboard?
Verify your EVALGATE_API_KEY is correct and that AIEvalClient is initialized before any traces are created.

Noticing added latency?
The SDK adds roughly 10ms of overhead. Make sure you are not await-ing trace upload calls in the critical path — they run asynchronously by default.

Spans missing data?
Ensure every async function inside a traceWorkflowStep callback is properly await-ed. Unawaited promises can resolve after the span closes.