Traces: visibility into LLM behavior
Learn what traces and spans are, what data they capture, how sampling works, and how to promote real failures into evaluation test cases.

A trace is a structured record of one complete run of your AI system — everything that happened from the moment a request arrived to the moment a response was returned. Traces give you the ground truth you need to understand what your AI is actually doing in production, not what you expect it to do in tests.
Traces and spans
A trace represents the whole workflow. A span represents one individual step inside that workflow — a single LLM call, a tool invocation, a retrieval operation, or any other discrete unit of work. A simple chatbot request might produce one trace with one span. A RAG pipeline might produce one trace with four spans: embed the query, retrieve documents, re-rank results, and generate the response. The trace holds the end-to-end picture; spans let you isolate latency, cost, and correctness at each step.

What gets captured
Every trace and span records:

| Field | What it contains |
|---|---|
| Input | The full prompt or query sent to the model |
| Output | The full response returned |
| Tokens | Input token count, output token count |
| Latency | Duration in milliseconds |
| Cost | Estimated cost based on model pricing |
| Model | Model name, version, and provider |
| Metadata | User ID, session ID, feature flags, custom tags |
| Errors | Stack traces and error messages when calls fail |
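Put together, a single captured span might serialize to something like the record below. All field names here are illustrative (they mirror the table above, not Evalgate's actual wire format):

```python
# Illustrative shape of one captured span. Field names are hypothetical,
# chosen to match the table above rather than any real wire format.
span_record = {
    "input": "What is our refund policy?",
    "output": "Refunds are available within 30 days of purchase...",
    "tokens": {"input": 412, "output": 128},
    "latency_ms": 950,
    "cost_usd": 0.0031,  # estimated from model pricing
    "model": {"name": "gpt-4o", "version": "2024-08-06", "provider": "openai"},
    "metadata": {"user_id": "u_123", "session_id": "s_456", "tags": ["beta"]},
    "errors": None,  # stack trace and message would appear here on failure
}

total_tokens = span_record["tokens"]["input"] + span_record["tokens"]["output"]
print(total_tokens)  # 540
```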
Instrumenting your application
Use the SDK to create traces and attach spans wherever your application calls an LLM or performs a step you want to observe.

Multi-step workflows
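A minimal sketch of attaching one span per step to a single trace, using the RAG pipeline from earlier on this page. The `Trace` class below is a stand-in written for this example, not the real Evalgate SDK, whose API may differ:

```python
import time
from contextlib import contextmanager
from dataclasses import dataclass, field

@dataclass
class Trace:
    """Stand-in for an SDK trace object; illustrative only."""
    name: str
    metadata: dict = field(default_factory=dict)
    spans: list = field(default_factory=list)

    @contextmanager
    def span(self, name: str):
        # Record one step of the workflow, timing it as it runs.
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self.spans.append({"name": name, "latency_ms": elapsed_ms})

# One trace for the whole request; one span per pipeline step.
trace = Trace("rag-request", metadata={"user_id": "u_123"})
with trace.span("embed_query"):
    pass  # call the embedding model here
with trace.span("retrieve_documents"):
    pass  # query the vector store here
with trace.span("rerank_results"):
    pass  # re-rank retrieved documents here
with trace.span("generate_response"):
    pass  # final LLM call here

print([s["name"] for s in trace.spans])
```

Because every span lands on the same trace, the timeline shows all four steps with their individual durations.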
For pipelines with multiple LLM calls or tool steps, attach one span per step to the same trace. This lets you see the full timeline and pinpoint exactly where latency or quality problems occur.

Asymmetric sampling
For high-volume applications, tracing every request is expensive and unnecessary. Evalgate uses asymmetric sampling by default:

- 10% of successful requests are sampled
- 100% of errors are always captured
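The policy is simple to state: an error is always kept, and a success is kept with a fixed probability. A sketch of the decision, assuming the 10% default rate described above:

```python
import random

def should_capture(is_error: bool, success_rate: float = 0.10) -> bool:
    """Asymmetric sampling: keep every error, sample successes."""
    if is_error:
        return True  # 100% of errors are always captured
    return random.random() < success_rate  # ~10% of successes

# An error is never dropped, whatever the sampling rate.
assert should_capture(is_error=True, success_rate=0.0)

random.seed(0)
kept = sum(should_capture(is_error=False) for _ in range(10_000))
print(kept)  # close to 1,000 of 10,000 successful requests
```

This keeps tracing cost roughly proportional to one tenth of successful traffic while guaranteeing that no failure is lost.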
Enriching traces with metadata
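Metadata attached when a trace is created is what later drives filtering and grouping. A toy illustration using plain dicts rather than the real SDK (all keys here are hypothetical):

```python
# Each trace carries metadata set at creation time (keys are hypothetical).
traces = [
    {"metadata": {"user_id": "u_1", "plan": "free"}, "error": None},
    {"metadata": {"user_id": "u_2", "plan": "pro"}, "error": "timeout"},
    {"metadata": {"user_id": "u_3", "plan": "pro"}, "error": None},
]

# Dashboard-style queries then become simple metadata filters.
pro_traces = [t for t in traces if t["metadata"]["plan"] == "pro"]
failed = [t for t in traces if t["error"] is not None]
print(len(pro_traces), len(failed))  # 2 1
```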
Traces become much more useful when they carry business context alongside the model inputs and outputs. Attach metadata at trace creation time to enable filtering, grouping, and alerting in the dashboard.

Viewing traces in the dashboard
Once your application is instrumented, open the Traces page in your Evalgate dashboard to:

- Search and filter traces by metadata, tags, model, or time range
- View detailed timelines showing nested spans and their durations
- Analyze token usage and cost breakdowns per step
- Inspect full input and output text for any span
- Debug failures with complete error stack traces
- Identify performance bottlenecks across the workflow