Add distributed tracing and evaluations to LangChain chains, agents, and RAG pipelines to monitor quality and catch regressions in production.
LangChain makes it easy to build complex LLM pipelines, but that complexity introduces more failure points — a broken tool, a retrieval miss, or a degraded prompt can silently reduce quality across thousands of requests. Wrapping your LangChain components with Evalgate tracing gives you end-to-end visibility into every step and lets you run structured evaluations against known-good baselines. This guide covers setup, tracing common LangChain patterns, running evaluations against chains, and monitoring production workflows.
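To make the wrapping pattern concrete, here is a minimal self-contained sketch. It assumes a `WorkflowTracer` with a `traceWorkflowStep(name, fn)` method (names taken from this guide); the stub class below stands in for the real Evalgate SDK and only records span names and durations:

```typescript
// Stand-in for the Evalgate SDK's WorkflowTracer, for illustration only:
// it records a span per traced step so the wrapping pattern is visible.
type Span = { name: string; durationMs: number };

class WorkflowTracer {
  spans: Span[] = [];
  async traceWorkflowStep<T>(name: string, fn: () => Promise<T>): Promise<T> {
    const start = Date.now();
    try {
      return await fn();
    } finally {
      this.spans.push({ name, durationMs: Date.now() - start });
    }
  }
}

// A RAG-style pipeline: each wrapped step becomes a span on the trace timeline.
async function answerQuestion(tracer: WorkflowTracer, question: string): Promise<string> {
  const docs = await tracer.traceWorkflowStep("retrieve-docs", async () => {
    return [`doc about: ${question}`]; // stand-in for a LangChain retriever call
  });
  return tracer.traceWorkflowStep("generate-answer", async () => {
    return `Answer based on ${docs.length} doc(s).`; // stand-in for an LLM call
  });
}

const tracer = new WorkflowTracer();
answerQuestion(tracer, "What is tracing?").then((answer) => {
  console.log(answer);
  console.log(tracer.spans.map((s) => s.name).join(", "));
});
```

The key point is structural: every pipeline step runs inside a named span callback, so retrieval misses and generation failures show up as separate entries on the timeline rather than one opaque chain invocation.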
After collecting production traces, use the CLI to label them interactively and build evaluation coverage from real failures:
```bash
# Label unlabeled traces one by one
npx evalgate label

# See failure-mode frequency across labeled traces
npx evalgate analyze
```
Sample traces for high-throughput applications
Trace 10% of requests to keep overhead low while retaining full error visibility. Evalgate samples 100% of error traces by default.
Use descriptive span names
Prefer specific names like `embed-query` and `retrieve-docs` over generic names like `step-1`. Specific names make timeline debugging much faster.
Attach relevant metadata
Include userId, sessionId, and model version in workflow metadata so you can filter traces by user segment or model version in the dashboard.
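As a sketch of why this pays off, the snippet below attaches the suggested fields to a trace record and filters by model version the way a dashboard query would. The `WorkflowMetadata` shape is an assumption for illustration, not the documented Evalgate schema:

```typescript
interface WorkflowMetadata {
  userId: string;
  sessionId: string;
  modelVersion: string;
}

interface TraceRecord {
  name: string;
  metadata: WorkflowMetadata;
}

// Dashboard-style filter: narrow traces to a single model version.
function filterByModelVersion(traces: TraceRecord[], version: string): TraceRecord[] {
  return traces.filter((t) => t.metadata.modelVersion === version);
}

// Hypothetical trace records with metadata attached at workflow start.
const traces: TraceRecord[] = [
  { name: "rag-query", metadata: { userId: "u1", sessionId: "s1", modelVersion: "model-v1" } },
  { name: "rag-query", metadata: { userId: "u2", sessionId: "s2", modelVersion: "model-v2" } },
];

console.log(filterByModelVersion(traces, "model-v2").length);
```

Without these fields on the trace itself, a regression that only affects one model version or one user segment is invisible in aggregate metrics.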
Test at each layer
Test retrieval, generation, and end-to-end quality separately. A passing end-to-end score can mask a broken retrieval step.
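A minimal sketch of layered checks, using two hypothetical scoring helpers (not part of Evalgate): one scores retrieval recall against known-relevant documents, the other checks the generated answer independently, so an end-to-end pass cannot hide a retrieval miss:

```typescript
// Fraction of known-relevant documents that the retriever actually returned.
function retrievalRecall(retrievedIds: string[], relevantIds: string[]): number {
  if (relevantIds.length === 0) return 1;
  const hits = relevantIds.filter((id) => retrievedIds.includes(id)).length;
  return hits / relevantIds.length;
}

// Generation check: does the answer mention every required fact?
function answerContainsFacts(answer: string, requiredFacts: string[]): boolean {
  const lower = answer.toLowerCase();
  return requiredFacts.every((fact) => lower.includes(fact.toLowerCase()));
}
```

Run both on the same test case: a chain can produce a plausible answer from its parametric knowledge (passing the generation check) while `retrievalRecall` reveals that the retriever returned nothing relevant.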
Promote failures to tests
When a production chain produces a bad output, capture that input as a test case in your eval suite so the same failure cannot recur.
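One way to sketch that promotion step is a small helper that turns a failing production trace into an eval fixture. The field names (`FailedTrace`, `EvalCase`, `promoteToEvalCase`) are illustrative assumptions, not Evalgate's schema:

```typescript
interface FailedTrace {
  traceId: string;
  input: string;
  output: string;
}

interface EvalCase {
  name: string;
  input: string;
  badOutput: string;     // the output that must not recur
  sourceTraceId: string; // link back to the production trace for context
}

// Convert a labeled production failure into a regression test case.
function promoteToEvalCase(trace: FailedTrace): EvalCase {
  return {
    name: `regression-${trace.traceId}`,
    input: trace.input,
    badOutput: trace.output,
    sourceTraceId: trace.traceId,
  };
}
```

Keeping the source trace ID on the test case lets you jump from a failing eval back to the original production timeline when diagnosing a recurrence.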
Traces not appearing in the dashboard?
Confirm the SDK is initialized with the correct `EVALGATE_API_KEY` and that `WorkflowTracer` is instantiated before any workflow calls.

Spans are missing or out of order?
Make sure every async call inside a `traceWorkflowStep` callback is properly await-ed. Unawaited promises can resolve after the span closes, causing incomplete data.

High latency overhead?
The SDK adds roughly 10ms of overhead per trace upload. Use `enableBatching: true` when initializing the client to group writes into fewer API calls.