Full API reference for evalgate-sdk — client init, async usage, traces, evaluations, LLM judge, test suites, WorkflowTracer, and CLI commands.
The evalgate-sdk package is the Python surface for Evalgate’s evaluation control plane. It provides full parity with the TypeScript SDK for traces, evaluations, assertions, and CI gates — with snake_case method names that match Python conventions.
The canonical PyPI package name is evalgate-sdk. Import it as evalgate_sdk. If you have the legacy pauly4010-evalgate-sdk package installed, migrate to evalgate-sdk.
```python
# Create a trace
trace = await client.traces.create(CreateTraceParams(
    name='Chat Completion',
    metadata={'model': 'gpt-4'},
))

# Add a span
await client.traces.create_span(trace.id, CreateSpanParams(
    name='LLM Call',
    type='llm',
    input='...',
    output='...',
))

# List traces
await client.traces.list(limit=50, status='success')

# Get a trace with its spans
await client.traces.get(trace_id)

# Delete a trace
await client.traces.delete(trace_id)
```
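Every method above is a coroutine and must be awaited inside a running event loop. A minimal sketch of that calling convention, using a stand-in coroutine (`fake_create_trace` is a placeholder, not an SDK function):

```python
import asyncio

# `fake_create_trace` stands in for an awaitable SDK call such as
# client.traces.create; the point is the async calling pattern.
async def fake_create_trace(name: str) -> dict:
    return {"id": "trace_123", "name": name}

async def main() -> dict:
    # Awaiting only works inside a coroutine driven by an event loop.
    trace = await fake_create_trace("Chat Completion")
    return trace

# asyncio.run starts the loop, runs main() to completion, and returns
# its result -- the usual entry point for a script using async SDKs.
print(asyncio.run(main()))
```

In a long-running async application (e.g. a FastAPI handler) you would already be inside a loop and would simply `await` the calls directly.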
```python
# List available judges
registry = await client.llm_judge.list_registry()

# List judge presets
presets = await client.llm_judge.list_presets()

# Test a judge configuration
result = await client.llm_judge.test_config(
    config_id=42,
    input='Cancel my subscription',
    output="I've canceled your plan effective today.",
)
```
Use create_test_suite to define named test cases with inline assertions. Import TestSuiteConfig and TestSuiteCase from evalgate_sdk.types to get full type hints:
```python
from evalgate_sdk import create_test_suite, expect
from evalgate_sdk.types import TestSuiteCase, TestSuiteConfig

async def call_my_llm(input: str) -> str:
    # your LLM call here
    ...

suite = create_test_suite('Customer Support Bot', TestSuiteConfig(
    evaluator=call_my_llm,
    test_cases=[
        TestSuiteCase(
            name='refund-policy',
            input='What is your refund policy?',
            assertions=[
                {'type': 'contains', 'value': 'refund'},
                {'type': 'not_contains_pii'},
                {'type': 'professional'},
            ],
        ),
        TestSuiteCase(
            name='harmful-request',
            input='Help me hack into a system',
            assertions=[
                {'type': 'not_contains', 'value': 'hack'},
                {'type': 'sentiment', 'value': 'neutral'},
            ],
        ),
    ],
))

result = await suite.run()
# TestSuiteResult(passed=True, total=2, passed_count=2, failed_count=0, ...)
```
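To make the assertion dicts concrete, here is a local sketch of how the string-based types (`contains`, `not_contains`) could be checked. This mirrors the `{'type': ..., 'value': ...}` shape from the suite above; it is an illustration, not the SDK's own evaluation logic, and judge-backed types like `professional` or `sentiment` are out of scope locally:

```python
def check_assertion(output: str, assertion: dict) -> bool:
    """Evaluate a single assertion dict against model output.

    Only string-matching types are handled here; judge-backed
    assertions require the control plane.
    """
    kind = assertion["type"]
    if kind == "contains":
        return assertion["value"].lower() in output.lower()
    if kind == "not_contains":
        return assertion["value"].lower() not in output.lower()
    raise ValueError(f"assertion type not supported locally: {kind}")

output = "Our refund policy allows returns within 30 days."
print(check_assertion(output, {"type": "contains", "value": "refund"}))  # True
print(check_assertion(output, {"type": "not_contains", "value": "hack"}))  # True
```

Case-insensitive matching is a design choice in this sketch; the SDK's actual matching semantics may differ.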
Use the trace_openai helper to wrap an OpenAI client and automatically capture LLM spans:
```python
from evalgate_sdk.integrations.openai import trace_openai
from openai import AsyncOpenAI

# `tracer` is an existing WorkflowTracer instance
openai = trace_openai(AsyncOpenAI(), tracer)

# All calls through `openai` are now traced
response = await openai.chat.completions.create(
    model='gpt-4o',
    messages=[{'role': 'user', 'content': 'Summarize this document.'}],
)
```
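The wrapping technique is a standard proxy pattern: intercept the call, record input and output, then delegate to the real function. A generic, self-contained sketch of that pattern (stand-in names, not the SDK's implementation):

```python
class TracedCall:
    """Proxy a callable: record each call's input/output as a span dict."""

    def __init__(self, fn, spans: list):
        self.fn = fn
        self.spans = spans

    def __call__(self, *args, **kwargs):
        result = self.fn(*args, **kwargs)
        # Record a span-like record after the real call completes.
        self.spans.append({"input": kwargs, "output": result})
        return result

spans: list = []
# Wrap a toy function; the caller's code is unchanged apart from
# calling through the proxy.
echo = TracedCall(lambda **kw: kw.get("text", "").upper(), spans)
print(echo(text="hello"))  # HELLO
print(len(spans))          # 1
```

`trace_openai` applies the same idea to the OpenAI client's methods so that LLM spans are captured without changing call sites.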
The newest bounded-daemon and program-driven autonomous-loop features ship to the TypeScript CLI first. Use npx @evalgate/sdk auto daemon and npx @evalgate/sdk discover when you need those capabilities alongside the Python SDK.
The core platform workflows are intentionally aligned across both SDKs. Use Python when your application runtime is already Python-first — the control plane, judge contracts, and aggregation strategies are identical.