Documentation Index
Fetch the complete documentation index at: https://evalgate.com/docs/llms.txt
Use this file to discover all available pages before exploring further.
Use Evalgate tools via MCP in AI agents
Connect Cursor, Claude Desktop, or any MCP-compatible agent to Evalgate for live tool access to evaluations, quality scores, and traces.

Evalgate exposes an MCP-style tool discovery and execution API so AI agents can call platform services directly, without leaving the IDE or agent context. Any MCP-compatible client can discover the available tools and execute them against your Evalgate workspace using your API key.
Supported clients
- Cursor IDE — use evaluations and quality scores without leaving your editor
- Claude Desktop — run evaluations and retrieve results from the chat interface
- ChatGPT Plugins — integrate Evalgate tools into ChatGPT workflows
- Custom MCP clients — any client that speaks the MCP tool discovery and execution protocol
API endpoints
| Method | Endpoint | Auth | Description |
|---|---|---|---|
| GET | /api/mcp/tools | None | List available tools and input schemas |
| POST | /api/mcp/call | Required | Execute a tool |
Authentication
Authenticated requests require an API key passed as a bearer token in the Authorization header. The GET /api/mcp/tools endpoint is public and requires no authentication; only tool execution via POST /api/mcp/call requires an API key.
Discover available tools
Call the tool discovery endpoint to retrieve all available tools and their input schemas.
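For example, a minimal fetch call (a sketch: the evalgate.com base URL is inferred from the docs domain, and the top-level tools field mirrors the standard MCP list shape rather than a confirmed Evalgate response):

```ts
// Discover available tools and their input schemas (no auth required).
const res = await fetch("https://evalgate.com/api/mcp/tools");
const { tools } = await res.json(); // assumed response shape: { tools: [...] }
for (const tool of tools) {
  console.log(tool.name, "-", tool.description);
}
```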
Execute a tool
Pass the tool name and arguments to /api/mcp/call. Always include your API key.
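A minimal sketch, assuming the request body carries name and arguments fields (mirroring the MCP tools/call convention; verify against the schemas returned by the discovery endpoint):

```ts
// Execute a tool with a bearer-token API key.
const res = await fetch("https://evalgate.com/api/mcp/call", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    Authorization: `Bearer ${process.env.EVALGATE_API_KEY}`,
  },
  body: JSON.stringify({
    name: "eval.quality.latest",     // tool to execute
    arguments: { evaluationId: 42 }, // tool-specific arguments (illustrative ID)
  }),
});
console.log(await res.json());
```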
Server-side tools
These tools are available via /api/mcp/call from any external MCP client.
eval.quality.latest — get latest quality score
Retrieves the latest quality score for an evaluation, with optional baseline comparison.
| Parameter | Type | Required | Description |
|---|---|---|---|
| evaluationId | number | Yes | ID of the evaluation |
| baseline | string | No | One of published, previous, or production |
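An illustrative request body, using the call shape sketched above (the evaluationId is a placeholder):

```json
{
  "name": "eval.quality.latest",
  "arguments": { "evaluationId": 42, "baseline": "published" }
}
```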
eval.run.create — create a new evaluation run
Creates a new evaluation run for a given evaluation, optionally targeting a specific environment.
| Parameter | Type | Required | Description |
|---|---|---|---|
| evaluationId | number | Yes | ID of the evaluation to run |
| environment | string | No | Target environment (e.g. production, staging) |
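An illustrative request body targeting staging (same assumed call shape as above):

```json
{
  "name": "eval.run.create",
  "arguments": { "evaluationId": 42, "environment": "staging" }
}
```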
eval.trace.create — create a distributed trace
Creates a new trace in Evalgate for recording model or agent behavior.
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Name for the trace |
| metadata | object | No | Arbitrary key-value metadata to attach |
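An illustrative request body (the metadata keys are arbitrary examples):

```json
{
  "name": "eval.trace.create",
  "arguments": {
    "name": "checkout-agent-session",
    "metadata": { "model": "gpt-4o", "env": "production" }
  }
}
```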
eval.testcase.list — list test cases for an evaluation
Lists test cases attached to a given evaluation.
| Parameter | Type | Required | Description |
|---|---|---|---|
| evaluationId | number | Yes | ID of the evaluation |
| limit | number | No | Maximum number of test cases to return |
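An illustrative request body:

```json
{
  "name": "eval.testcase.list",
  "arguments": { "evaluationId": 42, "limit": 25 }
}
```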
MCP tool execution uses a dedicated rate limit tier: 100 requests per minute per IP or API key. This is separate from the standard REST API rate limit.
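If a client batches many tool calls, a small retry helper keeps it inside this tier. A sketch, assuming the limiter answers with HTTP 429 and may set a standard Retry-After header (neither detail is confirmed by these docs):

```ts
// Retry a tool call on HTTP 429, honoring Retry-After when present.
async function callWithRetry(body: unknown, apiKey: string, attempts = 3): Promise<Response> {
  for (let i = 0; i < attempts; i++) {
    const res = await fetch("https://evalgate.com/api/mcp/call", {
      method: "POST",
      headers: { "Content-Type": "application/json", Authorization: `Bearer ${apiKey}` },
      body: JSON.stringify(body),
    });
    if (res.status !== 429) return res;
    // Fall back to a 5-second delay when Retry-After is absent.
    const waitMs = Number(res.headers.get("Retry-After") ?? "5") * 1000;
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
  throw new Error("Still rate limited after retries");
}
```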
WebMCP tools (browser)
When the Evalgate dashboard is open in your browser, six additional tools are registered via navigator.modelContext. These are available to AI assistants running in the browser context — Cursor, Claude, and ChatGPT can call them when the dashboard tab is active.
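For orientation, the sketch below shows roughly how a page can register such a tool. The navigator.modelContext API is an emerging proposal; both its final shape and Evalgate's actual registration code are assumptions here:

```ts
// Illustrative only: registering a browser-side tool via the proposed
// navigator.modelContext API. Field names follow the WebMCP explainer
// and may change; Evalgate's real registration code may differ.
(navigator as any).modelContext?.registerTool({
  name: "get_quality_score",
  description: "Get the latest quality score for an evaluation",
  inputSchema: {
    type: "object",
    properties: { evaluationId: { type: "number" } },
    required: ["evaluationId"],
  },
  async execute({ evaluationId }: { evaluationId: number }) {
    // A real tool would read from the dashboard's in-page state.
    return { content: [{ type: "text", text: `score for ${evaluationId}` }] };
  },
});
```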
list_evaluation_templates — browse template library
Lists evaluation templates across 17 categories, including unit_tests, adversarial, human_eval, llm_judge, chatbot, rag, code-gen, and more.
| Parameter | Type | Required | Description |
|---|---|---|---|
| category | string | No | Filter by category (e.g. "rag", "adversarial") |
| limit | number | No | Maximum number of templates to return |
get_evaluation_test_cases — retrieve test cases by evaluation ID
Retrieves test cases for a specific evaluation.
| Parameter | Type | Required | Description |
|---|---|---|---|
| evaluationId | number | Yes | ID of the evaluation |
| limit | number | No | Maximum number of test cases to return |
create_evaluation — create a new evaluation
Creates a new evaluation. Supported types: unit_test, human_eval, model_eval, ab_test. Optionally include test cases inline.
| Parameter | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Display name for the evaluation |
| type | string | Yes | One of unit_test, human_eval, model_eval, ab_test |
| description | string | No | Optional description |
| testCases | array | No | Inline test cases to attach on creation |
run_evaluation — execute an evaluation run
Fetches the test cases for an evaluation, runs them, and computes pass/fail scores.
| Parameter | Type | Required | Description |
|---|---|---|---|
| evaluationId | number | Yes | ID of the evaluation to run |
get_evaluation_results — retrieve run results
Returns pass/fail counts, per-test-case results, scores, and run status for an evaluation.
| Parameter | Type | Required | Description |
|---|---|---|---|
| evaluationId | number | Yes | ID of the evaluation |
| limit | number | No | Number of results to return (default: 10) |
get_quality_score — get the latest quality score
Returns the quality score from the most recent evaluation run: name, status, total/passed/failed counts, and pass rate.
| Parameter | Type | Required | Description |
|---|---|---|---|
| evaluationId | number | Yes | ID of the evaluation |
Set up a client
Configure your MCP client to point at the Evalgate tool discovery endpoint and authenticate with your API key.
- Cursor IDE
- Claude Desktop
Add an entry for Evalgate to your Cursor MCP server configuration (typically .cursor/mcp.json).
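A hypothetical example follows; the server name is arbitrary, and the exact URL and header fields Cursor accepts vary by version, so treat this as a sketch and check Cursor's MCP documentation:

```json
{
  "mcpServers": {
    "evalgate": {
      "url": "https://evalgate.com/api/mcp/tools",
      "headers": {
        "Authorization": "Bearer YOUR_EVALGATE_API_KEY"
      }
    }
  }
}
```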