
Documentation Index

Fetch the complete documentation index at: https://evalgate.com/docs/llms.txt

Use this file to discover all available pages before exploring further.

Use Evalgate tools via MCP in AI agents

Connect Cursor, Claude Desktop, or any MCP-compatible agent to Evalgate for live tool access to evaluations, quality scores, and traces.
Evalgate exposes an MCP-style tool discovery and execution API so AI agents can call platform services directly — without leaving the IDE or agent context. Any MCP-compatible client can discover available tools and execute them against your Evalgate workspace using your API key.

Supported clients

  • Cursor IDE — use evaluations and quality scores without leaving your editor
  • Claude Desktop — run evaluations and retrieve results from the chat interface
  • ChatGPT Plugins — integrate Evalgate tools into ChatGPT workflows
  • Custom MCP clients — any client that speaks the MCP tool discovery and execution protocol

API endpoints

| Method | Endpoint | Auth | Description |
| --- | --- | --- | --- |
| GET | /api/mcp/tools | None | List available tools and their input schemas |
| POST | /api/mcp/call | Required | Execute a tool |

Authentication

Authenticated requests require an API key passed as a bearer token:
Authorization: Bearer <EVALGATE_API_KEY>
Get your API key from Settings → Developer in the dashboard. When you use Evalgate in the browser, your session cookie authenticates requests automatically.
The GET /api/mcp/tools endpoint is public and requires no authentication. Only tool execution via POST /api/mcp/call requires an API key.
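In code, the bearer token is just a request header. A minimal Python sketch (assuming the key is stored in an `EVALGATE_API_KEY` environment variable; the fallback placeholder is for illustration only):

```python
import os

# Read the key from the environment rather than hard-coding it.
api_key = os.environ.get("EVALGATE_API_KEY", "YOUR_API_KEY")

# Every authenticated request to /api/mcp/call carries these headers.
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```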

Discover available tools

Call the tool discovery endpoint to retrieve all available tools and their input schemas.
curl -X GET "https://evalgate.com/api/mcp/tools"
Response:
{
  "tools": [
    {
      "name": "eval.quality.latest",
      "description": "Get the latest quality score for an evaluation.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "evaluationId": { "type": "number", "description": "ID of the evaluation" },
          "baseline": { "type": "string", "enum": ["published", "previous", "production"] }
        },
        "required": ["evaluationId"]
      }
    }
  ]
}
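One practical use of the discovery response is validating arguments against a tool's inputSchema before calling it. A minimal sketch, using the example schema from the response above (a real client would fetch the schema from /api/mcp/tools instead of hard-coding it):

```python
def missing_required(schema: dict, arguments: dict) -> list:
    """Return the names of required schema properties absent from arguments."""
    return [name for name in schema.get("required", []) if name not in arguments]

# The example inputSchema returned for eval.quality.latest above.
schema = {
    "type": "object",
    "properties": {
        "evaluationId": {"type": "number"},
        "baseline": {"type": "string", "enum": ["published", "previous", "production"]},
    },
    "required": ["evaluationId"],
}

print(missing_required(schema, {"baseline": "published"}))  # ['evaluationId']
print(missing_required(schema, {"evaluationId": 42}))       # []
```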

Execute a tool

Pass the tool name and arguments to /api/mcp/call. Always include your API key.
curl -X POST "https://evalgate.com/api/mcp/call" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "tool": "eval.quality.latest",
    "arguments": { "evaluationId": 42, "baseline": "published" }
  }'
Success (200):
{
  "ok": true,
  "content": [{ "type": "text", "text": "{\"score\":85,\"baselineScore\":82,...}" }]
}
Error (400 / 4xx / 5xx):
{
  "ok": false,
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Evaluation not found",
    "requestId": "uuid"
  }
}
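Both envelopes carry the `ok` flag, so a client can branch on it rather than on HTTP status alone. A sketch of that handling (`handle_mcp_response` is a hypothetical helper, not part of an official SDK):

```python
def handle_mcp_response(body: dict):
    """Return the tool's content on success; raise on an error envelope."""
    if body.get("ok"):
        return body["content"]
    err = body.get("error", {})
    raise RuntimeError(
        f"{err.get('code', 'UNKNOWN')}: {err.get('message', '')} "
        f"(requestId={err.get('requestId', 'n/a')})"
    )

# Example success envelope, shaped like the 200 response above.
success = {"ok": True, "content": [{"type": "text", "text": "{\"score\":85}"}]}
print(handle_mcp_response(success)[0]["text"])
```

Surfacing the `requestId` in the raised error makes it easy to quote when reporting a failed call.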

Server-side tools

These tools are available via /api/mcp/call from any external MCP client.
eval.quality.latest

Retrieves the latest quality score for an evaluation, with optional baseline comparison.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| evaluationId | number | Yes | ID of the evaluation |
| baseline | string | No | One of published, previous, or production |
{
  "tool": "eval.quality.latest",
  "arguments": { "evaluationId": 42, "baseline": "published" }
}
eval.run.create

Creates a new evaluation run for a given evaluation, optionally targeting a specific environment.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| evaluationId | number | Yes | ID of the evaluation to run |
| environment | string | No | Target environment (e.g. production, staging) |
{
  "tool": "eval.run.create",
  "arguments": { "evaluationId": 42, "environment": "staging" }
}
eval.trace.create

Creates a new trace in Evalgate for recording model or agent behavior.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Name for the trace |
| metadata | object | No | Arbitrary key-value metadata to attach |
{
  "tool": "eval.trace.create",
  "arguments": { "name": "user-query-trace", "metadata": { "env": "prod" } }
}
eval.testcase.list

Lists test cases attached to a given evaluation.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| evaluationId | number | Yes | ID of the evaluation |
| limit | number | No | Maximum number of test cases to return |
{
  "tool": "eval.testcase.list",
  "arguments": { "evaluationId": 42, "limit": 20 }
}
MCP tool execution uses a dedicated rate limit tier: 100 requests per minute per IP or API key. This is separate from the standard REST API rate limit.
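At 100 requests per minute, a client that bursts past the limit should back off rather than retry immediately. A minimal exponential-backoff sketch (`call_tool` here is a stand-in for an actual HTTP POST to /api/mcp/call returning a status and parsed body; anything other than 429 is treated as final):

```python
import time

def call_with_backoff(call_tool, max_retries: int = 3, base_delay: float = 1.0):
    """Retry a callable returning (status, body) on HTTP 429, doubling the delay each attempt."""
    for attempt in range(max_retries + 1):
        status, body = call_tool()
        if status != 429:
            return status, body
        if attempt < max_retries:
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return status, body

# A fake transport for illustration: rate-limited twice, then a success.
responses = iter([(429, None), (429, None), (200, {"ok": True, "content": []})])
status, body = call_with_backoff(lambda: next(responses), base_delay=0.0)
print(status)  # 200
```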

WebMCP tools (browser)

When the Evalgate dashboard is open in your browser, six additional tools are registered via navigator.modelContext. These are available to AI assistants running in the browser context — Cursor, Claude, and ChatGPT can call them when the dashboard tab is active.
list_evaluation_templates

Lists evaluation templates across 17 categories including unit_tests, adversarial, human_eval, llm_judge, chatbot, rag, code-gen, and more.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| category | string | No | Filter by category (e.g. "rag", "adversarial") |
| limit | number | No | Maximum number of templates to return |
await tool("list_evaluation_templates", {
  category: "rag",
  limit: 5
});
get_evaluation_test_cases

Retrieves test cases for a specific evaluation.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| evaluationId | number | Yes | ID of the evaluation |
| limit | number | No | Maximum number of test cases to return |
await tool("get_evaluation_test_cases", {
  evaluationId: 42,
  limit: 10
});
create_evaluation

Creates a new evaluation. Supported types: unit_test, human_eval, model_eval, ab_test. Optionally include test cases inline.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Display name for the evaluation |
| type | string | Yes | One of unit_test, human_eval, model_eval, ab_test |
| description | string | No | Optional description |
| testCases | array | No | Inline test cases to attach on creation |
await tool("create_evaluation", {
  name: "GPT-4o Accuracy",
  type: "unit_test",
  description: "Test factual accuracy"
});
run_evaluation

Fetches the test cases for an evaluation, runs them, and computes pass/fail scores.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| evaluationId | number | Yes | ID of the evaluation to run |
await tool("run_evaluation", {
  evaluationId: 42
});
get_evaluation_results

Returns pass/fail counts, per-test-case results, scores, and run status for an evaluation.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| evaluationId | number | Yes | ID of the evaluation |
| limit | number | No | Number of results to return (default: 10) |
await tool("get_evaluation_results", {
  evaluationId: 42,
  limit: 5
});
get_quality_score

Returns the quality score from the most recent evaluation run: name, status, total/passed/failed counts, and pass rate.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| evaluationId | number | Yes | ID of the evaluation |
await tool("get_quality_score", {
  evaluationId: 42
});

Set up a client

Configure your MCP client to point at the Evalgate tool discovery endpoint and authenticate with your API key.
Add the following to your Cursor MCP server configuration:
{
  "mcpServers": {
    "evalgate": {
      "command": "curl",
      "args": ["https://evalgate.com/api/mcp/tools"]
    }
  }
}
Tool executions are logged to your API usage history. View per-key usage in Settings → Developer → API Keys → Usage.