
Documentation Index

Fetch the complete documentation index at: https://evalgate.com/docs/llms.txt

Use this file to discover all available pages before exploring further.

Use Evalgate tools via MCP in AI agents

Connect Cursor, Claude Desktop, or any MCP-compatible agent to Evalgate for live tool access to evaluations, quality scores, and traces.
Evalgate exposes an MCP-style tool discovery and execution API so AI agents can call platform services directly — without leaving the IDE or agent context. Any MCP-compatible client can discover available tools and execute them against your Evalgate workspace using your API key.

Supported clients

  • Cursor IDE — use evaluations and quality scores without leaving your editor
  • Claude Desktop — run evaluations and retrieve results from the chat interface
  • ChatGPT Plugins — integrate Evalgate tools into ChatGPT workflows
  • Custom MCP clients — any client that speaks the MCP tool discovery and execution protocol

API endpoints

| Method | Endpoint | Auth | Description |
| --- | --- | --- | --- |
| GET | /api/mcp/tools | None | List available tools and their input schemas |
| POST | /api/mcp/call | Required | Execute a tool |

Authentication

Authenticated requests require an API key passed as a bearer token:
Authorization: Bearer <EVALGATE_API_KEY>
Get your API key from Settings → Developer in the dashboard. When you use Evalgate in the browser, your session cookie authenticates requests automatically.
The GET /api/mcp/tools endpoint is public and requires no authentication. Only tool execution via POST /api/mcp/call requires an API key.
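In code, the bearer token is just a request header. A minimal Python sketch (assuming the key is stored in an `EVALGATE_API_KEY` environment variable; the fallback placeholder is for illustration only):

```python
import os

# Read the key from the environment rather than hard-coding it.
api_key = os.environ.get("EVALGATE_API_KEY", "YOUR_API_KEY")

# Every authenticated request to /api/mcp/call carries these headers.
headers = {
    "Authorization": f"Bearer {api_key}",
    "Content-Type": "application/json",
}
```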

Discover available tools

Call the tool discovery endpoint to retrieve all available tools and their input schemas.
curl -X GET "https://evalgate.com/api/mcp/tools"
Response:
{
  "tools": [
    {
      "name": "eval.quality.latest",
      "description": "Get the latest quality score for an evaluation.",
      "inputSchema": {
        "type": "object",
        "properties": {
          "evaluationId": { "type": "number", "description": "ID of the evaluation" },
          "baseline": { "type": "string", "enum": ["published", "previous", "production"] }
        },
        "required": ["evaluationId"]
      }
    }
  ]
}
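One practical use of the discovery response is validating arguments against a tool's inputSchema before calling it. A minimal sketch, using the example schema from the response above (a real client would fetch the schema from /api/mcp/tools instead of hard-coding it):

```python
def missing_required(schema: dict, arguments: dict) -> list:
    """Return the names of required schema properties absent from arguments."""
    return [name for name in schema.get("required", []) if name not in arguments]

# The example inputSchema returned for eval.quality.latest above.
schema = {
    "type": "object",
    "properties": {
        "evaluationId": {"type": "number"},
        "baseline": {"type": "string", "enum": ["published", "previous", "production"]},
    },
    "required": ["evaluationId"],
}

print(missing_required(schema, {"baseline": "published"}))  # ['evaluationId']
print(missing_required(schema, {"evaluationId": 42}))       # []
```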

Execute a tool

Pass the tool name and arguments to /api/mcp/call. Always include your API key.
curl -X POST "https://evalgate.com/api/mcp/call" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "tool": "eval.quality.latest",
    "arguments": { "evaluationId": 42, "baseline": "published" }
  }'
Success (200):
{
  "ok": true,
  "content": [{ "type": "text", "text": "{\"score\":85,\"baselineScore\":82,...}" }]
}
Error (400 / 4xx / 5xx):
{
  "ok": false,
  "error": {
    "code": "VALIDATION_ERROR",
    "message": "Evaluation not found",
    "requestId": "uuid"
  }
}
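Both envelopes carry the `ok` flag, so a client can branch on it rather than on HTTP status alone. A sketch of that handling (`handle_mcp_response` is a hypothetical helper, not part of an official SDK):

```python
def handle_mcp_response(body: dict):
    """Return the tool's content on success; raise on an error envelope."""
    if body.get("ok"):
        return body["content"]
    err = body.get("error", {})
    raise RuntimeError(
        f"{err.get('code', 'UNKNOWN')}: {err.get('message', '')} "
        f"(requestId={err.get('requestId', 'n/a')})"
    )

# Example success envelope, shaped like the 200 response above.
success = {"ok": True, "content": [{"type": "text", "text": "{\"score\":85}"}]}
print(handle_mcp_response(success)[0]["text"])
```

Surfacing the `requestId` in the raised error makes it easy to quote when reporting a failed call.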

Server-side tools

These tools are available via /api/mcp/call from any external MCP client.
eval.quality.latest

Retrieves the latest quality score for an evaluation, with optional baseline comparison.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| evaluationId | number | Yes | ID of the evaluation |
| baseline | string | No | One of published, previous, or production |
{
  "tool": "eval.quality.latest",
  "arguments": { "evaluationId": 42, "baseline": "published" }
}
eval.run.create

Creates a new evaluation run for a given evaluation, optionally targeting a specific environment.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| evaluationId | number | Yes | ID of the evaluation to run |
| environment | string | No | Target environment (e.g. production, staging) |
{
  "tool": "eval.run.create",
  "arguments": { "evaluationId": 42, "environment": "staging" }
}
eval.trace.create

Creates a new trace in Evalgate for recording model or agent behavior.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Name for the trace |
| metadata | object | No | Arbitrary key-value metadata to attach |
{
  "tool": "eval.trace.create",
  "arguments": { "name": "user-query-trace", "metadata": { "env": "prod" } }
}
eval.testcase.list

Lists test cases attached to a given evaluation.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| evaluationId | number | Yes | ID of the evaluation |
| limit | number | No | Maximum number of test cases to return |
{
  "tool": "eval.testcase.list",
  "arguments": { "evaluationId": 42, "limit": 20 }
}
MCP tool execution uses a dedicated rate limit tier: 100 requests per minute per IP or API key. This is separate from the standard REST API rate limit.
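At 100 requests per minute, a client that bursts past the limit should back off rather than retry immediately. A minimal exponential-backoff sketch (`call_tool` here is a stand-in for an actual HTTP POST to /api/mcp/call returning a status and parsed body; anything other than 429 is treated as final):

```python
import time

def call_with_backoff(call_tool, max_retries: int = 3, base_delay: float = 1.0):
    """Retry a callable returning (status, body) on HTTP 429, doubling the delay each attempt."""
    for attempt in range(max_retries + 1):
        status, body = call_tool()
        if status != 429:
            return status, body
        if attempt < max_retries:
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    return status, body

# A fake transport for illustration: rate-limited twice, then a success.
responses = iter([(429, None), (429, None), (200, {"ok": True, "content": []})])
status, body = call_with_backoff(lambda: next(responses), base_delay=0.0)
print(status)  # 200
```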

WebMCP tools (browser)

When the Evalgate dashboard is open in your browser, six additional tools are registered via navigator.modelContext. These are available to AI assistants running in the browser context — Cursor, Claude, and ChatGPT can call them when the dashboard tab is active.
list_evaluation_templates

Lists evaluation templates across 17 categories including unit_tests, adversarial, human_eval, llm_judge, chatbot, rag, code-gen, and more.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| category | string | No | Filter by category (e.g. "rag", "adversarial") |
| limit | number | No | Maximum number of templates to return |
await tool("list_evaluation_templates", {
  category: "rag",
  limit: 5
});
get_evaluation_test_cases

Retrieves test cases for a specific evaluation.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| evaluationId | number | Yes | ID of the evaluation |
| limit | number | No | Maximum number of test cases to return |
await tool("get_evaluation_test_cases", {
  evaluationId: 42,
  limit: 10
});
create_evaluation

Creates a new evaluation. Supported types: unit_test, human_eval, model_eval, ab_test. Optionally include test cases inline.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| name | string | Yes | Display name for the evaluation |
| type | string | Yes | One of unit_test, human_eval, model_eval, ab_test |
| description | string | No | Optional description |
| testCases | array | No | Inline test cases to attach on creation |
await tool("create_evaluation", {
  name: "GPT-4o Accuracy",
  type: "unit_test",
  description: "Test factual accuracy"
});
run_evaluation

Fetches the test cases for an evaluation, runs them, and computes pass/fail scores.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| evaluationId | number | Yes | ID of the evaluation to run |
await tool("run_evaluation", {
  evaluationId: 42
});
get_evaluation_results

Returns pass/fail counts, per-test-case results, scores, and run status for an evaluation.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| evaluationId | number | Yes | ID of the evaluation |
| limit | number | No | Number of results to return (default: 10) |
await tool("get_evaluation_results", {
  evaluationId: 42,
  limit: 5
});
get_quality_score

Returns the quality score from the most recent evaluation run: name, status, total/passed/failed counts, and pass rate.

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| evaluationId | number | Yes | ID of the evaluation |
await tool("get_quality_score", {
  evaluationId: 42
});

Set up a client

Configure your MCP client to point at the Evalgate tool discovery endpoint and authenticate with your API key.
Add the following to your Cursor MCP server configuration:
{
  "mcpServers": {
    "evalgate": {
      "command": "curl",
      "args": ["https://evalgate.com/api/mcp/tools"]
    }
  }
}
Tool executions are logged to your API usage history. View per-key usage in Settings → Developer → API Keys → Usage.