Agent governance

Provider and model policy

Control which AI providers and models your organization can use for judge evaluations. These settings apply to all evaluation runs across your org.

What you can configure:

Disable external providers — block all outbound calls to a provider entirely (e.g. disallow OpenAI for all judge evaluations)
Allowlist specific providers and models — restrict usage to an approved list; calls to any model outside the allowlist are rejected before they leave the platform
Enforce default judge presets — lock in a standard judge configuration so evaluations always use the same scoring approach across teams
Cost caps — set a maximum spend threshold per evaluation run to prevent runaway judge costs

Provider and model restrictions are enforced before outbound calls are made. Disallowed providers are blocked at the platform boundary — the request never reaches the external API.
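The pre-call check described above can be pictured roughly as follows. This is a minimal sketch, not EvalGate's implementation: `ProviderPolicy`, `PolicyViolation`, `check_judge_call`, and the provider and model names are all hypothetical, and the real policy schema may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional


class PolicyViolation(Exception):
    """Raised when a judge call is rejected before leaving the platform."""


@dataclass(frozen=True)
class ProviderPolicy:
    # Hypothetical policy shape, for illustration only.
    disabled_providers: FrozenSet[str] = frozenset()
    # provider -> approved models; an empty mapping means no allowlist is set.
    allowed_models: Dict[str, FrozenSet[str]] = field(default_factory=dict)
    max_cost_usd_per_run: Optional[float] = None


def check_judge_call(policy: ProviderPolicy, provider: str, model: str,
                     projected_run_cost_usd: float) -> None:
    """Raise PolicyViolation if the call must not be made; return None otherwise."""
    if provider in policy.disabled_providers:
        raise PolicyViolation(f"provider {provider!r} is disabled")
    if policy.allowed_models:
        # With an allowlist in place, anything not listed is rejected.
        if model not in policy.allowed_models.get(provider, frozenset()):
            raise PolicyViolation(f"{provider}/{model} is not on the allowlist")
    cap = policy.max_cost_usd_per_run
    if cap is not None and projected_run_cost_usd > cap:
        raise PolicyViolation(f"projected cost exceeds the cap of {cap} USD")
```

The point of the sketch is ordering: every check runs before any network request is built, so a rejected call never reaches the external API.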

PII protection

EvalGate scrubs inputs before they reach the embedding pipeline or external judge calls, so sensitive data in your test cases or traces does not leave your org boundary in cleartext.

How it works:

Inputs pass through the scrubber before embedding or being sent to an external judge
External judge calls respect your org’s PII policy — if scrubbing is not permitted for a given request, the provider call is blocked entirely
Scrubbing is auditable: the audit log records that scrubbing occurred and what policy applied, without storing the raw sensitive values themselves
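To make the scrub-then-audit flow concrete, here is a small sketch under stated assumptions: the regular expressions, the `scrub` function, and the audit fields are illustrative inventions, not EvalGate's scrubber. Real name detection would also need more than regular expressions.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Simplified, hypothetical patterns; production scrubbers are more thorough.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


@dataclass
class ScrubResult:
    text: str    # safe to embed or send to an external judge
    audit: Dict  # what happened, without the raw sensitive values


def scrub(text: str, enabled_categories: List[str], policy_name: str) -> ScrubResult:
    counts = {}
    for category in enabled_categories:
        # Replace each match with a placeholder such as [EMAIL].
        text, n = PATTERNS[category].subn(f"[{category.upper()}]", text)
        counts[category] = n
    audit = {
        "policy": policy_name,
        "scrubbed": any(counts.values()),
        "counts": counts,  # counts only; matched values are never stored
    }
    return ScrubResult(text=text, audit=audit)
```

The audit entry records that scrubbing happened and under which policy, but only as counts per category, which mirrors the "auditable without storing raw values" property described above.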

To enable PII protection:

Go to Settings → Organization → PII Policy and enable scrubbing for the categories relevant to your use case (e.g. email addresses, phone numbers, names). Changes apply immediately to all subsequent evaluation runs.

Enabling PII protection does not retroactively scrub data already stored in traces or evaluation records. Apply PII policies before ingesting sensitive data.

Drift enforcement

The drift gate enforces cross-surface parity: it detects when evaluation behavior shifts across environments, model versions, or prompt changes and treats those shifts as violations, not informational alerts.

What drift enforcement covers:

Cross-surface parity — the drift gate compares results across surfaces (e.g. staging vs. production, model v1 vs. v2) and flags deviations that exceed your thresholds
Runtime monitors — continuous tracking of score distribution and variance shifts catches gradual degradation that single-run comparisons miss
Hardcoded threshold violations — if you set a minimum pass rate or quality score floor, breaches are treated as violations and surface in the dashboard and CI gates
Versioned, auditable baselines — every baseline is versioned so you can trace exactly which run and configuration a comparison was made against

Use drift enforcement in combination with CI gates to automatically block deployments when a model or prompt change causes a statistically significant quality regression.
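One way to picture a gate that combines a pass-rate floor with a significance check is a one-sided two-proportion z-test between a baseline run and a candidate run. This is an assumption about how such a gate could work, not a description of EvalGate's statistics. The function name, default floor, and critical value are hypothetical.

```python
import math
from typing import Dict


def drift_gate(baseline_pass: int, baseline_total: int,
               candidate_pass: int, candidate_total: int,
               min_pass_rate: float = 0.90,
               z_critical: float = 1.645) -> Dict:
    """Compare candidate pass rate to a baseline; 1.645 is the one-sided 5% level."""
    p_base = baseline_pass / baseline_total
    p_cand = candidate_pass / candidate_total
    # Pooled proportion under the null hypothesis of no difference.
    pooled = (baseline_pass + candidate_pass) / (baseline_total + candidate_total)
    se = math.sqrt(pooled * (1 - pooled) * (1 / baseline_total + 1 / candidate_total))
    z = (p_base - p_cand) / se if se > 0 else 0.0

    violations = []
    if p_cand < min_pass_rate:
        violations.append("pass_rate_floor")
    if z > z_critical:
        violations.append("significant_regression")
    return {"baseline": p_base, "candidate": p_cand, "z": z, "violations": violations}
```

In a CI job, a non-empty `violations` list would translate to a failing exit code, which is what blocks the deployment.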

Audit logging and org isolation

Every action that modifies data in EvalGate is org-scoped and recorded in the audit log. This applies to all users and API keys in your organization.

What is audited:

All mutations (evaluation creation, run creation, test case changes, configuration updates) are attributed to an org member or API key with a timestamp
Judge executions record the full execution context: model config, retry count, timing, and whether PII redaction was applied
Audit records are queryable through the API, so you can export them for compliance reporting
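As an illustration of what an audited judge execution might carry, the sketch below builds an entry with the context fields listed above and exports one org's entries as JSON. The `AuditEntry` shape, the `judge.execute` action name, and both functions are hypothetical. They do not reflect EvalGate's actual API or schema.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Dict, List


@dataclass(frozen=True)
class AuditEntry:
    org_id: str
    actor: str      # org member or API key identifier
    action: str
    timestamp: str  # ISO 8601, UTC
    context: Dict


def record_judge_execution(org_id: str, actor: str, model_config: Dict,
                           retry_count: int, duration_ms: int,
                           pii_redacted: bool) -> AuditEntry:
    return AuditEntry(
        org_id=org_id,
        actor=actor,
        action="judge.execute",
        timestamp=datetime.now(timezone.utc).isoformat(),
        context={
            "model_config": model_config,
            "retry_count": retry_count,
            "duration_ms": duration_ms,
            "pii_redacted": pii_redacted,
        },
    )


def export_for_org(entries: List[AuditEntry], org_id: str) -> str:
    """Serialize only the entries that belong to the requesting org."""
    return json.dumps([asdict(e) for e in entries if e.org_id == org_id], indent=2)
```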

Access control:

RBAC is part of the control layer — role assignments determine which members can create evaluations, run judges, export data, and modify org settings
Multi-tenant isolation ensures data from one org is never accessible to another, even under the same EvalGate instance
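The interaction between role checks and org isolation can be sketched as a single authorization function. The role names and the permission mapping below are assumptions made for illustration. EvalGate's actual roles and permissions may be defined differently.

```python
from typing import Dict, FrozenSet

# Hypothetical roles, using the capabilities named above as permissions.
ROLE_PERMISSIONS: Dict[str, FrozenSet[str]] = {
    "viewer": frozenset(),
    "member": frozenset({"create_evaluation", "run_judge"}),
    "admin": frozenset({"create_evaluation", "run_judge",
                        "export_data", "modify_org_settings"}),
}


def is_allowed(member_org_id: str, member_role: str,
               resource_org_id: str, permission: str) -> bool:
    # Org isolation comes first: no role grants access across orgs.
    if member_org_id != resource_org_id:
        return False
    # Unknown roles get no permissions.
    return permission in ROLE_PERMISSIONS.get(member_role, frozenset())
```

Checking the org boundary before the role means that even an admin in one org is denied access to another org's data.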

Manage member roles from Settings → Organization → Members.

Retention and destructive cleanup

Retention policy in EvalGate is not just metadata tagging. You can configure org-scoped destructive retention sweeps that permanently remove expired evaluation data and logs under policy.

How retention works:

Set a retention period for evaluation runs, traces, and audit records
When a sweep runs, expired data is actually deleted — not soft-deleted or flagged
Each sweep is recorded in the audit log with scope, timestamp, and affected record counts so you have full visibility into what was removed and when
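The sweep behavior above can be sketched as a function that drops expired records for one org and returns an audit entry with per-category counts. The in-memory record shape (`org_id`, `kind`, `created_at`) and the function name are hypothetical. A real sweep would run against the platform's storage rather than a Python list.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional, Tuple


def run_retention_sweep(records: List[Dict], org_id: str, retention_days: int,
                        now: Optional[datetime] = None) -> Tuple[List[Dict], Dict]:
    """Return (surviving records, audit entry). Expired records are dropped, not flagged."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    kept: List[Dict] = []
    deleted_counts: Dict[str, int] = {}
    for record in records:
        # Only the sweeping org's own records are ever eligible for deletion.
        expired = record["org_id"] == org_id and record["created_at"] < cutoff
        if expired:
            deleted_counts[record["kind"]] = deleted_counts.get(record["kind"], 0) + 1
        else:
            kept.append(record)
    audit_entry = {
        "action": "retention.sweep",
        "org_id": org_id,
        "timestamp": now.isoformat(),
        "deleted_counts": deleted_counts,
    }
    return kept, audit_entry
```

The audit entry keeps scope, timestamp, and counts, but none of the deleted content, so the record of the sweep does not itself retain expired data.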

Configure retention schedules from Settings → Organization → Data Retention.

Destructive retention sweeps are irreversible. Review your retention settings carefully and confirm the policy applies only to the data categories you intend before activating a sweep.

Where to configure governance

Provider policy

Settings → Organization → Providers

PII protection

Settings → Organization → PII Policy

Drift thresholds

Settings → Organization → Drift & Baselines

Member roles

Settings → Organization → Members
