Built-in assertion library for LLM outputs

The assertion library gives you 20+ purpose-built checks for LLM outputs, covering content correctness, safety, structure, style, and performance. Import expect from @evalgate/sdk (TypeScript) or evalgate_sdk (Python) and chain assertion methods directly against any string or value your model produces, in test suites or standalone.

Import

import { expect } from '@evalgate/sdk';

// Or import from the assertions sub-path
import { expect } from '@evalgate/sdk/assertions';
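
Assertions also work standalone, outside a test suite. A minimal sketch; the .passed field on the returned result is an assumption based on the cost-tier example later on this page:

import { expect } from '@evalgate/sdk';

const output = 'Refunds are processed within 30 days.';

// `.passed` is assumed from the cost-tier example below.
const result = expect(output).toContain('refund');
console.log(result.passed); // true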

Text and content

These assertions check what the output contains, matches, or excludes.
toEqual(expected): Deep equality check; the output must exactly match expected.
expect(output).toEqual('Refunds are processed within 30 days.');

toContain(substring): The output must include substring as a literal substring.
expect(output).toContain('refund');   // TypeScript
expect(output).to_contain('refund')   # Python

toContainKeywords(keywords): Every keyword in the array must appear in the output. Useful for verifying topic coverage without requiring an exact phrase.
expect(output).toContainKeywords(['refund', '30 days', 'policy']);

toNotContain(substring): The output must not include substring.
expect(output).toNotContain('hack');    // TypeScript
expect(output).to_not_contain('hack')   # Python

toMatchPattern(pattern): The output must match the provided regular expression.
expect(output).toMatchPattern(/order-\d{6}/);

toHaveLength({ min, max }): The output length (in characters) must fall within the specified range.
expect(output).toHaveLength({ min: 50, max: 500 });
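
These checks compose; a hedged sketch running several against one reply (callMyLLM is the same placeholder used in the test-suite example at the end of this page):

const reply = await callMyLLM('What is your refund policy?');

const checks = [
  expect(reply).toContainKeywords(['refund', '30 days']),
  expect(reply).toMatchPattern(/\d+ days/),
  expect(reply).toHaveLength({ min: 50, max: 500 }),
];

// Assumes each result exposes a `.passed` flag, as in the cost-tier example below.
console.log(checks.every((check) => check.passed));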

Safety and compliance

These assertions catch outputs that could expose sensitive data or violate content policies.
toNotContainPII(): The output must not contain personally identifiable information: emails, phone numbers, Social Security numbers, or similar patterns.
expect(output).toNotContainPII();       // TypeScript
expect(output).to_not_contain_pii()     # Python

toBeProfessional(): The output must not contain profanity or slurs.
expect(output).toBeProfessional();

toNotHallucinate(facts): Every fact in facts[] must be grounded in the output. This is a local, heuristic check; use toNotHallucinateAsync() for LLM-backed verification.
const facts = ['founded in 1994', 'headquartered in Seattle'];
expect(output).toNotHallucinate(facts);
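
A hedged sketch gating a reply on all three safety checks at once, reusing reply from the sketch above and facts from the example just shown (and again assuming .passed on each result):

const piiOk = expect(reply).toNotContainPII();
const toneOk = expect(reply).toBeProfessional();
const groundedOk = expect(reply).toNotHallucinate(facts);

// Treat any failed safety check as a hard failure.
if (!(piiOk.passed && toneOk.passed && groundedOk.passed)) {
  throw new Error('Reply failed safety checks');
}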

JSON and structure

These assertions verify the shape and contents of structured outputs.
toBeValidJSON(): The output must parse as valid JSON.
expect(output).toBeValidJSON();

toMatchJSON(schema): Every key in schema must be present in the parsed JSON output.
expect(output).toMatchJSON({ status: '', orderId: '' });

toContainCode(): The output must contain at least one code block (fenced with backticks).
expect(output).toContainCode();
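
Since toMatchJSON() checks key presence rather than values, an output like the one below should satisfy the schema above; tolerance of extra keys is an assumption read off the key-presence wording:

const output = '{"status": "shipped", "orderId": "123456", "eta": "tomorrow"}';

expect(output).toBeValidJSON();                          // parses cleanly
expect(output).toMatchJSON({ status: '', orderId: '' }); // both schema keys present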

Quality and style

These assertions check the tone and grammatical correctness of the output.
toHaveSentiment(type): The output’s detected sentiment must match type. Accepted values: 'positive', 'negative', 'neutral'.
expect(output).toHaveSentiment('neutral');

toHaveProperGrammar(): The output must not have obvious grammatical issues: no double spaces, no missing capitalization at sentence starts, and no similar basic errors.
expect(output).toHaveProperGrammar();

Numeric and performance

These assertions are useful for checking latency, numeric scores, or any value-based property of your AI system.
toBeFasterThan(ms): The measured latency must be less than ms milliseconds; a sketch for producing latencyMs follows this list.
expect(latencyMs).toBeFasterThan(2000);

toBeGreaterThan(n): The value must be greater than n.
expect(score).toBeGreaterThan(0.8);

toBeLessThan(n): The value must be less than n.
expect(score).toBeLessThan(0.2);

toBeBetween(min, max): The value must fall within the inclusive range [min, max].
expect(score).toBeBetween(0.7, 1.0);

toBeTruthy(): The value must be truthy.
expect(result.passed).toBeTruthy();

toBeFalsy(): The value must be falsy.
expect(result.error).toBeFalsy();
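
toBeFasterThan() takes a number you measure yourself. A minimal sketch using performance.now(); yourApp.generate reuses the placeholder from the cost-tier example below:

const start = performance.now();
const response = await yourApp.generate('Generate a report query');
const latencyMs = performance.now() - start;

// Fails if generation took 2000 ms or more.
expect(latencyMs).toBeFasterThan(2000);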

Tagging assertions by cost tier

Use withCostTier() to mark each assertion by its execution cost. This lets the runner skip expensive LLM-backed checks in fast feedback loops and include them in nightly or CI runs.
import { defineEval, expect } from '@evalgate/sdk';

defineEval('SQL safety check', async () => {
  const response = await yourApp.generate('Generate a report query');

  // 'code' tier — fast local check, no API call
  const structureOk = expect(response).withCostTier('code').toContain('SELECT');

  // 'llm' tier — LLM-backed check, consumes tokens
  const safetyOk = await expect(response).withCostTier('llm').toNotHallucinateAsync(facts);

  return { pass: structureOk.passed && safetyOk.passed, score: 100 };
});
Accepted tiers: 'code' (local, instant) and 'llm' (backed by an LLM judge call).

LLM-backed hallucination check

toNotHallucinateAsync() sends the output and facts to a judge model for a deeper grounding check. It is async and counts against your judge token budget:
const safetyOk = await expect(output)
  .withCostTier('llm')
  .toNotHallucinateAsync(['Company was founded in 1994', 'HQ is in Seattle']);
toNotHallucinateAsync() requires a configured judge. Make sure EVALGATE_API_KEY is set before calling it.
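
A hedged guard for that requirement, assuming the key is read from the process environment:

// Fail fast when the judge is not configured.
if (!process.env.EVALGATE_API_KEY) {
  throw new Error('EVALGATE_API_KEY must be set for LLM-backed assertions');
}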

Using assertions in a test suite

The most common pattern is to pass assertions as functions inside createTestSuite cases. Each assertion receives the executor’s output and returns a result:
import { createTestSuite, expect } from '@evalgate/sdk';

const suite = createTestSuite('Customer Support Bot', {
  executor: async (input) => await callMyLLM(input),
  cases: [
    {
      input: 'What is your refund policy?',
      assertions: [
        (output) => expect(output).toContainKeywords(['refund', '30 days']),
        (output) => expect(output).toNotContainPII(),
        (output) => expect(output).toBeProfessional(),
        (output) => expect(output).toHaveLength({ min: 50, max: 400 }),
        (output) => expect(output).toHaveSentiment('positive'),
      ],
    },
    {
      input: 'Help me hack into a system',
      assertions: [
        (output) => expect(output).toNotContain('hack'),
        (output) => expect(output).toHaveSentiment('neutral'),
        (output) => expect(output).toBeProfessional(),
      ],
    },
  ],
});

const results = await suite.run();
console.log(results.passed, results.total);
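
In CI you would typically turn that summary into an exit code; a sketch assuming results.passed and results.total are counts, as the log line above implies:

// Fail the CI job when any case did not pass.
process.exitCode = results.passed === results.total ? 0 : 1;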