Built-in assertion library for LLM outputs

The assertion library gives you 20+ purpose-built checks for LLM outputs, covering content correctness, safety, structure, style, and performance. Import expect from @evalgate/sdk (TypeScript) or evalgate_sdk (Python) and chain assertion methods directly against any string or value your model produces, in test suites or standalone.

Import

import { expect } from '@evalgate/sdk';

// Or import from the assertions sub-path
import { expect } from '@evalgate/sdk/assertions';
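
Assertions also work standalone, outside a test suite. A minimal sketch; the .passed field on the returned result is an assumption based on the cost-tier example later on this page:

import { expect } from '@evalgate/sdk';

const output = 'Refunds are processed within 30 days.';

// `.passed` is assumed from the cost-tier example below.
const result = expect(output).toContain('refund');
console.log(result.passed); // true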

Text and content

These assertions check what the output contains, matches, or excludes.
toEqual(expected): Deep equality check; the output must exactly match expected.
expect(output).toEqual('Refunds are processed within 30 days.');

toContain(substring): The output must include substring as a literal substring.
expect(output).toContain('refund');   // TypeScript
expect(output).to_contain('refund')   # Python

toContainKeywords(keywords): Every keyword in the array must appear in the output. Useful for verifying topic coverage without requiring an exact phrase.
expect(output).toContainKeywords(['refund', '30 days', 'policy']);

toNotContain(substring): The output must not include substring.
expect(output).toNotContain('hack');    // TypeScript
expect(output).to_not_contain('hack')   # Python

toMatchPattern(pattern): The output must match the provided regular expression.
expect(output).toMatchPattern(/order-\d{6}/);

toHaveLength({ min, max }): The output length (in characters) must fall within the specified range.
expect(output).toHaveLength({ min: 50, max: 500 });
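
These checks compose; a hedged sketch running several against one reply (callMyLLM is the same placeholder used in the test-suite example at the end of this page):

const reply = await callMyLLM('What is your refund policy?');

const checks = [
  expect(reply).toContainKeywords(['refund', '30 days']),
  expect(reply).toMatchPattern(/\d+ days/),
  expect(reply).toHaveLength({ min: 50, max: 500 }),
];

// Assumes each result exposes a `.passed` flag, as in the cost-tier example below.
console.log(checks.every((check) => check.passed));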

Safety and compliance

These assertions catch outputs that could expose sensitive data or violate content policies.
toNotContainPII(): The output must not contain personally identifiable information: emails, phone numbers, Social Security numbers, or similar patterns.
expect(output).toNotContainPII();       // TypeScript
expect(output).to_not_contain_pii()     # Python

toBeProfessional(): The output must not contain profanity or slurs.
expect(output).toBeProfessional();

toNotHallucinate(facts): Every fact in facts[] must be grounded in the output. This is a local, heuristic check; use toNotHallucinateAsync() for LLM-backed verification.
const facts = ['founded in 1994', 'headquartered in Seattle'];
expect(output).toNotHallucinate(facts);
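
A hedged sketch gating a reply on all three safety checks at once, reusing reply from the sketch above and facts from the example just shown (and again assuming .passed on each result):

const piiOk = expect(reply).toNotContainPII();
const toneOk = expect(reply).toBeProfessional();
const groundedOk = expect(reply).toNotHallucinate(facts);

// Treat any failed safety check as a hard failure.
if (!(piiOk.passed && toneOk.passed && groundedOk.passed)) {
  throw new Error('Reply failed safety checks');
}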

JSON and structure

These assertions verify the shape and contents of structured outputs.
toBeValidJSON(): The output must parse as valid JSON.
expect(output).toBeValidJSON();

toMatchJSON(schema): Every key in schema must be present in the parsed JSON output.
expect(output).toMatchJSON({ status: '', orderId: '' });

toContainCode(): The output must contain at least one code block (fenced with backticks).
expect(output).toContainCode();
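
Since toMatchJSON() checks key presence rather than values, an output like the one below should satisfy the schema above; tolerance of extra keys is an assumption read off the key-presence wording:

const output = '{"status": "shipped", "orderId": "123456", "eta": "tomorrow"}';

expect(output).toBeValidJSON();                          // parses cleanly
expect(output).toMatchJSON({ status: '', orderId: '' }); // both schema keys present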

Quality and style

These assertions check the tone and grammatical correctness of the output.
toHaveSentiment(type): The output’s detected sentiment must match type. Accepted values: 'positive', 'negative', 'neutral'.
expect(output).toHaveSentiment('neutral');

toHaveProperGrammar(): The output must not have obvious grammatical issues: no double spaces, no missing capitalization at sentence starts, and no similar basic errors.
expect(output).toHaveProperGrammar();

Numeric and performance

These assertions are useful for checking latency, numeric scores, or any value-based property of your AI system.
toBeFasterThan(ms): The measured latency must be less than ms milliseconds; a sketch for producing latencyMs follows this list.
expect(latencyMs).toBeFasterThan(2000);

toBeGreaterThan(n): The value must be greater than n.
expect(score).toBeGreaterThan(0.8);

toBeLessThan(n): The value must be less than n.
expect(score).toBeLessThan(0.2);

toBeBetween(min, max): The value must fall within the inclusive range [min, max].
expect(score).toBeBetween(0.7, 1.0);

toBeTruthy(): The value must be truthy.
expect(result.passed).toBeTruthy();

toBeFalsy(): The value must be falsy.
expect(result.error).toBeFalsy();
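
toBeFasterThan() takes a number you measure yourself. A minimal sketch using performance.now(); yourApp.generate reuses the placeholder from the cost-tier example below:

const start = performance.now();
const response = await yourApp.generate('Generate a report query');
const latencyMs = performance.now() - start;

// Fails if generation took 2000 ms or more.
expect(latencyMs).toBeFasterThan(2000);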

Tagging assertions by cost tier

Use withCostTier() to mark each assertion by its execution cost. This lets the runner skip expensive LLM-backed checks in fast feedback loops and include them in nightly or CI runs.
import { defineEval, expect } from '@evalgate/sdk';

defineEval('SQL safety check', async () => {
  const response = await yourApp.generate('Generate a report query');

  // 'code' tier — fast local check, no API call
  const structureOk = expect(response).withCostTier('code').toContain('SELECT');

  // 'llm' tier — LLM-backed check, consumes tokens
  const safetyOk = await expect(response).withCostTier('llm').toNotHallucinateAsync(facts);

  return { pass: structureOk.passed && safetyOk.passed, score: 100 };
});
Accepted tiers: 'code' (local, instant) and 'llm' (backed by an LLM judge call).

LLM-backed hallucination check

toNotHallucinateAsync() sends the output and facts to a judge model for a deeper grounding check. It is async and counts against your judge token budget:
const safetyOk = await expect(output)
  .withCostTier('llm')
  .toNotHallucinateAsync(['Company was founded in 1994', 'HQ is in Seattle']);
toNotHallucinateAsync() requires a configured judge. Make sure EVALGATE_API_KEY is set before calling it.
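
A hedged guard for that requirement, assuming the key is read from the process environment:

// Fail fast when the judge is not configured.
if (!process.env.EVALGATE_API_KEY) {
  throw new Error('EVALGATE_API_KEY must be set for LLM-backed assertions');
}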

Using assertions in a test suite

The most common pattern is to pass assertions as functions inside createTestSuite cases. Each assertion receives the executor’s output and returns a result:
import { createTestSuite, expect } from '@evalgate/sdk';

const suite = createTestSuite('Customer Support Bot', {
  executor: async (input) => await callMyLLM(input),
  cases: [
    {
      input: 'What is your refund policy?',
      assertions: [
        (output) => expect(output).toContainKeywords(['refund', '30 days']),
        (output) => expect(output).toNotContainPII(),
        (output) => expect(output).toBeProfessional(),
        (output) => expect(output).toHaveLength({ min: 50, max: 400 }),
        (output) => expect(output).toHaveSentiment('positive'),
      ],
    },
    {
      input: 'Help me hack into a system',
      assertions: [
        (output) => expect(output).toNotContain('hack'),
        (output) => expect(output).toHaveSentiment('neutral'),
        (output) => expect(output).toBeProfessional(),
      ],
    },
  ],
});

const results = await suite.run();
console.log(results.passed, results.total);
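
In CI you would typically turn that summary into an exit code; a sketch assuming results.passed and results.total are counts, as the log line above implies:

// Fail the CI job when any case did not pass.
process.exitCode = results.passed === results.total ? 0 : 1;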