Introduction

While traditional software tests have clear pass/fail conditions, AI outputs are non-deterministic: the same input can produce different results. Scorers bridge this gap by providing quantifiable metrics for measuring agent quality.

Scorers are automated tests that evaluate agent outputs using model-graded, rule-based, and statistical methods. Each scorer returns a score: a numerical value (typically between 0 and 1) that quantifies how well an output meets your evaluation criteria. These scores let you objectively track performance, compare different approaches, and identify areas for improvement in your AI systems.

Scorers can be customized with your own prompts and scoring functions. They can run in the cloud, capturing results in real time, or as part of your CI/CD pipeline, allowing you to test and monitor your agents over time.

Types of Scorers

There are different kinds of scorers, each serving a specific purpose. Here are some common types:
  • Textual Scorers: Evaluate accuracy, reliability, and context understanding of agent responses
  • Classification Scorers: Measure accuracy in categorizing data based on predefined categories
  • Prompt Engineering Scorers: Explore impact of different instructions and input formats
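To make the idea concrete, here is a minimal standalone sketch of a rule-based textual scorer that returns a score between 0 and 1. This is an illustration of the concept only, not Mastra's scorer API; the function and type names are hypothetical:

```typescript
// Illustrative rule-based scorer: fraction of expected keywords found in the
// output, normalized to 0..1. A standalone sketch, not Mastra's scorer API.
type ScoreResult = { score: number; reason: string };

function keywordCoverageScorer(expectedKeywords: string[]) {
  return (output: string): ScoreResult => {
    const text = output.toLowerCase();
    const hits = expectedKeywords.filter((k) => text.includes(k.toLowerCase()));
    // An empty keyword list trivially passes; otherwise score is hit ratio.
    const score =
      expectedKeywords.length === 0 ? 1 : hits.length / expectedKeywords.length;
    return {
      score,
      reason: `Matched ${hits.length}/${expectedKeywords.length} keywords`,
    };
  };
}

const result = keywordCoverageScorer(["refund", "30 days"])(
  "Refunds are available within 30 days of purchase.",
);
// result.score === 1
```

A model-graded scorer follows the same shape, except the score comes from an LLM judging the output against a rubric rather than from a deterministic rule.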

Installation

To access Mastra’s scorers feature, install the @mastra/evals package:
npm install @mastra/evals@latest

Live Evaluations

Live evaluations allow you to automatically score AI outputs in real-time as your agents and workflows operate. Instead of running evaluations manually or in batches, scorers run asynchronously alongside your AI systems, providing continuous quality monitoring.

Adding Scorers to Agents

You can add built-in scorers to your agents to automatically evaluate their outputs. See the full list of built-in scorers for all available options.
src/mastra/agents/evaluated-agent.ts
import { Agent } from '@mastra/core/agent'
import { createAnswerRelevancyScorer, createToxicityScorer } from '@mastra/evals/scorers/prebuilt'

export const evaluatedAgent = new Agent({
  // ...other agent config (name, instructions, model)
  scorers: {
    relevancy: {
      scorer: createAnswerRelevancyScorer({ model: 'openai/gpt-4.1-nano' }),
      // Score roughly half of all runs
      sampling: { type: 'ratio', rate: 0.5 },
    },
    safety: {
      scorer: createToxicityScorer({ model: 'openai/gpt-4.1-nano' }),
      // Score every run
      sampling: { type: 'ratio', rate: 1 },
    },
  },
})

Adding Scorers to Workflow Steps

You can also add scorers to individual workflow steps to evaluate outputs at specific points in your process:
src/mastra/workflows/content-generation.ts
import { createWorkflow, createStep } from "@mastra/core/workflows";
import { z } from "zod";
import { customStepScorer } from "../scorers/custom-step-scorer";

const contentStep = createStep({
  id: "content-step",
  inputSchema: z.object({ topic: z.string() }),
  outputSchema: z.object({ content: z.string() }),
  execute: async ({ inputData }) => {
    // ...generate content from inputData.topic
    return { content: `Article about ${inputData.topic}` };
  },
  scorers: {
    customStepScorer: {
      scorer: customStepScorer(),
      sampling: {
        type: "ratio",
        rate: 1, // Score every step execution
      },
    },
  },
});

export const contentWorkflow = createWorkflow({ ... })
  .then(contentStep)
  .commit();

How Live Evaluations Work

When you add scorers to agents or workflow steps, they automatically run in the background as your AI systems execute. The scoring happens asynchronously, so it doesn’t block or slow down your agent responses. Results are captured and stored for analysis, allowing you to monitor quality trends over time.
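The `sampling` option in the examples above controls what fraction of executions get scored. Conceptually, ratio sampling amounts to a per-run random draw; a minimal sketch of the idea (Mastra handles this internally, so the function below is purely illustrative):

```typescript
// Conceptual sketch of ratio-based sampling: each run is scored with
// probability `rate`, so rate: 1 scores every run and rate: 0.5 scores
// roughly half. The injectable `rand` parameter is for illustration only.
function shouldSample(rate: number, rand: () => number = Math.random): boolean {
  return rand() < rate;
}

// With rate 1, every execution is sampled, since Math.random() is always < 1:
shouldSample(1, () => 0.999); // true
```

Sampling at a rate below 1 is a common way to keep model-graded scoring costs bounded in high-traffic systems while still tracking quality trends.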

Next Steps