Model Roles and Responsibilities

This guide provides a detailed overview of our AI judges, categorized by their roles in safety, reliability, and performance runs. Each judge is designed to help ensure that your AI systems operate safely, reliably, and optimally across diverse environments.

Key Features:

  • Preventative Control: Detects and prevents potential issues before they reach production
  • Detective Control: Identifies and flags errors after they occur
  • Monitoring: Provides general oversight and diagnostics of system performance

Model Judge Options

| Judge Type | Model Name | Description |
| --- | --- | --- |
| Safety | Collinear Guard | Proprietary model purpose-designed to rate responses against a custom Likert scale |
| Safety | Collinear Guard Nano | Lightweight proprietary model built for rapid evaluation cycles, providing pass/fail ratings |
| Safety | Llama Guard 3 | Specialized open-source model from Meta, built for safety evaluations |
| Safety | LLM-as-a-Judge | Any leading off-the-shelf model for toxicity and bias evaluation |
| Reliability | Veritas | Proprietary model purpose-designed for advanced hallucination detection |
| Reliability | Veritas Nano | Lightweight proprietary model built for low-latency hallucination detection |
| Reliability | Lynx 8B | Specialized open-source model from Patronus AI, built for hallucination detection |
| Reliability | LLM-as-a-Judge | Any leading off-the-shelf model for accuracy evaluation |
| Performance | Collinear Flex | Proprietary model purpose-designed to rate responses against a custom Likert scale |
| Performance | Quality of Reasoning | Specialized SLM focused on the depth, logic, and reasoning used to support conclusions |
| Performance | Instruction Following | Specialized SLM focused on how closely the output adheres to the given instructions |
| Performance | Coherence | Specialized SLM focused on whether the output maintains logical and semantic flow |
| Performance | LLM-as-a-Judge | Any leading off-the-shelf model for custom evaluations (see the sketch below) |
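For the LLM-as-a-Judge rows, any capable off-the-shelf model can be prompted to grade a response. The sketch below illustrates the general pattern only; the model name, prompt wording, 1-5 scale, and the `judge_accuracy` helper are illustrative assumptions and not part of the judges listed above.

```python
# Minimal LLM-as-a-Judge sketch: ask an off-the-shelf model to grade a
# response for factual accuracy. Model choice, prompt, and scale are
# assumptions for illustration, not a prescribed configuration.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = """You are an evaluation judge. Rate the assistant's answer
for factual accuracy on a 1-5 scale (5 = fully accurate, 1 = mostly wrong).
Reply with the number only.

Question: {question}
Answer: {answer}"""

def judge_accuracy(question: str, answer: str) -> int:
    """Return a 1-5 accuracy rating produced by the judge model."""
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # any leading off-the-shelf model works here
        messages=[{
            "role": "user",
            "content": JUDGE_PROMPT.format(question=question, answer=answer),
        }],
        temperature=0,  # deterministic grading
    )
    return int(completion.choices[0].message.content.strip())

print(judge_accuracy("What is the capital of France?", "Paris."))
```

The same pattern extends to toxicity, bias, or fully custom rubrics by swapping the judge prompt and, if needed, the rating scale.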