What is a Reliability Assessment?

A Reliability Assessment measures how consistently and truthfully your model responds across a dataset. Collinear AI runs each sample through a selected reliability judge, which detects hallucinations or factual inconsistencies. This helps you:
  • Quantify your model’s factual accuracy
  • Identify hallucination-prone outputs
  • Compare performance across different models or prompts
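Under the hood, each sample is scored independently by the chosen judge, and the per-sample verdicts roll up into dataset-level metrics. The Python sketch below illustrates the shape of that loop; `Sample`, `Verdict`, and `judge` are simplified placeholders for illustration, not the Collinear AI SDK.

```python
# Minimal sketch of the per-sample evaluation loop. All names here are
# illustrative placeholders, not the Collinear AI SDK.
from dataclasses import dataclass

@dataclass
class Sample:
    question: str
    context: str    # grounding text the response should stay faithful to
    response: str   # model output being judged

@dataclass
class Verdict:
    hallucinated: bool
    score: float    # 0.0 (ungrounded) .. 1.0 (fully grounded)

def judge(sample: Sample) -> Verdict:
    """Toy stand-in for a reliability judge such as Veritas."""
    grounded = sample.response.lower() in sample.context.lower()
    return Verdict(hallucinated=not grounded, score=1.0 if grounded else 0.0)

dataset = [
    Sample("Capital of France?", "Paris is the capital of France.", "Paris"),
    Sample("Capital of France?", "Paris is the capital of France.", "Lyon"),
]
accuracy = sum(not judge(s).hallucinated for s in dataset) / len(dataset)
print(f"Factual accuracy: {accuracy:.0%}")  # -> Factual accuracy: 50%
```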

Interactive Walkthrough

Want to see it in action? The steps below walk you through creating your first reliability run.

Getting Started

After connecting your knowledge base or uploading a dataset that includes context, you can initiate a reliability assessment using one of Collinear AI’s reliability judges.

Select a Judge

Choose from the following reliability models:
  1. Veritas – Collinear’s advanced large model for in-depth hallucination detection.
  2. Veritas Nano – Collinear’s ultra-fast binary classifier for hallucination detection.
  3. Lynx 8B – Patronus AI’s off-the-shelf model for hallucination detection.
  4. LLM-as-a-Judge – Use any custom model with a tailored prompt for flexible evaluation.
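Whichever judge you pick, the choice amounts to a small piece of run configuration. The snippet below is a hypothetical sketch of that configuration, not the actual Collinear AI API; only LLM-as-a-Judge needs the extra model and prompt fields.

```python
# Hypothetical judge-selection config; key names are illustrative only.
judge_config = {
    "judge": "veritas",     # or "veritas-nano", "lynx-8b", "llm-as-a-judge"
    # Used only when judge == "llm-as-a-judge":
    "custom_model": None,   # identifier of the model you want to judge with
    "custom_prompt": None,  # tailored evaluation instructions for that model
}
```

As a rule of thumb from the descriptions above: Veritas Nano returns a fast binary verdict, Veritas gives more in-depth analysis, and LLM-as-a-Judge offers the most flexibility at the cost of prompt engineering.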

Select a Context Engine

Choose how you’d like to include contextual grounding during evaluation:

Options

  1. Use Context From Dataset – Pulls relevant context directly from your uploaded dataset.
  2. Add Context Engine – Connects a RAG (Retrieval-Augmented Generation) engine directly to your knowledge base to supply additional context at evaluation time.

Required Fields for RAG Integration:

  • Context Engine API Key – Authenticates securely with your context engine.
  • RAG Host – URL of the server powering the RAG service.
  • Index – Name of the index to search for relevant context.
  • Namespace – Logical grouping within the index that avoids identifier conflicts.
  • Top K – Number of top-ranked results to fetch from the index per query.
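Put together, the integration details might look like the sketch below; the key names and values are assumptions for illustration and may differ from the actual Collinear AI form fields.

```python
# Hypothetical RAG context-engine configuration; real field names in the
# Collinear AI product may differ.
rag_config = {
    "context_engine_api_key": "YOUR_API_KEY",  # authenticates with the engine
    "rag_host": "https://rag.example.com",     # server powering the RAG service
    "index": "support-docs-v1",                # index to search for context
    "namespace": "production",                 # keeps identifiers from colliding
    "top_k": 5,                                # number of top results to fetch
}
```

A larger Top K retrieves more supporting passages per sample but adds retrieval latency and can dilute relevance, so small values are a common starting point.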