Creating a Reliability Evaluation
Use the Collinear AI Platform to create a new reliability evaluation.
✅ What is a Reliability Evaluation?
A Reliability Evaluation measures how consistently and truthfully your model responds across a dataset. Collinear AI runs each sample through a selected reliability judge, which detects hallucinations or factual inconsistencies.
This helps you:
- Quantify your model’s factual accuracy
- Identify hallucination-prone outputs
- Compare performance across different models or prompts
🎥 Interactive Walkthrough
Want to see it in action? Follow the guided demo to create your reliability run.
🚀 Getting Started
After connecting your model or uploading your dataset, you can initiate a reliability evaluation using one of Collinear AI’s reliability judges.
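If your uploaded dataset already contains model responses, it can help to picture the shape of a single sample before you start a run. The field names below (prompt, context, response) are illustrative assumptions, not the platform's required upload schema:

```python
import json

# Hypothetical shape of one evaluation sample. The field names are assumptions
# for illustration, not the platform's required schema.
sample = {
    "prompt": "What year was the Eiffel Tower completed?",
    "context": "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
    "response": "The Eiffel Tower was completed in 1889.",
}

# Datasets of this kind are commonly stored as JSONL: one JSON object per line.
print(json.dumps(sample))
```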
🧑‍⚖️ Select a Judge
Choose from the following reliability models:
- Lynx 8B – Patronus AI’s off-the-shelf model for hallucination detection.
- Veritas Nano – Collinear’s ultra-fast binary classifier for hallucination detection.
- Veritas – Collinear’s advanced large model for in-depth hallucination detection.
- Prompted Model – Use any custom model with a tailored prompt for flexible evaluation.
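The judges differ mainly in how much detail they return: a fast binary classifier such as Veritas Nano flags each response as grounded or hallucinated, while a larger judge can attach a score or rationale. The verdict shapes below are a hedged illustration of that difference, not Collinear AI's actual response format.

```python
# Illustrative verdict shapes only; these are not Collinear AI's actual output schema.
binary_verdict = {
    "sample_id": "sample-001",
    "hallucination": False,  # binary judge: grounded vs. hallucinated
}

graded_verdict = {
    "sample_id": "sample-001",
    "hallucination": False,
    "score": 0.93,  # larger judge: confidence that the response is grounded
    "rationale": "The completion year matches the provided context.",
}

print(binary_verdict)
print(graded_verdict)
```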
🧠 Select a Context Engine
Choose how you’d like to include contextual grounding during evaluation:
Options
- Use Context From Dataset – Pulls relevant context directly from your uploaded dataset.
- Add Context Engine – Uses a RAG (Retrieval-Augmented Generation) engine to provide additional context.
Required Fields for RAG Integration:
- Context Engine API Key – Authenticates securely with your context engine.
- RAG Host – URL of the server powering the RAG service.
- Index – The index to query for relevant context.
- Namespace – Logical grouping within the index that avoids identifier conflicts.
- Top K – The number of top-ranked results to retrieve from the index.
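Taken together, these fields map onto a configuration roughly like the sketch below. The key names and sample values are assumptions for illustration; enter the exact values your context engine provider gives you.

```python
import os

# Hypothetical context-engine configuration. Key names and values are
# illustrative, not the platform's actual schema.
rag_config = {
    "api_key": os.environ.get("CONTEXT_ENGINE_API_KEY", ""),  # authenticates with the context engine
    "rag_host": "https://rag.example.com",                    # server powering the RAG service
    "index": "product-docs",                                  # index to query for relevant context
    "namespace": "support-articles",                          # logical grouping to avoid identifier conflicts
    "top_k": 5,                                                # number of top-ranked results to retrieve
}

print(rag_config)
```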