What is a Reliability Assessment?
A Reliability Assessment measures how consistently and truthfully your model responds across a dataset. Collinear AI runs each sample through a selected reliability judge, which detects hallucinations or factual inconsistencies. This helps you:
- Quantify your model’s factual accuracy
- Identify hallucination-prone outputs
- Compare performance across different models or prompts
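To make this concrete, here is a minimal sketch of a single evaluation sample and the kind of verdict a reliability judge produces. The field names (`prompt`, `context`, `response`) and the verdict shape are illustrative assumptions, not Collinear AI’s actual schema.

```python
# Illustrative only: the field names below are assumptions,
# not Collinear AI's actual schema.
sample = {
    "prompt": "What year was the Eiffel Tower completed?",
    "context": "The Eiffel Tower was completed in 1889 for the World's Fair.",
    "response": "The Eiffel Tower was completed in 1887.",
}

# A reliability judge compares the response against the grounding
# context. Here the response contradicts the context, so the judge
# would flag it as a hallucination.
expected_verdict = {"hallucination": True, "reason": "Date contradicts context"}
```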
Interactive Walkthrough
Want to see it in action? Follow this guided demo to create your reliability run.
Introduction
Once you connect your knowledge base or upload your dataset with context, you can run a reliability evaluation on it using Collinear AI’s suite of reliability judges.
Getting Started
After connecting your knowledge base or uploading your dataset, you can initiate a reliability assessment using one of Collinear AI’s reliability judges.
Select a Judge
Choose from the following reliability models:
- Veritas – Collinear’s advanced large model for in-depth hallucination detection.
- Veritas Nano – Collinear’s ultra-fast binary classifier for hallucination detection.
- Lynx 8B – Patronus AI’s off-the-shelf model for hallucination detection.
- LLM-as-a-Judge – Use any custom model with a tailored prompt for flexible evaluation.
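For the LLM-as-a-Judge option, the tailored prompt you supply defines what the judge checks. The template below is a minimal sketch of what such a prompt might look like; the wording and placeholder names are illustrative assumptions, not a built-in Collinear AI template.

```python
# A sketch of a custom LLM-as-a-Judge prompt. The placeholders
# ({context}, {response}) are illustrative assumptions, not
# Collinear AI's actual template variables.
JUDGE_PROMPT = """You are a strict fact-checking judge.

Context:
{context}

Model response:
{response}

Does the response contain any claim that is not supported by the
context? Answer with exactly one word: PASS or FAIL."""

def build_judge_prompt(context: str, response: str) -> str:
    """Fill the template for a single evaluation sample."""
    return JUDGE_PROMPT.format(context=context, response=response)
```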
Select a Context Engine
Choose how you’d like to include contextual grounding during evaluation.
Options
- Use Context From Dataset – Pulls relevant context directly from your uploaded dataset.
- Add Context Engine – Use a RAG (Retrieval-Augmented Generation) engine connected directly to your knowledge base to provide additional context.
Required Fields for RAG Integration:
- Context Engine API Key – Authenticates securely with your context engine.
- RAG Host – URL for the server powering the RAG service.
- Index – The index within your knowledge base that is searched for relevant context.
- Namespace – Logical grouping within the index, used to avoid identifier conflicts.
- Top K – The number of top-ranked results to fetch from the index for each query.
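As a concrete illustration, the snippet below shows how these fields might fit together. Every name and value is a hypothetical placeholder, assuming a vector-store-style index; supply the actual credentials and names from your own context engine.

```python
# Hypothetical RAG integration settings -- all values are placeholders,
# not real endpoints or credentials.
rag_config = {
    "context_engine_api_key": "YOUR_API_KEY",  # authenticates with the engine
    "rag_host": "https://rag.example.com",     # server powering the RAG service
    "index": "support-docs",                   # index searched for context
    "namespace": "production",                 # logical grouping within the index
    "top_k": 5,                                # number of top results to fetch
}
```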