What is unique about the Collinear Flex Judge?
A Collinear Flex Assessment lets you create a customizable Likert judge that mimics your own subject-matter experts and how they assess AI responses. It's ideal for:

- Testing model accuracy across custom metrics
- Generating insights quickly without heavy setup
Interactive Walkthrough
Explore the embedded demo below to see a Collinear Flex run in action. This interactive guide will walk you through setting up a run, choosing your data, and customizing your Judge.

Key Features
- No-code setup for quick dataset evaluation
- Rich visual analytics for in-depth insights
How to Leverage a Collinear Flex Run with Annotations
Follow these steps to create a new Collinear Flex run using annotated ground truth from an existing run:

1. **Open an Existing Run.** Start by opening any Assess run you previously created.
2. **Annotate the Data.** Use the Revise Score feature to provide annotated ground truth on at least 5 rows. The more rows you annotate, the better the Judge will align with your criteria.
3. **Create a New Dataset for the New Judge.** Export the rows with ground truth and combine them with the data you want to evaluate.
4. **Start a New Assessment.** Start a new performance assessment, upload the revised dataset with ground truth, and select Collinear Flex as the Judge.
5. **Customize the Scoring Criteria.** A new set of scoring criteria is generated automatically. You can tweak or regenerate these as needed.
6. **Finalize the Judge.** Click Create Assessment again to finalize and create the run.
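Step 3 above asks you to combine exported ground-truth rows with the rows you want evaluated. As a rough sketch of that merge, here is one way to do it with pandas. Note that the file names and column names (`prompt`, `response`, `ground_truth_score`) are hypothetical placeholders; match them to the columns in your actual Assess export.

```python
import pandas as pd

# Hypothetical data standing in for rows exported from an existing run,
# where "ground_truth_score" holds the Likert scores you entered via
# Revise Score. Column names are assumptions; use your export's schema.
annotated = pd.DataFrame({
    "prompt": ["What is 2 + 2?", "What is the capital of France?"],
    "response": ["4", "Paris"],
    "ground_truth_score": [5, 5],
})

# New rows you want the Flex Judge to evaluate (no ground truth yet).
to_evaluate = pd.DataFrame({
    "prompt": ["What is the largest planet?"],
    "response": ["Jupiter"],
})

# Concatenating keeps one consistent schema: unannotated rows simply
# have an empty ground_truth_score column in the combined dataset.
combined = pd.concat([annotated, to_evaluate], ignore_index=True)
combined.to_csv("flex_dataset.csv", index=False)
```

The combined file is what you would upload when starting the new assessment in step 4.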