What is unique about the Collinear Flex Judge?

A Collinear Flex Assessment lets you create a customizable Likert judge that mimics how your own subject-matter experts assess AI responses. It’s ideal for:
  • Testing model accuracy across custom metrics (see the example rubric below)
  • Generating insights quickly without heavy setup
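To make the idea concrete, here is a minimal sketch of what a 1-5 Likert rubric for one custom metric might look like. This is purely illustrative: the metric name and scale descriptions are hypothetical examples, not the platform's actual schema.

```python
# A hypothetical example of the kind of Likert scoring criteria a Flex
# Judge can be aligned to -- illustrative only, not the product's schema.
factual_accuracy_rubric = {
    "metric": "factual_accuracy",
    "scale": {
        1: "Response contains major factual errors or fabrications.",
        2: "Response mixes correct and incorrect claims.",
        3: "Response is mostly correct but omits key details.",
        4: "Response is correct with only minor imprecision.",
        5: "Response is fully accurate and complete.",
    },
}
```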

Interactive Walkthrough

Explore the embedded demo below to see a Collinear Flex run in action. This interactive guide will walk you through setting up a run, choosing your data, and customizing your Judge.

Key Features

  • No-code setup for quick dataset evaluation
  • Rich visual analytics for in-depth insights

How to Leverage a Collinear Flex Run with Annotations

Follow these steps to create a new Collinear Flex run using annotated ground truth from an existing run:
  1. Open an Existing Run: Start by opening any previously created Assess run.
  2. Annotate the Data: Use the Revise Score feature to provide annotated ground truth on at least 5 rows. The more rows you annotate, the better the Judge will align with your criteria.
  3. Create a New Dataset for the New Judge: Export the rows with ground truth and combine them with the data you want evaluated (see the sketch after this list).
  4. Start a New Assessment: Start a new performance assessment, upload the revised dataset with ground truth, and select Collinear Flex as the Judge.
  5. Customize the Scoring Criteria: A new set of scoring criteria will be generated automatically. You can tweak or regenerate these as needed.
  6. Finalize the Judge: Click Create Assessment again to finalize and create the run.
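Step 3 amounts to merging two files into one upload. Here is a minimal sketch of that merge, assuming a pandas workflow and hypothetical column names (prompt, response, ground_truth_score); match these to your actual export before uploading.

```python
import pandas as pd

# Rows exported from the existing Assess run, including the revised
# (ground-truth) scores added via Revise Score. Column names here are
# assumptions -- adjust them to match your actual export.
annotated = pd.read_csv("annotated_export.csv")   # prompt, response, ground_truth_score

# New rows you want the Flex Judge to evaluate (no ground truth yet).
to_evaluate = pd.read_csv("new_data.csv")         # prompt, response

# Stack both into a single dataset; rows without a ground-truth score
# are simply left blank in that column.
combined = pd.concat([annotated, to_evaluate], ignore_index=True)

# At least 5 annotated rows are needed for the Judge to calibrate.
assert combined["ground_truth_score"].notna().sum() >= 5, "Need >= 5 annotated rows"

combined.to_csv("flex_assessment_dataset.csv", index=False)
```

The resulting flex_assessment_dataset.csv is the file you would upload in step 4.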

Additional Resources

Here is a sample dataset that you can use to test the Flex Evaluation feature: Sample Dataset.