📊 What is Flex Evaluation?

Flex Evaluation lets you create a customizable Likert judge and run evaluations with it. It’s ideal for:

  • Testing model accuracy across custom metrics
  • Generating insights quickly without heavy setup

🚀 Interactive Walkthrough

Explore the embedded demo below to see Flex Evaluation in action. This interactive guide will walk you through setting up a run, choosing your data, and interpreting results.

🧰 Key Features

  • No-code setup for quick dataset evaluation
  • Rich visual analytics for in-depth insights

🛠️ How to Create a Flex Evaluation with Annotations

Follow these steps to create a new Flex Judge using annotations from an existing run:

  1. Open an Existing Run: Start by opening a run that was created using Collinear Flex.

  2. Annotate the Data: Use the Feedback feature to revise scores and add annotations on specific rows.

  3. Select Rows for the New Judge: Choose the rows you want to include in your new evaluation.

  4. Click “Create a Judge”: Once you’ve selected the rows, click the Create a Judge button.

  5. Customize the Scoring Criteria: A new set of scoring criteria is generated automatically. You can tweak or regenerate these as needed.

  6. Finalize the Judge: Click Create Judge again to finalize and save it. The new judge will then be available for use in future runs.
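
To make the annotation workflow above more concrete, here is a minimal Python sketch of the data it operates on. This does not use Collinear's API and does not reflect its actual schema: the `Row` fields, the 1-5 Likert range, and both helper functions are hypothetical, chosen only to illustrate how revised scores and annotations might determine which rows feed a new judge.

```python
from dataclasses import dataclass
from typing import List, Optional


# Hypothetical row shape; field names are illustrative, not Collinear's schema.
@dataclass
class Row:
    row_id: int
    judge_score: int                      # original Likert score (assumed 1-5)
    revised_score: Optional[int] = None   # score after human feedback, if any
    annotation: str = ""                  # free-text reviewer note


def select_annotated(rows: List[Row]) -> List[Row]:
    """Keep rows where a reviewer revised the score or left an annotation."""
    return [r for r in rows if r.revised_score is not None or r.annotation]


def agreement_rate(rows: List[Row]) -> float:
    """Fraction of rows where the reviewer kept the judge's original score."""
    if not rows:
        return 0.0
    kept = sum(1 for r in rows if r.revised_score in (None, r.judge_score))
    return kept / len(rows)


rows = [
    Row(1, judge_score=4),
    Row(2, judge_score=2, revised_score=4, annotation="Judge was too harsh."),
    Row(3, judge_score=5, revised_score=5, annotation="Agree."),
    Row(4, judge_score=3, revised_score=1, annotation="Hallucinated citation."),
]

selected = select_annotated(rows)
print([r.row_id for r in selected])  # [2, 3, 4]
print(agreement_rate(rows))          # 0.5
```

A low agreement rate on the selected rows is one rough signal that the existing scoring criteria diverge from reviewer expectations, which is the situation where creating a new judge from annotations is most useful.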

📘 Additional Resources

Here is a sample dataset that you can use to test the Flex Evaluation feature: Sample Dataset.