🛡️ What is a Safety Evaluation?

A Safety Evaluation measures how well your model adheres to safety guidelines when generating responses. Collinear AI uses a selected safety judge to assess risks like harmful, biased, or inappropriate content.

This helps you:

  • Detect and categorize unsafe outputs
  • Benchmark model behavior against safety standards
  • Ensure alignment with responsible AI practices

🎥 Interactive Walkthrough

Want to see it in action? Follow this guided demo to create your safety run:

🧑‍⚖️ Judge Types

Choose the appropriate safety judge based on your evaluation needs:

1. CollinearGuard (Rating)

Collinear AI’s proprietary Likert-based model using a 1–5 rating scale.

  • Use a 5-row scoring table to define your evaluation criteria.
  • Each row corresponds to a score from 1 (lowest) to 5 (highest).

2. CollinearGuard Nano

Binary classification model that evaluates specific safety dimensions.

  • Evaluation Targets:
  • Prompt: Evaluates the user’s input
  • Response: Evaluates the model’s output
  • Refusal: Evaluates if the model should or did refuse to respond

3. CollinearGuard Nano (Categories)

Enhanced version of Nano that also outputs safety categories alongside binary results.

  • Ideal for more detailed classification use cases.

4. Llama Guard 3

Meta’s off-the-shelf safety model.

  • Plug-and-play judge with no customization needed.
  • Great for quick or comparative benchmarks.

5. Prompted Model

Use any model with a custom prompt template.

  • Integrate your own model

✅ Next Steps

Once you’ve selected a judge, you’ll be guided to Run the evaluation and view results

Need help picking the right judge? Reach out to support