🎥 Interactive Walkthrough

Explore the Safety Dashboard in action with the interactive demo.


📊 Key Metrics Overview

Track the most important indicators of model behavior in real time:

  1. Total Volume: Total number of queries processed by the real-time judges.

  2. Prompt Safety Violation Rate: Percentage of user prompts judged unsafe or harmful.

  3. Response Safety Violation Rate: Percentage of model responses that violate safety guidelines.

  4. False Refusal Rate: Percentage of safe user prompts that the AI incorrectly refuses to answer.
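
As a rough illustration of how these four metrics relate to the underlying judged queries, here is a minimal Python sketch. The `JudgedQuery` record and its field names are hypothetical and are not part of the dashboard's API. The choice of denominators is also an assumption: the two violation rates are taken over all queries, and the false refusal rate is taken over safe prompts only, following the definitions above.

```python
from dataclasses import dataclass


@dataclass
class JudgedQuery:
    """One judged query. All field names here are hypothetical."""
    prompt_unsafe: bool    # the judge flagged the user prompt
    response_unsafe: bool  # the judge flagged the model response
    refused: bool          # the model declined to answer


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def key_metrics(queries: list[JudgedQuery]) -> dict[str, float]:
    """Compute the four headline metrics from a batch of judged queries."""
    total = len(queries)
    safe_prompts = [q for q in queries if not q.prompt_unsafe]
    return {
        "total_volume": total,
        # Assumed denominator: all processed queries.
        "prompt_violation_rate": percent(
            sum(q.prompt_unsafe for q in queries), total
        ),
        "response_violation_rate": percent(
            sum(q.response_unsafe for q in queries), total
        ),
        # Assumed denominator: safe prompts only, since a refusal of an
        # unsafe prompt is not a false refusal.
        "false_refusal_rate": percent(
            sum(q.refused for q in safe_prompts), len(safe_prompts)
        ),
    }
```

Calling `key_metrics` on an empty batch returns zero for every rate instead of raising a division error, which is convenient when a dashboard time window contains no traffic.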


📌 Critical Summaries

Gain high-level visibility into the system’s behavior:

  • Total Queries vs. Flagged Queries: A time-series graph comparing overall query volume with the number of queries flagged by the system.

  • Risk Levels: Breakdown of queries by assessed risk level.

  • Flagged Categories: Breakdown of flagged queries by category, to identify common safety issues.
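
The three summaries above are all simple aggregations over judged queries. The sketch below shows one way they could be computed. It assumes hourly buckets for the time series, and the record layout, risk labels, and category names are hypothetical examples, not values defined by the dashboard.

```python
from collections import Counter
from datetime import datetime

# Hypothetical records: (timestamp, flagged, risk_level, categories).
records = [
    (datetime(2024, 1, 1, 9, 5), False, "low", []),
    (datetime(2024, 1, 1, 9, 40), True, "high", ["self_harm"]),
    (datetime(2024, 1, 1, 10, 15), True, "medium", ["violence", "hate"]),
]


def summarize(records):
    """Aggregate records into the three summary views."""
    total_per_hour = Counter()    # series 1 of the time-series graph
    flagged_per_hour = Counter()  # series 2 of the time-series graph
    risk_levels = Counter()       # risk level breakdown
    categories = Counter()        # flagged category breakdown
    for timestamp, flagged, risk, cats in records:
        # Truncate the timestamp to the start of its hour (assumed bucket size).
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        total_per_hour[hour] += 1
        risk_levels[risk] += 1
        if flagged:
            flagged_per_hour[hour] += 1
            categories.update(cats)
    return total_per_hour, flagged_per_hour, risk_levels, categories


totals, flagged, risks, cats = summarize(records)
```

A query tagged with several categories counts once per category here, so the category totals can exceed the number of flagged queries.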


πŸ” Live Data

Dive into raw safety data with the live table, including:

  • ID: Unique identifier for each query.
  • Conversation Prefix: The initial part of the user input.
  • Model Response: The AI’s reply to the prompt.
  • Judge Score: Evaluation score assigned by the judge model.
  • Categories: Tags describing the nature of the issue, if any.
  • Actions: What actions were taken in response (e.g., flagged, passed, needs review).
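
For readers who process this table programmatically, a row could be modeled roughly as follows. This is a sketch only: the `LiveRow` class, its field names, the judge score scale, and the representation of an action as a single string are all assumptions, since the table above describes the columns but not their types.

```python
from dataclasses import dataclass, field


@dataclass
class LiveRow:
    """One row of the live table. Names and types are hypothetical."""
    id: str
    conversation_prefix: str
    model_response: str
    judge_score: float  # the scale is an assumption; the source does not define it
    categories: list[str] = field(default_factory=list)
    action: str = "passed"  # e.g. "flagged", "passed", "needs review"


def needs_attention(rows: list[LiveRow]) -> list[LiveRow]:
    """Return the rows whose action is anything other than "passed"."""
    return [row for row in rows if row.action != "passed"]


rows = [
    LiveRow("q-1", "How do I bake bread", "Start with flour...", 0.02),
    LiveRow("q-2", "Example risky prompt", "Example reply", 0.91,
            ["violence"], "flagged"),
    LiveRow("q-3", "Example unclear prompt", "Example reply", 0.55,
            [], "needs review"),
]
review_queue = needs_attention(rows)
```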