Add A Safety Judge
Use the Collinear AI Platform to create a safety judge for your space.
Safety Judges help keep language models safe by filtering out harmful content. They ensure that responses are:
- legal
- respectful
- safe for users
Steps to Create a Safety Judge
Select Safety Judge
Select Safety Model Type
You can choose between:
- Collinear Guard
- Collinear Guard Nano
- Llama Guard
- Wild Guard
- Prompted Model
1. Creating a Collinear Guard judge
This judge evaluates the safety of model outputs on a more granular scale, providing a detailed assessment of the content’s safety level. The Likert scale enables a nuanced view, from identifying highly unsafe outputs to confirming very safe responses.
Safety Rating: Likert scale rating from 1 to 5
1
: Very unsafe2
: Unsafe3
: Neutral4
: Safe5
: Very safe
Once you select Collinear Guard judge, select “Continue”.
Set Judge Name
Name it according to your preference and select “Create Judge”.
2. Creating a Collinear Guard Nano judge
The Collinear guard nano model supports three types of evaluations:
Prompt Evaluation: Binary classification
0
: The prompt is deemed unsafe.1
: The prompt is considered safe.
Response Evaluation: Binary classification
0
: The response is deemed unsafe.1
: The response is considered safe.
Refusal Evaluation: Binary classification
0
: Indicates the model refused to generate a response.1
: Indicates the model successfully generated a response.
Once you select Collinear Guard Nano judge, select “Continue”.
Select your Evaluation Type
-
Reponse evaluation
-
Prompt evaluation
-
Refusal evaluation
and then click “Continue”.
Set Judge Name
Name it according to your preference and select “Create Judge”.
3. Creating a LLama Guard judge
This LLama Guard judge provides a simple and direct safety assessment, ensuring that unsafe content is flagged and only safe content passes through.
LLamaGuard Evaluation: Binary classification
0
: The content is deemed unsafe.1
: The content is considered safe.
Once you select LLama Guard judge, select “Continue”.
Set Judge Name
Name it according to your preference and select “Create Judge”.
4. Creating a Wild Guard judge
The Wild Guard judge provides a straightforward safety evaluation for prompts and responses, along with refusal handling, ensuring that unsafe interactions are flagged and refusals are properly identified.
Prompt Evaluation: Binary classification
0
: The prompt is deemed unsafe.1
: The prompt is considered safe.
Response Evaluation: Binary classification
0
: The response is deemed unsafe.1
: The response is considered safe.
Refusal Evaluation: Binary classification
0
: Indicates the model refused to generate a response.1
: Indicates the model successfully generated a response.
Once you select Wild Guard judge, select “Continue”.
Set Judge Name
Name it according to your preference and select “Create Judge”.
5. Creating a Prompted Model judge
This safety judge will evaluate model outputs based on predefined safety criteria, ensuring that unsafe responses are flagged for further review, while safe outputs are approved for deployment.
Output: Binary classification
0
: Indicates the response is deemed unsafe.1
: Indicates the response is considered safe.
Once you select Prompted Model judge, select “Continue”.
Select your Prompted Model
You can select your model from the drop-down. If you haven’t added a model, select “Add New Model” to create a new one.
Edit your prompt template
You can proceed with the template or edit it and then select “Continue.”
Set Judge Name
Name it according to your preference and select “Create Judge.”