Safety Judges help keep language models safe by filtering out harmful content. They ensure that responses are:

legal
respectful
safe for users

Steps to Create a Safety Judge

Select Safety Judge

Select Safety Model Type

You can choose between:

Collinear Guard
Collinear Guard Nano
Llama Guard
Wild Guard
Prompted Model

1. Creating a Collinear Guard Judge v1.0

This judge evaluates the safety of model outputs on a more granular scale, providing a detailed assessment of the content’s safety level. The Likert scale enables a nuanced view, from identifying highly unsafe outputs to confirming very safe responses.

Safety Rating: Likert scale rating from 1 to 5

1: Very unsafe
2: Unsafe
3: Neutral
4: Safe
5: Very safe

Once you select Collinear Guard Judge v1.0, select “Continue”.

Set Judge Name

Name it according to your preference and select “Create Judge”.

2. Creating a Collinear Guard Nano judge

The Collinear guard nano model supports three types of evaluations:

Prompt Evaluation: Binary classification

0: The prompt is deemed unsafe.
1: The prompt is considered safe.

Response Evaluation: Binary classification

0: The response is deemed unsafe.
1: The response is considered safe.

Refusal Evaluation: Binary classification

0: Indicates the model refused to generate a response.
1: Indicates the model successfully generated a response.

Once you select Collinear Guard Nano judge, select “Continue”.

Select your Evaluation Type

Reponse evaluation
Prompt evaluation
Refusal evaluation

and then click “Continue”.

Set Judge Name

Name it according to your preference and select “Create Judge”.

3. Creating a LLama Guard judge

This LLama Guard judge provides a simple and direct safety assessment, ensuring that unsafe content is flagged and only safe content passes through.

LLamaGuard Evaluation: Binary classification

0: The content is deemed unsafe.
1: The content is considered safe.

Once you select LLama Guard judge, select “Continue”.

Set Judge Name

Name it according to your preference and select “Create Judge”.

4. Creating a Wild Guard judge

The Wild Guard judge provides a straightforward safety evaluation for prompts and responses, along with refusal handling, ensuring that unsafe interactions are flagged and refusals are properly identified.

Prompt Evaluation: Binary classification

0: The prompt is deemed unsafe.
1: The prompt is considered safe.

Response Evaluation: Binary classification

0: The response is deemed unsafe.
1: The response is considered safe.

Refusal Evaluation: Binary classification

0: Indicates the model refused to generate a response.
1: Indicates the model successfully generated a response.

Once you select Wild Guard judge, select “Continue”.

Set Judge Name

Name it according to your preference and select “Create Judge”.

5. Creating a Prompted Model judge

This safety judge will evaluate model outputs based on predefined safety criteria, ensuring that unsafe responses are flagged for further review, while safe outputs are approved for deployment.

Output: Binary classification

0: Indicates the response is deemed unsafe.
1: Indicates the response is considered safe.

Once you select Prompted Model judge, select “Continue”.

Select your Prompted Model

You can select your model from the drop-down. If you haven’t added a model, select “Add New Model” to create a new one.

Edit your prompt template

You can proceed with the template or edit it and then select “Continue.”

Set Judge Name

Name it according to your preference and select “Create Judge.”

Introduction

Get Started

Assess

Agentic AI

Guard

Improve

Judge

Datasets

Add A Safety Judge

Steps to Create a Safety Judge

Select Safety Judge

Select Safety Model Type

1. Creating a Collinear Guard Judge v1.0

Set Judge Name

2. Creating a Collinear Guard Nano judge

Select your Evaluation Type

Set Judge Name

3. Creating a LLama Guard judge

Set Judge Name

4. Creating a Wild Guard judge

Set Judge Name

5. Creating a Prompted Model judge

Select your Prompted Model

Edit your prompt template

Set Judge Name

Introduction

Get Started

Assess

Agentic AI

Guard

Improve

Judge

Datasets

​Steps to Create a Safety Judge

​Select Safety Judge

​Select Safety Model Type

​1. Creating a Collinear Guard Judge v1.0

​Set Judge Name

​2. Creating a Collinear Guard Nano judge

​Select your Evaluation Type

​Set Judge Name

​3. Creating a LLama Guard judge

​Set Judge Name

​4. Creating a Wild Guard judge

​Set Judge Name

​5. Creating a Prompted Model judge

​Select your Prompted Model

​Edit your prompt template

​Set Judge Name

Steps to Create a Safety Judge

Select Safety Judge

Select Safety Model Type

1. Creating a Collinear Guard Judge v1.0

Set Judge Name

2. Creating a Collinear Guard Nano judge

Select your Evaluation Type

Set Judge Name

3. Creating a LLama Guard judge

Set Judge Name

4. Creating a Wild Guard judge

Set Judge Name

5. Creating a Prompted Model judge

Select your Prompted Model

Edit your prompt template

Set Judge Name