Datasets
Uploading Dataset
The Collinear AI Platform allows you to upload datasets in JSON or CSV format.
To assess or improve your models, your dataset needs to follow a specific structure. We support two input formats: JSON and CSV, with optional fields for ground_truth
and context
.
✅ Supported Formats
1. JSON Format
Each entry should be a dictionary with the following keys:
conv_prefix
— A list of message objects (withrole
andcontent
) representing the conversation history.response
— A dictionary with the assistant’s reply.ground_truth
(optional) — Integer rating (e.g., 1–5) for the response quality.context
(optional) — A string providing additional context.
Here is a sample JSON
2. CSV Format
Each row represents one example and should contain:
Column | Type | Description |
---|---|---|
conv_prefix | string | Raw string representing user message. |
response | string | Assistant’s reply (plain text). |
ground_truth | int (optional) | Integer rating (1–5) of how good the response is. |
context | string (optional) | Extra context string that may assist evaluation. |
Download a sample CSV file here.
Note :
-
conv_prefix should be a plain string version of the user message.
-
The CSV format is flattened — it does not store role metadata. It assumes all conv_prefix entries are from the user role.