To assess or improve your models, your dataset needs to follow a specific structure. We support two input formats: JSON and CSV, with field requirements varying by run type.

Supported Formats

1. JSON Format

Each entry should be a dictionary with the following keys:
  • conv_prefix — A list of message objects (with role and content) representing the conversation history. The last turn should be from the user. Required for all run types.
  • response — A dictionary (with role and content) containing the assistant’s reply to the last user message. Required for all run types.
  • ground_truth — Integer rating (e.g., 1–5) for the response quality. Required for Performance runs.
  • context — A string providing additional context. Required for Reliability runs.
Here is a sample JSON:
[
  {
    "conv_prefix": [{"role": "user", "content": "Hello"}],
    "response": {"role": "assistant", "content": "Hey there!"},
    "ground_truth": 4,
    "context": "Greeting conversation"
  },
  {
    "conv_prefix": [{"role": "user", "content": "Tell me a joke"}],
    "response": {"role": "assistant", "content": "Why did the chicken join a band? Because it had the drumsticks!"}
  }
]
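The field rules above can be checked before uploading a dataset. The following is a minimal sketch, not the platform's own validator: the function name `validate_entry` and the run-type strings `"performance"` and `"reliability"` are assumptions made for illustration.

```python
# Hypothetical pre-upload check based on the field rules described above.
# The run-type strings are illustrative assumptions, not official identifiers.
REQUIRED_ALWAYS = ("conv_prefix", "response")
EXTRA_REQUIRED = {
    "performance": ("ground_truth",),
    "reliability": ("context",),
}


def validate_entry(entry, run_type=None):
    """Return a list of problems found in one dataset entry (empty if none)."""
    problems = []

    # Keys required for every run type, plus any run-type-specific keys.
    for key in REQUIRED_ALWAYS + EXTRA_REQUIRED.get(run_type, ()):
        if key not in entry:
            problems.append(f"missing required key: {key}")

    if "conv_prefix" in entry:
        conv = entry["conv_prefix"]
        if not isinstance(conv, list) or not conv:
            problems.append("conv_prefix must be a non-empty list of messages")
        else:
            for i, msg in enumerate(conv):
                if not isinstance(msg, dict) or "role" not in msg or "content" not in msg:
                    problems.append(f"conv_prefix[{i}] needs role and content")
            last = conv[-1]
            if isinstance(last, dict) and last.get("role") != "user":
                problems.append("last conv_prefix turn should be from the user")

    if "response" in entry:
        response = entry["response"]
        if not isinstance(response, dict) or "content" not in response:
            problems.append("response must be a dictionary with a content field")

    if "ground_truth" in entry:
        value = entry["ground_truth"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            problems.append("ground_truth must be an integer")

    if "context" in entry and not isinstance(entry["context"], str):
        problems.append("context must be a string")

    return problems
```

Running each entry through a check like this with the intended run type surfaces missing fields early. For example, the second sample entry above passes with no run type given, but would be flagged for a Performance run because it has no ground_truth.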

2. CSV Format

Each row represents one example and should contain:
| Column | Type | Description |
|---|---|---|
| conv_prefix | string | Raw string representing user message. |
| response | string | Assistant’s reply (plain text). |
| ground_truth | int (required for Performance runs only) | Integer rating (1–5) of how good the response is. |
| context | string (required for Reliability runs only) | Extra context string that may assist evaluation. |
Download a sample CSV file here.

Note:
  • conv_prefix should be a plain string version of the user message.
  • The CSV format is flattened — it does not store role metadata. It assumes all conv_prefix entries are from the user role.
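To illustrate how the flattened CSV columns relate to the JSON structure, here is a rough sketch that expands CSV rows into JSON-style entries. It is not the platform's actual loader: the function name `csv_rows_to_entries` is hypothetical, the header row is assumed to use the column names listed above, and the response role is assumed to be `assistant`, as in the JSON sample.

```python
import csv
import io


def csv_rows_to_entries(csv_text):
    """Expand flattened CSV rows into JSON-style entries (illustrative only)."""
    entries = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = {
            # The CSV stores no role metadata, so conv_prefix is treated as a
            # single user message.
            "conv_prefix": [{"role": "user", "content": row["conv_prefix"]}],
            # Assumption: the plain-text reply maps to an assistant message.
            "response": {"role": "assistant", "content": row["response"]},
        }
        # Optional columns are included only when the cell is non-empty.
        if row.get("ground_truth"):
            entry["ground_truth"] = int(row["ground_truth"])
        if row.get("context"):
            entry["context"] = row["context"]
        entries.append(entry)
    return entries


sample_csv = (
    "conv_prefix,response,ground_truth,context\n"
    "Hello,Hey there!,4,Greeting conversation\n"
)
entries = csv_rows_to_entries(sample_csv)
```

Because each row carries only one user string, multi-turn conversation histories cannot be expressed in CSV; the JSON format is the one to use when earlier turns matter.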