Base URL: https://rl-gym-api.collinear.ai

Launch Runs

Start a batch of rollouts (agent evaluation runs) for one or more tasks.
POST /runs
Request body:
{
  "scenario_id": "hr",
  "task_ids": ["hr__100_weaver_schedule_phone_screen"],
  "rollout_count": 3,
  "max_parallel": 2,
  "model_config": {
    "model": "gpt-4o",
    "provider": "openai",
    "api_key": "sk-...",
    "temperature": 0.7,
    "max_steps": 30
  }
}
| Field | Type | Required | Description |
|---|---|---|---|
| scenario_id | string | Yes | Template identifier (e.g. hr) |
| task_ids | list[string] | Yes | Task IDs to execute (min 1) |
| model_config | ModelConfig | Yes | LLM configuration for the agent (see below) |
| rollout_count | int | No | Rollouts per task (default: 1) |
| max_parallel | int | No | Hint for desired parallel rollouts |
| run_name | string | No | Custom run name/ID (must be unique) |
| user_model_config | ModelConfig | No | Optional LLM config for simulated user persona |
| task_overrides | dict[string, TaskOverride] | No | Per-task overrides (keys must be in task_ids) |

ModelConfig

| Field | Type | Default | Description |
|---|---|---|---|
| model | string | | LLM model identifier (e.g. gpt-4o, claude-sonnet-4-6) |
| provider | string | "custom" | Provider: custom, openai, anthropic, aws-bedrock, azure-openai, gcp |
| api_key | string | | API credential for the provider (required) |
| base_url | string | null | Custom API base URL (for proxies or self-hosted) |
| temperature | float | 0.7 | Sampling temperature (0.0–2.0) |
| max_steps | int | 30 | Maximum tool-calling turns per rollout (min: 1) |
| system_prompt | string | null | Override system prompt injected before rollout |
| tool_server_url | string | null | Override URL for the Tool Server |
| log_prob | bool | false | Capture token log-probabilities during rollout |
| summarize_every_n_turns | int | 0 | Summarize tool-call history every N turns (0 = disabled) |
| context_stride | int | 3 | Recent exchanges kept verbatim when summarizing (min: 1) |
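
The defaults and ranges in the ModelConfig table can be mirrored on the client so that invalid values fail before a request is sent. The sketch below is a minimal, unofficial example: the `ModelConfig` dataclass and its `to_payload` method are hypothetical helpers, not part of any published SDK, and dropping `None` fields is an assumption (the reference does not say whether explicit nulls are accepted).

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Provider names as listed in the ModelConfig table.
PROVIDERS = ("custom", "openai", "anthropic", "aws-bedrock", "azure-openai", "gcp")


@dataclass
class ModelConfig:
    """Hypothetical client-side mirror of the ModelConfig fields."""

    model: str
    api_key: str
    provider: str = "custom"
    base_url: Optional[str] = None
    temperature: float = 0.7
    max_steps: int = 30
    system_prompt: Optional[str] = None
    tool_server_url: Optional[str] = None
    log_prob: bool = False
    summarize_every_n_turns: int = 0
    context_stride: int = 3

    def __post_init__(self) -> None:
        # Each check corresponds to a constraint stated in the table.
        if not self.api_key:
            raise ValueError("api_key is required")
        if self.provider not in PROVIDERS:
            raise ValueError(f"provider must be one of {PROVIDERS}")
        if not 0.0 <= self.temperature <= 2.0:
            raise ValueError("temperature must be within 0.0-2.0")
        if self.max_steps < 1:
            raise ValueError("max_steps must be at least 1")
        if self.context_stride < 1:
            raise ValueError("context_stride must be at least 1")

    def to_payload(self) -> dict:
        # Assumption: omit unset (None) fields and let the server default them.
        return {k: v for k, v in asdict(self).items() if v is not None}
```

The resulting dictionary can be used as the `model_config` (or `user_model_config`) value of a launch request.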

TaskOverride

| Field | Type | Description |
|---|---|---|
| task_json | object | Full task JSON override (replaces entire on-disk task) |
| name | string | Display name override |
| description | string | Task prompt override |
| rubric_markdown | string | Rubric markdown override for the universal verifier |
| client_context | dict | Opaque caller metadata persisted in artifacts |
Response: 200 OK
{
  "run_id": "run_abc123",
  "run_status": {
    "run_id": "run_abc123",
    "scenario_id": "hr",
    "source": "rl-gym",
    "total": 3,
    "completed": 0,
    "failed": 0,
    "in_progress": 0,
    "pending": 3,
    "running": 0,
    "rollout_ids": ["rol_001", "rol_002", "rol_003"]
  },
  "rollouts": [
    {
      "rollout_id": "rol_001",
      "scenario_id": "hr",
      "task_id": "hr__100_weaver_schedule_phone_screen",
      "run_id": "run_abc123",
      "status": "pending"
    }
  ]
}
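
As a rough illustration of the launch call, the sketch below builds a `POST /runs` request with Python's standard library and checks the two constraints stated in the request table (at least one task ID, and override keys drawn from `task_ids`). The function names `build_launch_request` and `launch_run` are hypothetical. Authentication is not described in this section, so no auth header is set, and sending the body as `application/json` is an assumption based on the JSON example above.

```python
import json
import urllib.request

BASE_URL = "https://rl-gym-api.collinear.ai"


def build_launch_request(scenario_id, task_ids, model_config,
                         rollout_count=1, max_parallel=None,
                         task_overrides=None):
    """Build (but do not send) a POST /runs request."""
    if not task_ids:
        raise ValueError("task_ids needs at least one entry")
    body = {
        "scenario_id": scenario_id,
        "task_ids": list(task_ids),
        "model_config": model_config,
        "rollout_count": rollout_count,
    }
    if max_parallel is not None:
        body["max_parallel"] = max_parallel
    if task_overrides:
        # The reference requires every override key to appear in task_ids.
        unknown = set(task_overrides) - set(task_ids)
        if unknown:
            raise ValueError(f"overrides for unknown tasks: {sorted(unknown)}")
        body["task_overrides"] = task_overrides
    return urllib.request.Request(
        f"{BASE_URL}/runs",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def launch_run(request):
    """Send the request; return (run_id, rollout_ids) from the response."""
    with urllib.request.urlopen(request) as resp:
        payload = json.load(resp)
    return payload["run_id"], payload["run_status"]["rollout_ids"]
```

Keeping request construction separate from sending makes the payload easy to inspect or log (with the API key redacted) before any rollouts are started.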

List Runs

List runs with optional filtering and pagination.
GET /runs
Query parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| scenario_id | string | | Filter by template |
| source | string | "rl-gym" | Filter by source: rl-gym, task-editor, datadog-synthetics, all |
| limit | int | 50 | Results per page (1–200) |
| offset | int | 0 | Pagination offset |
| expand | list[string] | | Pass rollouts to populate rollout_ids |
Response: 200 OK
{
  "total": 1,
  "runs": [
    {
      "run_id": "run_abc123",
      "scenario_id": "hr",
      "source": "rl-gym",
      "total": 3,
      "completed": 2,
      "failed": 0,
      "in_progress": 1,
      "pending": 0,
      "running": 1,
      "rollout_ids": []
    }
  ]
}
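
Paging through runs with `limit` and `offset` might look like the sketch below. It rests on two assumptions that the reference does not confirm: that `total` in the response counts all matching runs rather than the current page, and that the `expand` list is encoded as a plain `expand=rollouts` query parameter. The helper names (`list_runs_url`, `iter_runs`, `http_get_json`) are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://rl-gym-api.collinear.ai"


def list_runs_url(scenario_id=None, source=None, limit=50, offset=0,
                  expand_rollouts=False):
    """Build the GET /runs URL for one page of results."""
    if not 1 <= limit <= 200:
        raise ValueError("limit must be between 1 and 200")
    params = {"limit": limit, "offset": offset}
    if scenario_id is not None:
        params["scenario_id"] = scenario_id
    if source is not None:
        params["source"] = source
    if expand_rollouts:
        params["expand"] = "rollouts"  # assumed encoding of list[string]
    return f"{BASE_URL}/runs?{urllib.parse.urlencode(params)}"


def iter_runs(fetch, page_size=50, **filters):
    """Yield every run, advancing offset until `total` is reached."""
    offset = 0
    while True:
        page = fetch(list_runs_url(limit=page_size, offset=offset, **filters))
        runs = page["runs"]
        yield from runs
        offset += len(runs)
        # Stop on an empty page too, in case `total` is not a global count.
        if not runs or offset >= page["total"]:
            break


def http_get_json(url):
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

A call such as `list(iter_runs(http_get_json, scenario_id="hr"))` would then collect all runs for the `hr` template; passing the fetch function in makes the paging logic testable without network access.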

Get Run Status

Get aggregate status for a run, including per-rollout breakdowns.
GET /runs/{run_id}
Response: 200 OK. Returns a RunStatusResponse (same schema as the items in the List Runs response).
Errors: 404 if run not found.
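
A common use of this endpoint is polling until a run finishes. The sketch below is one possible approach, not a documented pattern: it assumes a run is finished once `pending`, `running`, and `in_progress` are all zero, which the reference does not state explicitly. The names `is_finished`, `wait_for_run`, and `fetch_status` are hypothetical.

```python
import json
import time
import urllib.request

BASE_URL = "https://rl-gym-api.collinear.ai"


def is_finished(status):
    """Assumption: no pending, running, or in-progress rollouts means done."""
    return (status["pending"] == 0
            and status["running"] == 0
            and status["in_progress"] == 0)


def wait_for_run(run_id, fetch, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll the run status until it is finished or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(run_id)
        if is_finished(status):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} did not finish in {timeout}s")
        sleep(interval)


def fetch_status(run_id):
    """GET /runs/{run_id}; urlopen raises urllib.error.HTTPError on a 404."""
    with urllib.request.urlopen(f"{BASE_URL}/runs/{run_id}") as resp:
        return json.load(resp)
```

For example, `wait_for_run("run_abc123", fetch_status)` would block until the run reports no outstanding rollouts, then return the final status object with its `completed` and `failed` counts.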

Cancel Run

Signal all active rollouts in a run to cancel gracefully.
POST /runs/{run_id}/cancel
Response: 200 OK
{
  "run_id": "run_abc123",
  "cancelled_rollouts": 2
}

Delete Run

Delete a run and all associated rollout records.
DELETE /runs/{run_id}
Response: 200 OK
{
  "run_id": "run_abc123",
  "deleted_rollouts": 3
}
Errors: 404 if run not found.
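
The cancel and delete calls differ only in method and path, so a client can treat them uniformly. The following sketch uses hypothetical helper names (`cancel_request`, `delete_request`, `send`) and, as an illustrative design choice, maps a 404 response to `None` instead of raising. No authentication header is shown because this section does not describe one.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

BASE_URL = "https://rl-gym-api.collinear.ai"


def _run_url(run_id, suffix=""):
    # Percent-encode the ID in case a custom run_name contains special characters.
    return f"{BASE_URL}/runs/{urllib.parse.quote(run_id, safe='')}{suffix}"


def cancel_request(run_id):
    """Build POST /runs/{run_id}/cancel."""
    return urllib.request.Request(_run_url(run_id, "/cancel"), method="POST")


def delete_request(run_id):
    """Build DELETE /runs/{run_id}."""
    return urllib.request.Request(_run_url(run_id), method="DELETE")


def send(request):
    """Return the parsed JSON body, or None if the server answers 404."""
    try:
        with urllib.request.urlopen(request) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

With these helpers, `send(cancel_request("run_abc123"))` would return the `cancelled_rollouts` payload shown above, and `send(delete_request("run_abc123"))` the `deleted_rollouts` payload.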