Install the CLI and run your first evaluation against a simulated environment.

Prerequisites

  • Python 3.13
  • A Collinear API key from platform.collinear.ai (Developers → API Keys)
  • An API key for any LiteLLM-supported model provider (OpenAI, Anthropic, Google, etc.)
  • One of the following for running environments:
    • A Daytona API key — for fast, ephemeral remote sandboxes (recommended)
    • Docker Desktop (or Docker Engine with Compose) — for local execution

Installation

uv tool install --python 3.13 "simulationlab[daytona]"
The PyPI package is named simulationlab. The installed CLI command is simlab.
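After installation, it is worth confirming that `uv` actually linked the `simlab` entry point onto your PATH. A minimal, side-effect-free check (plain standard library; `simlab` here is just the expected command name, and this helper is not part of the SimLab tooling):

```python
import shutil

def cli_available(name: str = "simlab") -> bool:
    """Return True if `name` resolves to an executable on PATH."""
    return shutil.which(name) is not None

if __name__ == "__main__":
    print("simlab on PATH:", cli_available())
```

If this prints `False`, make sure `uv`'s tool bin directory (typically `~/.local/bin`) is on your PATH.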

Authentication

Log in with your Collinear API key:
simlab auth login
This saves your key to ~/.config/simlab/config.toml. Then export your model provider key:
# Use whichever provider you prefer — SimLab uses LiteLLM under the hood.
export SIMLAB_AGENT_API_KEY="your-api-key"

# Optional: export Daytona key if using remote sandboxes
export DAYTONA_API_KEY="dtn_..."

Supported providers

SimLab supports any LiteLLM-compatible provider. Here are common examples:
Provider    Model format                         SIMLAB_AGENT_API_KEY     Verifier provider value
OpenAI      gpt-4o                               Your OpenAI API key      openai
Anthropic   anthropic/claude-sonnet-4-20250514   Your Anthropic API key   anthropic
Google      gemini/gemini-2.5-pro                Your Google AI API key   gemini
The model format follows LiteLLM conventions: <provider>/<model_name>. OpenAI models don’t require the provider prefix since it’s the default. Full example using Anthropic:
export SIMLAB_AGENT_API_KEY="sk-ant-..."

simlab tasks run --env my-env \
  --task hr__0_weaver_flag_biased_compensation_adjustment_request \
  --agent-model anthropic/claude-sonnet-4-20250514 \
  --agent-api-key "$SIMLAB_AGENT_API_KEY"
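The `<provider>/<model_name>` convention above can be sketched as a tiny parser. The openai default for bare names follows the LiteLLM behavior just described; the helper itself is illustrative, not part of the SimLab or LiteLLM API:

```python
def split_model(model: str, default_provider: str = "openai") -> tuple[str, str]:
    """Split a LiteLLM-style model string into (provider, model_name)."""
    provider, sep, name = model.partition("/")
    if not sep:
        # No prefix: bare model names default to OpenAI.
        return default_provider, model
    return provider, name

print(split_model("gpt-4o"))                              # ('openai', 'gpt-4o')
print(split_model("anthropic/claude-sonnet-4-20250514"))  # ('anthropic', 'claude-sonnet-4-20250514')
print(split_model("gemini/gemini-2.5-pro"))               # ('gemini', 'gemini-2.5-pro')
```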

Starting an environment

Initialize an environment from a template and start it:
# Initialize an HR-based scenario environment
simlab env init my-env --template hr
To see all available templates: simlab templates list

Choosing a task

Tasks are organized by the scenario template associated with your environment.
# List tasks for your environment's template
simlab tasks list --env my-env
If you generated tasks locally (via tasks-gen), browse them directly:
simlab tasks list --tasks-dir ./generated-tasks
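Judging from the task IDs shown in this guide (e.g. `hr__0_weaver_flag_biased_compensation_adjustment_request`), the template name appears to be encoded before the `__` separator. A small sketch based on that inferred layout, which is not a documented contract:

```python
def task_template(task_id: str) -> str:
    """Extract the template prefix from a task ID.

    The '<template>__<slug>' layout is inferred from the IDs shown in
    this guide, not a documented contract.
    """
    template, _, _ = task_id.partition("__")
    return template

print(task_template("hr__0_weaver_flag_biased_compensation_adjustment_request"))  # hr
```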

Running a rollout

The primary command is simlab tasks run. It automatically starts the environment, seeds data, runs the agent, verifies the result, and tears down when done. With Daytona (recommended — fast, ephemeral remote sandboxes):
simlab tasks run --env my-env \
  --task hr__0_weaver_flag_biased_compensation_adjustment_request \
  --daytona \
  --agent-model <model> \
  --agent-api-key "$SIMLAB_AGENT_API_KEY"
Without Daytona (runs locally via Docker — first run may be slow while images pull):
simlab tasks run --env my-env \
  --task hr__0_weaver_flag_biased_compensation_adjustment_request \
  --agent-model <model> \
  --agent-api-key "$SIMLAB_AGENT_API_KEY"
Use any LiteLLM-supported model for --agent-model (e.g. gpt-4o, anthropic/claude-sonnet-4-20250514, gemini/gemini-2.5-pro). You can also run tasks with your own agent implementation instead of the built-in one. See Bring Your Own Agent for the full interface and setup.
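If you script rollouts (for example, sweeping several tasks), the same invocation can be assembled programmatically. A minimal sketch using only the flags shown above; the actual `subprocess.run` call is left commented so the snippet stays side-effect-free:

```python
import os

def build_run_cmd(env: str, task: str, model: str, daytona: bool = True) -> list[str]:
    """Assemble the `simlab tasks run` argv used earlier in this guide."""
    cmd = [
        "simlab", "tasks", "run",
        "--env", env,
        "--task", task,
        "--agent-model", model,
        "--agent-api-key", os.environ.get("SIMLAB_AGENT_API_KEY", ""),
    ]
    if daytona:
        cmd.append("--daytona")
    return cmd

cmd = build_run_cmd(
    "my-env",
    "hr__0_weaver_flag_biased_compensation_adjustment_request",
    "anthropic/claude-sonnet-4-20250514",
)
print(" ".join(cmd))
# import subprocess; subprocess.run(cmd, check=True)  # uncomment to actually run
```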

Viewing results

Results are saved to output/agent_run_<task_id>_<timestamp>/:
  • artifacts.json — full rollout trace (messages, tool calls, observations)
  • verifier/reward.txt — 1 (pass) or 0 (fail)
  • verifier/reward.json — e.g. {"reward": 1.0}
For more detail, see Understanding Results.
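A short sketch of consuming those files programmatically. It fabricates a run directory matching the layout above so it is runnable anywhere; the file contents are illustrative:

```python
import json
import tempfile
from pathlib import Path

# Build a fake run directory matching output/agent_run_<task_id>_<timestamp>/
root = Path(tempfile.mkdtemp()) / "agent_run_hr__0_demo_20250101T000000"
(root / "verifier").mkdir(parents=True)
(root / "artifacts.json").write_text(json.dumps({"messages": []}))
(root / "verifier" / "reward.txt").write_text("1")
(root / "verifier" / "reward.json").write_text(json.dumps({"reward": 1.0}))

# Read the verifier outputs back, as you would for a real run.
reward = json.loads((root / "verifier" / "reward.json").read_text())["reward"]
passed = (root / "verifier" / "reward.txt").read_text().strip() == "1"
print(f"reward={reward} passed={passed}")  # reward=1.0 passed=True
```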

Configuring verifiers

Generated tasks use rubric-based verifiers that need a model to score results. Configure the verifier before running generated tasks:
export SIMLAB_VERIFIER_MODEL="<provider>/<model>"    # e.g. gpt-4o, anthropic/claude-sonnet-4-20250514
export SIMLAB_VERIFIER_PROVIDER="<provider>"          # e.g. openai, anthropic, gemini
export SIMLAB_VERIFIER_API_KEY="your-api-key"
Or in config.toml:
[verifier]
model = "<provider>/<model>"
provider = "<provider>"
api_key = "your-api-key"
Built-in tasks use programmatic verifiers and don’t require this setup. This is only needed for tasks you generate via tasks-gen.