Documentation Index

Fetch the complete documentation index at: https://docs.collinear.ai/llms.txt

Use this file to discover all available pages before exploring further.

For each task, Collinear’s Verifier Engine generates two complementary sets of verifiers that together cover the full evaluation surface:

Programmatic Verifiers

Programmatic verifiers inspect the playground state directly. They compare before/after snapshots of the playground to confirm the agent made the correct changes. Example: “Did the agent send an email to the correct recipient?” is answered by querying the email tool server’s state and reviewing the state diff. Programmatic verifiers are deterministic — given the same playground state, they always produce the same result. Use them for objective, checkable criteria.
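The snapshot-diff idea can be sketched as follows. This is an illustrative example, not Collinear's actual API: the snapshot shape, `sent_email_verifier`, and the field names are all assumptions.

```python
# Hypothetical playground snapshots: dicts holding the email tool server's state.
# Names and structure are illustrative, not Collinear's real schema.

def sent_email_verifier(before: dict, after: dict, expected_recipient: str) -> bool:
    """Deterministic check: did a new email to the expected recipient appear
    between the before and after snapshots?"""
    before_ids = {e["id"] for e in before.get("emails", [])}
    # The state diff: emails present after the run but not before it.
    new_emails = [e for e in after.get("emails", []) if e["id"] not in before_ids]
    return any(e["to"] == expected_recipient for e in new_emails)

before = {"emails": [{"id": 1, "to": "alice@example.com"}]}
after = {"emails": [{"id": 1, "to": "alice@example.com"},
                    {"id": 2, "to": "bob@example.com"}]}
print(sent_email_verifier(before, after, "bob@example.com"))  # True
```

Because the verifier only reads the two snapshots, rerunning it on the same state always yields the same answer, which is what makes it deterministic.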

Rubric-based Reward Models

Rubric-based verifiers use reward models to evaluate the agent’s actions against a scoring rubric. The judge reviews the agent’s full trace and assigns a reward score. Example: “Did the agent communicate professionally?” is evaluated by an LLM reviewing the conversation against a rubric defining professional communication. Rubric-based verifiers are useful for:
  • Subjective quality criteria (tone, clarity, helpfulness)
  • Multi-step reasoning evaluation
  • Cases where the “correct” answer depends on judgment
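A rubric-based verifier can be sketched as a rubric of named criteria plus a judge that scores the trace against each one. The sketch below stubs out the judge with a trivial keyword check; in practice that call would go to a reward model or LLM. The rubric contents, `stub_judge`, and `score_trace` are all hypothetical.

```python
# Illustrative rubric: criterion name -> description the judge scores against.
RUBRIC = {
    "professional_tone": "Agent avoids slang and responds courteously.",
    "clarity": "Agent's messages are unambiguous and well structured.",
}

def stub_judge(trace: str, criterion: str, description: str) -> float:
    """Stand-in for a reward-model call; returns a per-criterion score in [0, 1].
    A real judge would review the full trace against the description."""
    return 1.0 if "please" in trace.lower() else 0.5

def score_trace(trace: str) -> float:
    """Aggregate per-criterion scores into a single reward signal."""
    scores = [stub_judge(trace, name, desc) for name, desc in RUBRIC.items()]
    return sum(scores) / len(scores)
```

Unlike a programmatic verifier, the score here reflects judgment rather than a state diff, so swapping in a different judge or rubric changes the reward.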

How Verifiers Produce Rewards

Both verifier types produce structured results:
  • Pass/fail — Did the agent complete the task successfully?
  • Reward signal — A numeric score (typically 0.0 to 1.0) indicating the quality of completion.
  • Metadata — Verifier-specific details (which checks passed, which failed, and why).
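The three fields above can be captured in one result object. This is a minimal sketch of such a shape; `VerifierResult` and its field names are assumptions, not Collinear's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class VerifierResult:
    """Hypothetical structured result shared by both verifier types."""
    passed: bool                 # pass/fail: did the agent complete the task?
    reward: float                # reward signal, typically in [0.0, 1.0]
    metadata: dict = field(default_factory=dict)  # which checks passed/failed, and why

result = VerifierResult(
    passed=True,
    reward=0.85,
    metadata={"checks": {"recipient_correct": True, "tone_professional": True}},
)
print(result.passed, result.reward)  # True 0.85
```

Keeping all three fields together lets downstream consumers use the boolean for gating, the score for training signals, and the metadata for debugging.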