After an agent run, verifiers evaluate whether the task was completed successfully. Verifiers consume RunArtifacts and return a VerifierResult.

VerifierResult

class VerifierResult:
    success: bool     # Whether the task was completed successfully
    message: str      # Human-readable explanation
    output: str       # Additional detail (defaults to message if empty)
The result field on rollout responses contains the serialized verifier output:
{
  "success": true,
  "message": "All criteria met"
}
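As a rough illustration of how a client might read this payload, the sketch below parses the JSON into a small dataclass and applies the documented fallback of `output` to `message`. The names `VerifierResultSketch` and `parse_verifier_result` are hypothetical and exist only for this example; the library's actual class and serialization may differ.

```python
import json
from dataclasses import dataclass


@dataclass
class VerifierResultSketch:
    """Illustrative stand-in for VerifierResult; not the library's own class."""

    success: bool
    message: str
    output: str = ""

    def __post_init__(self) -> None:
        # Mirror the documented fallback: an empty output takes the message.
        if not self.output:
            self.output = self.message


def parse_verifier_result(raw: str) -> VerifierResultSketch:
    """Parse a serialized verifier result (hypothetical helper)."""
    data = json.loads(raw)
    return VerifierResultSketch(
        success=bool(data["success"]),
        message=str(data.get("message", "")),
        output=str(data.get("output", "")),
    )


result = parse_verifier_result('{"success": true, "message": "All criteria met"}')
print(result.success, result.output)
```

Because the payload carries no `output` key, `result.output` ends up equal to the message, which matches the default described in the class comments.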

Rubric Judge

For rubric-based evaluation, the RubricJudgeResult provides structured scoring:
| Field | Type | Description |
| --- | --- | --- |
| score | float | Score from 0.0 to 1.0 |
| verdict | string | "PASS" or "FAIL" |
| confidence | float | Confidence in the verdict (0.0–1.0) |
| evidence | list[string] | Bullet points with concrete evidence |
| failed_criteria | list[string] | Unmet rubric criteria |
| dimension_scores | list[object] | Per-dimension breakdowns: { "dimension": str, "score": float, "reason": str } |
| error | string | Error message if evaluation failed |
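
To show how the fields listed above might be consumed, here is a minimal sketch that turns a RubricJudgeResult-shaped dictionary into a short text report. The helper `summarize_rubric` is hypothetical, and the sample payload is made-up illustrative data rather than real judge output. The sketch also assumes that a non-empty `error` means the other fields should not be trusted, which the source does not state.

```python
from typing import Any


def summarize_rubric(result: dict[str, Any]) -> str:
    """Build a short report from a rubric result dict (hypothetical helper)."""
    # Assumption: a non-empty error means evaluation failed, so stop here.
    error = result.get("error")
    if error:
        return f"evaluation failed: {error}"

    lines = [
        f"{result['verdict']} score={result['score']:.2f} "
        f"confidence={result['confidence']:.2f}"
    ]
    # One line per dimension: name, score, and the judge's reason.
    for dim in result.get("dimension_scores", []):
        lines.append(f"  {dim['dimension']}: {dim['score']:.2f} ({dim['reason']})")
    # List any rubric criteria that were not met.
    for criterion in result.get("failed_criteria", []):
        lines.append(f"  unmet: {criterion}")
    return "\n".join(lines)


# Made-up payload for illustration only.
example = {
    "score": 0.5,
    "verdict": "FAIL",
    "confidence": 0.75,
    "evidence": ["Report file was created"],
    "failed_criteria": ["Summary section is missing"],
    "dimension_scores": [
        {"dimension": "completeness", "score": 0.5, "reason": "Summary absent"},
    ],
    "error": "",
}
print(summarize_rubric(example))
```

Checking `error` first keeps the report from presenting a score or verdict that may be missing or meaningless when the judge itself failed.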