
What the Assessor Does

Client.assess runs a rubric-guided evaluation locally by calling an OpenAI-compatible model that you supply. It returns an AssessmentResponse with:
  • message: general status text (e.g., “ok”).
  • evaluation_result: list of maps containing score and rationale per conversation.
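For orientation, a quick sketch of reading those two fields (this assumes client and simulations are already set up, as shown in the next section):

result = client.assess(dataset=simulations)

print(result.message)                 # status text, e.g. "ok"
print(len(result.evaluation_result))  # one entry per assessed conversation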

Supplying Judge Credentials

By default, the assessor reuses the assistant credentials you passed to Client. Override any part of the judge connection as needed:
import os

OPENAI_API_KEY = os.environ["OPENAI_API_KEY"]  # key used for the judge model

result = client.assess(
    dataset=simulations,
    judge_model_url=os.getenv("JUDGE_BASE_URL", "https://api.openai.com/v1"),
    judge_model_api_key=OPENAI_API_KEY,
    judge_model_name=os.getenv("JUDGE_MODEL", "gpt-4o-mini"),
    temperature=0.0,
    max_tokens=512,
)
Ensure simulations is a non-empty list of SimulationResult objects returned by client.simulate. Alternatively, you can supply your own dataset, provided it is in the format the assess method expects.
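As a minimal sketch, you can also guard the call yourself rather than relying on the ValueError described under Handling Errors below (the simulate arguments are elided here; see the simulation docs):

simulations = client.simulate(...)  # returns a list of SimulationResult

if not simulations:
    raise ValueError("client.simulate returned nothing; assess needs a non-empty dataset")

result = client.assess(dataset=simulations)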

Interpreting Results

# Each row is a map whose values are score objects for one conversation.
for idx, row in enumerate(result.evaluation_result, start=1):
    scores = next(iter(row.values()), None)  # first (and typically only) entry
    if scores is not None:
        print(f"Conversation {idx}: score={scores.score}\n  {scores.rationale}")
Treat the rationale as model-generated text. Post-process or threshold scores to fit your use case.
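For example, a minimal sketch that flags low-scoring conversations, assuming score is numeric (the threshold is illustrative; pick a cutoff that matches your rubric's scale):

THRESHOLD = 3.0  # illustrative cutoff, not an SDK default

flagged = []
for idx, row in enumerate(result.evaluation_result, start=1):
    scores = next(iter(row.values()), None)
    if scores is not None and float(scores.score) < THRESHOLD:
        flagged.append((idx, scores.score, scores.rationale))

for idx, score, rationale in flagged:
    print(f"Review conversation {idx}: score={score}\n  {rationale}")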

Handling Errors

  • Missing credentials raise SystemExit in the example helpers; validate env vars up front in production.
  • A ValueError is raised if you pass an empty dataset.
  • Judge timeouts or rate limits surface from the underlying HTTP client; adjust timeout, max_retries, or run with smaller batches.
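Putting those together, a hedged sketch of a defensive wrapper (the env-var check, retry loop, and backoff are illustrative, not SDK behavior):

import os
import sys
import time

def run_assessment(client, simulations, retries=3):
    # Validate credentials up front; the env-var name is an assumption
    # based on the example earlier in this section.
    if not os.getenv("OPENAI_API_KEY"):
        sys.exit("OPENAI_API_KEY is not set")
    # assess raises ValueError on an empty dataset; fail early instead.
    if not simulations:
        raise ValueError("assess requires a non-empty dataset")
    for attempt in range(retries):
        try:
            return client.assess(dataset=simulations)
        except Exception as exc:  # timeouts/rate limits surface from the HTTP client
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # simple backoff before retrying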

Moving into the Dashboard

If you plan to publish assessed results into the Collinear platform, export the dataset (for example as JSONL) and use the upload endpoints covered in the API docs. Keep the SDK and dashboard credentials separate to avoid accidental cross-use.
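As an illustration, one way to flatten evaluation_result into JSONL before upload (the record layout here is an assumption, not the platform's schema; check the API docs for the exact format the upload endpoints expect):

import json

with open("assessed_results.jsonl", "w", encoding="utf-8") as fh:
    for row in result.evaluation_result:
        for key, scores in row.items():
            record = {
                "conversation": key,  # key assumed to identify the conversation
                "score": scores.score,
                "rationale": scores.rationale,
            }
            fh.write(json.dumps(record) + "\n")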