What the Assessor Does
Client.assess runs a rubric-guided evaluation locally by calling an OpenAI-compatible model that you supply. It returns an AssessmentResponse with:
- message: general status text (e.g., “ok”).
- evaluation_result: list of maps containing score and rationale per conversation.
Supplying Judge Credentials
By default, the assessor reuses the assistant credentials you passed to Client. You can override any part of the judge connection as needed.
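The fallback rule described above (reuse the assistant's connection unless a judge-specific value is given) can be sketched in plain Python. This is only an illustration of the behavior, not the library's code: ConnectionConfig, resolve_judge_config, and the judge_api_url / judge_api_key / judge_model parameter names are hypothetical, and the actual keyword arguments accepted by Client.assess may differ.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class ConnectionConfig:
    """Hypothetical stand-in for an OpenAI-compatible connection."""
    api_url: str
    api_key: str
    model: str


def resolve_judge_config(
    assistant: ConnectionConfig,
    judge_api_url: Optional[str] = None,
    judge_api_key: Optional[str] = None,
    judge_model: Optional[str] = None,
) -> ConnectionConfig:
    """Return the judge connection: assistant values unless overridden."""
    overrides = {
        "api_url": judge_api_url,
        "api_key": judge_api_key,
        "model": judge_model,
    }
    # Keep only the fields the caller actually set.
    changed = {k: v for k, v in overrides.items() if v is not None}
    return replace(assistant, **changed)


assistant = ConnectionConfig(
    api_url="https://example.invalid/v1",  # placeholder URL
    api_key="assistant-key",               # placeholder secret
    model="assistant-model",               # placeholder model name
)

# Override only the model; URL and key fall back to the assistant's values.
judge = resolve_judge_config(assistant, judge_model="judge-model")
print(judge.model)    # judge-model
print(judge.api_url)  # https://example.invalid/v1
```

Overriding a single field this way is useful when the judge should be a different model served from the same endpoint as the assistant.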
simulations is a non-empty list of SimulationResult objects returned from client.simulate. Alternatively, you can provide your own dataset, provided it is in the format the assess method expects.
Interpreting Results
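Since evaluation_result is a list of maps with a score and a rationale per conversation, a small summary helper is often enough for a first look. The sketch below works on plain dictionaries; the summarize function and the sample values are made up for illustration, and the score scale is an assumption (the rubric you use determines the real range).

```python
from statistics import mean
from typing import Any, Dict, List


def summarize(evaluation_result: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Aggregate per-conversation scores and surface the weakest one."""
    if not evaluation_result:
        raise ValueError("evaluation_result is empty")
    scores = [float(item["score"]) for item in evaluation_result]
    # Index of the lowest-scoring conversation (first one on ties).
    lowest = min(range(len(scores)), key=scores.__getitem__)
    return {
        "count": len(scores),
        "mean_score": mean(scores),
        "lowest_index": lowest,
        "lowest_rationale": evaluation_result[lowest]["rationale"],
    }


# Made-up sample shaped like the documented result: one map per conversation.
sample = [
    {"score": 4, "rationale": "Resolved the request clearly."},
    {"score": 2, "rationale": "Ignored the follow-up question."},
]

summary = summarize(sample)
print(summary["mean_score"])        # 3.0
print(summary["lowest_rationale"])  # Ignored the follow-up question.
```

Reading the rationale of the lowest-scoring conversation first tends to show whether a low score reflects a real failure or a rubric that needs tightening.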
Handling Errors
- Missing credentials raise SystemExit in the example helpers; validate env vars up front in production.
- A ValueError is raised if you pass an empty dataset.
- Judge timeouts or rate limits surface from the underlying HTTP client; adjust timeout, max_retries, or run with smaller batches.
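Two of the points above, validating environment variables up front and running with smaller batches, can be sketched with the standard library alone. The helper names require_env and batches and the JUDGE_API_KEY variable are hypothetical; substitute whatever variable names your deployment actually uses.

```python
from typing import Iterator, List, Mapping, Sequence, TypeVar

T = TypeVar("T")


def require_env(names: Sequence[str], environ: Mapping[str, str]) -> None:
    """Raise a catchable error listing every missing or empty variable."""
    missing = [name for name in names if not environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))


def batches(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of at most `size` items.

    Because this is a generator, the checks below run on first iteration.
    """
    if not items:
        raise ValueError("dataset must not be empty")
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Placeholder mapping; in production pass os.environ instead.
require_env(["JUDGE_API_KEY"], {"JUDGE_API_KEY": "placeholder"})

groups = list(batches(["a", "b", "c", "d", "e"], 2))
print(groups)  # [['a', 'b'], ['c', 'd'], ['e']]
```

Each group can then be assessed in its own call, so a timeout or rate limit costs one batch rather than the whole run.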