- TraitBasis-generated personas – more accurate and interpretable user simulations.
- Domain-specific evaluation – tasks drawn from retail, airline, telecom, and telehealth settings.
✨ Features
- Persona Simulation with TraitBasis: Generate diverse, coherent user personas with different traits.
- Domain Coverage: TauTrait includes evaluation tasks in four industries:
  - 🛒 Retail
  - ✈️ Airline
  - 📱 Telecom
  - 🩺 Telehealth
🚀 Getting Started
Installation
Usage
Results are saved to results/ in the format agent_strategy-model-temperature_range_start-end_user-user_strategy_traits-<traits>_<timestamp>.json. The JSON captures the reward, transcript, and debug info for every task.
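
To inspect a finished run, a small script can average the rewards in each result file. The following is a minimal sketch, not part of TauTrait: it assumes each file holds a JSON list of per-task objects with a numeric "reward" key, and the helper names mean_reward and summarize_results are hypothetical.

```python
import json
from pathlib import Path
from statistics import mean


def mean_reward(records):
    """Average the "reward" field over a list of per-task records."""
    # Assumption: every record is a dict with a numeric "reward" entry.
    rewards = [float(record["reward"]) for record in records]
    return mean(rewards) if rewards else 0.0


def summarize_results(log_dir="results"):
    """Map each result file name in log_dir to its mean reward."""
    summary = {}
    for path in sorted(Path(log_dir).glob("*.json")):
        with path.open(encoding="utf-8") as f:
            records = json.load(f)  # assumed layout: a JSON list of objects
        summary[path.name] = mean_reward(records)
    return summary
```

If the real files use a different layout (for example, a top-level object instead of a list), only mean_reward needs to change.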
The available settings are defined below.
TauTrait Config Settings
General
- --num-trials (int, default: 1): Number of independent trials to run.
- --seed (int, default: 10): Random seed for reproducibility.
- --shuffle (int, default: 0): Whether to shuffle task order (0 = no, 1 = yes).
- --log-dir (str, default: results): Directory where logs and results are stored.
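
For reference, the General options above map naturally onto argparse. This is a sketch under the documented names, types, and defaults only; the function name build_general_parser is hypothetical, and the project's actual parser may be organized differently.

```python
import argparse


def build_general_parser():
    """Declare the General options with the defaults documented above."""
    parser = argparse.ArgumentParser(description="Sketch of the General options")
    parser.add_argument("--num-trials", type=int, default=1,
                        help="Number of independent trials to run")
    parser.add_argument("--seed", type=int, default=10,
                        help="Random seed for reproducibility")
    parser.add_argument("--shuffle", type=int, default=0,
                        help="Shuffle task order (0 = no, 1 = yes)")
    parser.add_argument("--log-dir", type=str, default="results",
                        help="Directory where logs and results are stored")
    return parser


if __name__ == "__main__":
    # Override two options and keep the other defaults.
    args = build_general_parser().parse_args(["--num-trials", "3", "--shuffle", "1"])
    print(args.num_trials, args.seed, args.shuffle, args.log_dir)
```

argparse converts the dashes in option names to underscores, so --num-trials is read back as args.num_trials.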
Environment & Tasks
- --env (str, choices: retail, airline; default: retail): Domain environment in which to run simulations.
- --task-split (str, choices: train, test, dev; default: test): Dataset split of tasks to run (currently applies only to the retail domain).
- --start-index (int, default: 0): Index of the first task to run.
- --end-index (int, default: -1): Index of the last task to run. Use -1 to run all remaining tasks.
- --task-ids (list of int, optional): Explicit list of task IDs to run (overrides index ranges).
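
The interaction between --start-index, --end-index, --task-ids, --shuffle, and --seed can be sketched as below. This is an illustration, not the project's code: the function select_task_indices is hypothetical, and it assumes --end-index is an exclusive upper bound, which the description above does not specify.

```python
import random


def select_task_indices(num_tasks, start_index=0, end_index=-1,
                        task_ids=None, shuffle=0, seed=10):
    """Return the task indices to run, in execution order."""
    if task_ids:
        # An explicit ID list overrides the index range.
        indices = list(task_ids)
    else:
        # -1 means "run to the end"; the bound is treated as exclusive (assumption).
        stop = num_tasks if end_index == -1 else min(end_index, num_tasks)
        indices = list(range(start_index, stop))
    if shuffle:
        # A dedicated Random instance keeps the order reproducible for a given seed.
        random.Random(seed).shuffle(indices)
    return indices
```

For example, select_task_indices(5, start_index=1, end_index=3) selects tasks 1 and 2 under the exclusive-bound assumption, while passing task_ids=[4, 2] ignores the range entirely.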
Agent Configuration
- --model (str, required): The model to use for the agent.
- --model-provider (str, choices from provider_list): Provider for the agent’s model.
- --agent-strategy (str, choices: tool-calling, act, react, few-shot; default: tool-calling): Strategy used by the agent to interact with the environment.
  - tool-calling: Invoke external tools.
  - act: Pure action selection.
  - react: Reason + act alternation.
  - few-shot: Use few-shot exemplars.
- --temperature (float, default: 0.0): Sampling temperature for the action model (higher = more randomness).
- --few-shot-displays-path (str, optional): Path to a JSONL file containing few-shot demonstration examples.
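
A JSONL file stores one JSON value per line. The sketch below shows one way to read such a file; the helper names parse_jsonl and load_few_shot_displays are hypothetical, and the schema of each demonstration is not documented here, so records are returned as parsed without validation.

```python
import json


def parse_jsonl(lines):
    """Parse an iterable of text lines, one JSON value per non-blank line."""
    return [json.loads(line) for line in lines if line.strip()]


def load_few_shot_displays(path):
    """Read every demonstration from the JSONL file at path."""
    with open(path, encoding="utf-8") as f:
        return parse_jsonl(f)
```

Blank lines are skipped, so a trailing newline at the end of the file does not cause a parse error.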
User Simulator Configuration
- --user-model (str, default: gpt-4o): Model to use for the user simulator.
- --user-model-provider (str, optional): Provider for the user simulator’s model.
- --user-strategy (str, choices from UserStrategy; default: llm): Strategy for the simulated user (e.g., LLM-based).
Execution Controls
- --max-concurrency (int, default: 1): Number of tasks to run in parallel.
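
Bounded parallelism of this kind is commonly implemented with a worker pool. The following is a minimal sketch assuming a thread pool; the function run_tasks and the run_one callback are hypothetical stand-ins, and the project may schedule tasks differently.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tasks(task_indices, run_one, max_concurrency=1):
    """Run run_one on each task index with at most max_concurrency workers."""
    with ThreadPoolExecutor(max_workers=max_concurrency) as executor:
        # executor.map returns results in input order, regardless of
        # which task finishes first.
        return list(executor.map(run_one, task_indices))
```

With max_concurrency=1 the tasks run one at a time, which matches the documented default. Higher values help most when each task spends its time waiting on model API calls.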