TauTrait is a benchmark for evaluating large language models (LLMs) with realistic, persona-aware simulations. It builds on Tau-Bench from Sierra but introduces two key modifications:
  1. TraitBasis-generated personas – more accurate and interpretable user simulations.
  2. Domain-specific evaluation – tasks drawn from retail, airline, telecom, and telehealth settings.
TauTrait is designed to test model robustness, personalization, and fairness in high-impact, customer-facing domains where user traits strongly influence interaction quality.

✨ Features

  • Persona Simulation with TraitBasis – Generate diverse, coherent user personas with different traits (see the sketch after this list).
  • Domain Coverage – TauTrait includes evaluation tasks in four industries:
    • 🛒 Retail
    • ✈️ Airline
    • 📱 Telecom
    • 🩺 Telehealth
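A user's trait mix is expressed as a plain dictionary of trait intensities, which the Usage section below passes to RunConfig as trait_dict. A minimal sketch, assuming the 0/1 values seen in the Usage example act as on/off toggles (the actual supported range is not pinned down here):

# Illustrative persona profiles; the keys mirror the trait_dict field of
# RunConfig shown in the Usage section below. The 0/1 values follow the
# Usage example; other intensities may or may not be supported.
impatient_user = {"impatience": 1, "confusion": 0, "skeptical": 0, "incoherence": 0}
confused_skeptic = {"impatience": 0, "confusion": 1, "skeptical": 1, "incoherence": 0}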

🚀 Getting Started

Installation

pip install tau-trait

Usage

from tau_trait.types import RunConfig
from tau_trait.run import run

# Placeholder: substitute the assistant model you want to evaluate.
CLIENT_ASSISTANT_MODEL_NAME = "gpt-4o"

config = RunConfig(
    model_provider="openai",
    user_model_provider="steer",
    model=CLIENT_ASSISTANT_MODEL_NAME,
    user_model="", # steer api abstracts the model
    num_trials=1,
    env="retail",
    agent_strategy="tool-calling",
    temperature=0.7,
    task_split="test",
    start_index=0,
    end_index=-1,
    task_ids=[4],
    log_dir="results",
    max_concurrency=1,
    seed=10,
    shuffle=0,
    user_strategy="llm",
    few_shot_displays_path=None,
    trait_dict={"impatience": 1, "confusion": 0, "skeptical": 0, "incoherence": 0},
)

run(config)  # launch the rollout with the settings above
Each rollout writes a checkpoint file under results/ in the format agent_strategy-model-temperature_range_start-end_user-user_strategy_traits-<traits>_<timestamp>.json. The JSON captures the reward, transcript, and debug info for every task. The configuration settings are defined below.
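A hedged post-processing sketch: the checkpoint schema is not specified beyond "reward, transcript, and debug info for every task", so this assumes each file holds a JSON list of per-task records with a reward key; adjust the lookups to the files you actually get.

import json
from pathlib import Path

# Scan all checkpoint files under results/ and report the mean reward.
for path in Path("results").glob("*.json"):
    records = json.loads(path.read_text())
    rewards = [r["reward"] for r in records]  # assumed schema: list of task records
    if rewards:
        print(f"{path.name}: mean reward {sum(rewards) / len(rewards):.3f}")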

TauTrait Config Settings

General

  • --num-trials (int, default: 1)
    Number of independent trials to run.
  • --seed (int, default: 10)
    Random seed for reproducibility.
  • --shuffle (int, default: 0)
    Whether to shuffle task order (0 = no, 1 = yes).
  • --log-dir (str, default: results)
    Directory where logs and results are stored.
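For example, a reproducible multi-trial sweep only needs the general knobs above. A minimal sketch; the remaining RunConfig fields are as in the Usage example:

config = RunConfig(
    num_trials=3,             # three independent rollouts per task
    seed=42,                  # fixed seed for reproducibility
    shuffle=1,                # randomize task order
    log_dir="results/sweep",  # keep this run's checkpoints separate
    # ... remaining fields as in the Usage example above
)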

Environment & Tasks

  • --env (str, choices: retail, airline, telecom, telehealth, default: retail)
    Domain environment in which to run simulations.
  • --task-split (str, choices: train, test, dev, default: test)
    Dataset split of tasks to run (applies only to the retail domain currently).
  • --start-index (int, default: 0)
    Index of the first task to run.
  • --end-index (int, default: -1)
    Index of the last task to run. Use -1 to run all remaining tasks.
  • --task-ids (list of int, optional)
    Explicit list of task IDs to run (overrides index ranges).
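Two ways to pick tasks, sketched below with only the relevant fields set (the rest as in the Usage example); note that task_ids, when given, overrides the index range:

# Run a contiguous slice of the retail test split ...
slice_config = RunConfig(
    env="retail", task_split="test",
    start_index=0, end_index=9,  # -1 would run all remaining tasks
    # ... remaining fields as in the Usage example above
)

# ... or pin down explicit tasks; task_ids overrides the index range.
pinned_config = RunConfig(
    env="retail", task_split="test",
    task_ids=[4, 7, 12],
    # ... remaining fields as in the Usage example above
)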

Agent Configuration

  • --model (str, required)
    The model to use for the agent.
  • --model-provider (str, choices from provider_list)
    Provider for the agent’s model.
  • --agent-strategy (str, choices: tool-calling, act, react, few-shot, default: tool-calling)
    Strategy used by the agent to interact with the environment.
    • tool-calling: Invoke external tools.
    • act: Pure action selection.
    • react: Reason + act alternation.
    • few-shot: Use few-shot exemplars.
  • --temperature (float, default: 0.0)
    Sampling temperature for the action model (higher = more randomness).
  • --few-shot-displays-path (str, optional)
    Path to a JSONL file containing few-shot demonstration examples.
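Switching the agent from tool-calling to the few-shot strategy only touches the fields above. A sketch, where the JSONL path is a hypothetical placeholder:

config = RunConfig(
    model="gpt-4o",                 # agent model; substitute your own
    model_provider="openai",
    agent_strategy="few-shot",      # use few-shot exemplars instead of tool calls
    temperature=0.0,
    few_shot_displays_path="data/retail_demos.jsonl",  # hypothetical path to exemplars
    # ... remaining fields as in the Usage example above
)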

User Simulator Configuration

  • --user-model (str, default: gpt-4o)
    Model to use for the user simulator.
  • --user-model-provider (str, optional)
    Provider for the user simulator’s model.
  • --user-strategy (str, choices from UserStrategy, default: llm)
    Strategy for the simulated user (e.g., LLM-based).
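A sketch of an OpenAI-backed user simulator combined with a trait profile; the provider here is an assumption (the Usage example routes through "steer" instead), and the rest of the fields follow the Usage example:

config = RunConfig(
    user_model="gpt-4o",           # documented default simulator model
    user_model_provider="openai",  # assumed provider; Usage example uses "steer"
    user_strategy="llm",           # LLM-driven simulated user
    trait_dict={"impatience": 0, "confusion": 1, "skeptical": 0, "incoherence": 0},
    # ... remaining fields as in the Usage example above
)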

Execution Controls

  • --max-concurrency (int, default: 1)
    Number of tasks to run in parallel.
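Raising concurrency speeds up large sweeps at the cost of heavier API traffic. A minimal sketch:

config = RunConfig(
    max_concurrency=4,  # four tasks in flight at once; 1 keeps runs strictly sequential
    # ... remaining fields as in the Usage example above
)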