Simulation Lab includes a task generation pipeline that produces complete tasks from a configuration file. This is the programmatic path for creating evaluation content at scale.

Task Generation Config

Task generation is driven by a TOML config file that defines the agent role, available toolsets, scenario conventions, and generation parameters. Initialize one from a preset (or write it from scratch), then pass it to the generator:
# Initialize a config from a preset
simlab tasks-gen init --preset recruiting --output-dir ./taskgen

# Run generation
simlab tasks-gen run --config ./taskgen/config.toml
Here is an example config for an HR recruiting scenario:
preset = "recruiting"
categories = [
    { id = "job_requisition", label = "Job requisition" },
    { id = "shortlist_resumes", label = "Shortlist the resumes" },
    { id = "schedule_interviews", label = "Schedule the interviews" },
    { id = "consolidate_feedback", label = "Consolidate feedback" },
    { id = "send_offers_or_rejections", label = "Send out offers or rejections" },
    { id = "negotiate_and_final_offer", label = "Negotiate and send final offer" },
    { id = "handle_candidate_rejections", label = "Handle candidate rejections" },
    { id = "candidate_questions", label = "Handle questions from candidates" },
]

[agent]
role = "HR recruiting coordinator"
description = "Handles end-to-end recruiting workflows: scheduling interviews, managing candidate pipelines, coordinating offer discussions, and communicating with hiring managers and candidates."

[[toolset]]
name = "HRIS"
description = "Query/update employee records, job requisitions, candidate profiles"
operations = ["search", "read", "create", "update"]

[[toolset]]
name = "Email"
description = "Send and read emails"
operations = ["send", "read"]

[[toolset]]
name = "Calendar"
description = "View and manage calendar events"
operations = ["list", "create", "update", "delete"]

[[toolset]]
name = "Chat"
description = "Send messages in Rocket.Chat channels and DMs"
operations = ["send", "read"]

[scenario]
name = "recruiting"
role_label = "HR recruiting professional"
conventions = """
- Always check all participants' calendars before scheduling
- Never share compensation details in group channels
- Document all candidate interactions in HRIS
- Get manager approval before extending offers
"""
policies = [
    "Interviews must include at least one diverse panel member",
    "Offers require VP approval for >$200k total comp",
    "Candidate data must not be shared outside recruiting team",
]

[generation]
num_tasks = 2
deduplicate = false
filter = true

[generation.complexity]
easy = 0.3
medium = 0.5
hard = 0.2

[generation.diversity]
variations = [
    "straightforward",
    "scheduling_conflict",
    "candidate_experience",
    "handoff_or_reporting",
    "offer_negotiation",
    "candidate_decline",
    "accommodation",
]
  • preset — Starting template. Available presets: recruiting, people_mgmt, coding, customer_support.
  • categories — Task categories to generate across.
  • [agent] — The role and description of the agent being evaluated.
  • [[toolset]] — Tools available to the agent, with allowed operations.
  • [scenario] — Domain conventions, policies, and constraints the agent should follow.
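  • [generation] — Number of tasks (num_tasks), the deduplicate and filter flags, the complexity distribution ([generation.complexity]), and diversity dimensions ([generation.diversity]).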
The config can also define [[workflows]] (multi-step procedures) and [[npcs]] (non-player characters like hiring managers or candidates) to increase task realism.

Pre-Built Tasks

For common domains, pre-built tasks are available via the Scenario Manager API. Tasks are associated with templates — when you create an environment from a template, its tasks are automatically available:
simlab tasks list --env my-env
simlab tasks run --env my-env --task 100_weaver_schedule_phone_screen \
  --agent-model gpt-5.2 --agent-api-key "$OPENAI_API_KEY"
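To run several pre-built tasks in one go, you can wrap the `simlab tasks run` command in a small script. The Python sketch below uses only the flags shown in this section; the helper names are hypothetical, the task IDs are supplied by you (for example, copied from `simlab tasks list`, whose output format is not described here), and it assumes simlab exits with a non-zero status when a run fails.

```python
import os
import subprocess


def build_run_command(env: str, task_id: str, model: str, api_key: str) -> list[str]:
    """Hypothetical helper: the documented `simlab tasks run` call as an argv list."""
    return [
        "simlab", "tasks", "run",
        "--env", env,
        "--task", task_id,
        "--agent-model", model,
        "--agent-api-key", api_key,
    ]


def run_tasks(env: str, task_ids: list[str], model: str) -> dict[str, int]:
    """Run each task sequentially and record its exit code.

    Assumes a non-zero exit code signals a failed run (not confirmed by the docs).
    """
    api_key = os.environ["OPENAI_API_KEY"]  # same variable as the shell example
    results: dict[str, int] = {}
    for task_id in task_ids:
        cmd = build_run_command(env, task_id, model, api_key)
        completed = subprocess.run(cmd, check=False)  # keep going if one task fails
        results[task_id] = completed.returncode
    return results
```

For example, `run_tasks("my-env", ["100_weaver_schedule_phone_screen"], "gpt-5.2")` reproduces the single shell command above; add more task IDs to the list to run them one after another.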