Comprehensive evaluation of agent workflows across multiple dimensions.

Evaluation Metrics

Metric             Description                          Scale
Goal Completion    Does the agent achieve its purpose?  0-1
Step Efficiency    Optimal path to solution             1-5
Context Retention  Maintains conversation memory        1-5
Error Rate         Unsuccessful steps                   %
User Satisfaction  Predicted user experience            1-5
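
Because the metrics above use different scales (0-1, 1-5, and a percentage), comparing them side by side requires putting them on a common range. The following is a minimal sketch of one way to do that, assuming a simple linear rescale to 0-1 and assuming that Error Rate should be inverted so that higher always means better. The names MetricScale, SCALES, and normalize are hypothetical and not part of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricScale:
    """Range of a metric and whether larger raw values are better."""
    low: float
    high: float
    higher_is_better: bool = True


# Scales taken from the table above; the keys are illustrative names.
SCALES = {
    "goal_completion": MetricScale(0.0, 1.0),
    "step_efficiency": MetricScale(1.0, 5.0),
    "context_retention": MetricScale(1.0, 5.0),
    # Assumption: error rate is reported as a percentage from 0 to 100.
    "error_rate": MetricScale(0.0, 100.0, higher_is_better=False),
    "user_satisfaction": MetricScale(1.0, 5.0),
}


def normalize(metric: str, value: float) -> float:
    """Linearly map a raw metric value onto 0-1, where 1 is best."""
    scale = SCALES[metric]
    if not scale.low <= value <= scale.high:
        raise ValueError(f"{metric}={value} is outside {scale.low}-{scale.high}")
    fraction = (value - scale.low) / (scale.high - scale.low)
    return fraction if scale.higher_is_better else 1.0 - fraction
```

With this mapping, a Step Efficiency of 3 and a Goal Completion of 0.5 land on the same normalized value, which makes mixed-scale results easier to read together.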

Preset Metrics:

  • Action Completion: Did the agent achieve all user goals?
  • Action Advancement: Did the agent make progress toward at least one user goal?
  • Tool Selection Quality: Did the agent select the correct tool and parameters?
  • Tool Errors: Did the tool execution steps succeed?
  • Instruction Adherence: Did the LLM follow the given instructions?
  • Context Adherence: Is the response grounded in retrieved/expected context?

Assessment Process

  1. Select generated dataset
  2. Choose evaluation metrics
  3. Configure assessment parameters
  4. Run evaluation
  5. Review heatmap visualization
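
The steps above can be sketched in code as follows. This is an illustrative outline only, assuming a dataset of plain dictionaries and metrics expressed as functions that return a score between 0 and 1. The record fields (goal_met, step_ok) and the functions run_evaluation and render_grid are hypothetical; the text grid stands in for the heatmap view.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Metric = Callable[[Record], float]


def goal_completion(record: Record) -> float:
    """1.0 if the agent reached its goal for this record, else 0.0."""
    return 1.0 if record["goal_met"] else 0.0


def step_success(record: Record) -> float:
    """Fraction of steps that succeeded (1 minus the error rate)."""
    steps = record["step_ok"]
    return sum(steps) / len(steps) if steps else 0.0


def run_evaluation(dataset: List[Record],
                   metrics: Dict[str, Metric]) -> List[Dict[str, float]]:
    """Score every record with every chosen metric (one row per record)."""
    return [{name: fn(record) for name, fn in metrics.items()}
            for record in dataset]


def render_grid(scores: List[Dict[str, float]], names: List[str]) -> str:
    """Format the score matrix as text: rows are records, columns metrics."""
    header = "row  " + "  ".join(f"{n:>16}" for n in names)
    lines = [header]
    for i, row in enumerate(scores):
        lines.append(f"{i:>3}  " + "  ".join(f"{row[n]:>16.2f}" for n in names))
    return "\n".join(lines)


if __name__ == "__main__":
    # 1. Select the dataset (hypothetical sample records).
    dataset = [
        {"goal_met": True, "step_ok": [True, True, False, True]},
        {"goal_met": False, "step_ok": [True, False]},
    ]
    # 2-3. Choose metrics and configure them.
    metrics = {"goal_completion": goal_completion, "step_success": step_success}
    # 4. Run the evaluation.
    scores = run_evaluation(dataset, metrics)
    # 5. Review the results as a grid.
    print(render_grid(scores, list(metrics)))
```

Keeping metrics as plain functions means a new metric can be added to the dictionary without changing the evaluation loop.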

Interpreting Results

  • < 0.5: Critical issue needing immediate attention
  • 0.5-0.7: Significant room for improvement
  • 0.7-0.9: Good but can be optimized
  • > 0.9: Excellent performance
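
The bands above can be applied programmatically. The sketch below assumes scores are on a 0-1 scale. Since the listed ranges share their endpoints, it also assumes that a score of exactly 0.7 counts as "Good" and a score of exactly 0.9 stays in "Good" (only values strictly above 0.9 are "Excellent"). The function name interpret_score is hypothetical.

```python
def interpret_score(score: float) -> str:
    """Map a 0-1 score to the interpretation bands listed above."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    if score < 0.5:
        return "Critical issue needing immediate attention"
    if score < 0.7:  # assumption: 0.7 itself falls in the next band
        return "Significant room for improvement"
    if score <= 0.9:  # assumption: exactly 0.9 is still "Good"
        return "Good but can be optimized"
    return "Excellent performance"
```

For example, interpret_score(0.65) returns "Significant room for improvement", and interpret_score(0.95) returns "Excellent performance".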