Agent Evaluation
Agent evaluation encompasses the benchmarks, harnesses, and infrastructure used to assess AI agents that perform multi-step tasks, use tools, and make autonomous decisions. The Holistic Agent Leaderboard (HAL), accepted at ICLR 2026, helped establish cost-aware evaluation, in which the cost of a run is reported alongside its accuracy, as an emerging standard for agentic benchmarking.
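One common way to make a comparison cost-aware is to keep only the agents on the accuracy-versus-cost Pareto frontier, that is, those for which no other agent is both at least as accurate and at least as cheap. The sketch below illustrates the idea under stated assumptions: the `AgentRun` type, the `pareto_frontier` helper, the agent names, and all numbers are hypothetical and invented for illustration. It is not taken from HAL or any particular harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRun:
    """One agent's aggregate result on a benchmark (hypothetical schema)."""
    name: str         # illustrative agent label
    accuracy: float   # fraction of tasks solved, in [0, 1]
    cost_usd: float   # total spend for the full benchmark run


def pareto_frontier(runs):
    """Return the runs that no other run dominates, sorted by cost.

    Run `o` dominates run `r` when `o` is at least as accurate and at
    least as cheap as `r`, and strictly better on one of the two axes.
    """
    frontier = []
    for r in runs:
        dominated = any(
            o.accuracy >= r.accuracy
            and o.cost_usd <= r.cost_usd
            and (o.accuracy > r.accuracy or o.cost_usd < r.cost_usd)
            for o in runs
        )
        if not dominated:
            frontier.append(r)
    return sorted(frontier, key=lambda r: r.cost_usd)


# Made-up numbers, for illustration only.
runs = [
    AgentRun("agent_a", accuracy=0.62, cost_usd=40.0),
    AgentRun("agent_b", accuracy=0.60, cost_usd=12.0),
    AgentRun("agent_c", accuracy=0.55, cost_usd=25.0),
    AgentRun("agent_d", accuracy=0.70, cost_usd=95.0),
]

for run in pareto_frontier(runs):
    print(f"{run.name}: accuracy={run.accuracy:.2f}, cost=${run.cost_usd:.2f}")
```

In this toy data, `agent_c` drops out because `agent_b` is both cheaper and more accurate. The remaining three agents each represent a different trade-off between spend and accuracy, which an accuracy-only ranking would hide.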