CONCEPT · STUB

Agent Evaluation

Agent evaluation encompasses the benchmarks, harnesses, and infrastructure used to assess AI agents that perform multi-step tasks, use tools, and make autonomous decisions. The Holistic Agent Leaderboard (HAL), accepted at ICLR 2026, helped establish cost-aware evaluation, in which an agent's cost is reported alongside its task performance, as an emerging standard for agentic benchmarking.
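
As a rough illustration of what cost-aware evaluation can mean in practice, the sketch below ranks agents by both accuracy and dollar cost and keeps only those on the cost/accuracy Pareto frontier. It is a minimal sketch under stated assumptions, not HAL's implementation: the `RunResult` schema, the `pareto_frontier` helper, the agent names, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    """Aggregate outcome of one agent on one benchmark (hypothetical schema)."""
    agent: str
    accuracy: float  # fraction of tasks solved, 0.0 to 1.0
    cost_usd: float  # total spend for the whole run


def pareto_frontier(results: list[RunResult]) -> list[RunResult]:
    """Return runs that are strictly more accurate than every cheaper run.

    When several runs have the same cost, only the most accurate is kept.
    """
    frontier = []
    best_accuracy = float("-inf")
    # Walk from cheapest to most expensive; among equal costs, most accurate first.
    for run in sorted(results, key=lambda r: (r.cost_usd, -r.accuracy)):
        if run.accuracy > best_accuracy:
            frontier.append(run)
            best_accuracy = run.accuracy
    return frontier


# Made-up results, for illustration only.
runs = [
    RunResult("agent-a", accuracy=0.62, cost_usd=40.0),
    RunResult("agent-b", accuracy=0.60, cost_usd=12.0),
    RunResult("agent-c", accuracy=0.55, cost_usd=25.0),
    RunResult("agent-d", accuracy=0.70, cost_usd=90.0),
]
frontier_names = [r.agent for r in pareto_frontier(runs)]
print(frontier_names)
```

In this made-up data, `agent-c` drops out because `agent-b` is both cheaper and more accurate. An accuracy-only leaderboard would simply rank `agent-d` first and hide the fact that `agent-b` reaches most of that accuracy at a fraction of the cost.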

※ Auto-generated stub — requires completion
