1/ Evaluating a single agent harness is hard. Evaluating a multi-agent system? Whole different problem.
Most eval tools treat the model as the unit of analysis. In multi-agent systems, the system is what matters.
That's why we built MASEval 🧵
#AI #Agents #Eval #MultiAgentSystem #LLM