Atla Selene Mini: A General Purpose Evaluation Model Paper • 2501.17195 • Published Jan 27, 2025 • 35
Counsel: A Meta-Evaluation Dataset for Agentic Tasks Paper • 2606.21627 • Published 9 days ago • 6