RUT-Bench - a Miaow-Lab Collection

updated about 3 hours ago

Benchmark data for the paper "Beyond Ideal Instruction: A Comprehensive Framework for Evaluating LLMs in Realistic Interactions".

Miaow-Lab/RUT-Bench

Viewer • Updated about 3 hours ago • 1.64k • 13

Beyond Ideal Instruction: A Comprehensive Framework for Evaluating LLMs in Realistic Interactions

Paper • 2606.03318 • Published 2 days ago