SafeAuto: Knowledge-Enhanced Safe Autonomous Driving with Multimodal Foundation Models Paper • 2503.00211 • Published Feb 28, 2025
Beyond Ideal Instruction: A Comprehensive Framework for Evaluating LLMs in Realistic Interactions Paper • 2606.03318 • Published 29 days ago
RUT-Bench Collection Benchmark data in "Beyond Ideal Instruction: A Comprehensive Framework for Evaluating LLMs in Realistic Interactions". • 2 items • Updated 27 days ago
STT-Arena Collection Benchmark data, training data, and STT-Agent from our paper "STT-Arena: A More Realistic Environment for Tool-Using with Spatio-Temporal Dynamics" • 4 items • Updated May 19