SHANKS: Simultaneous Hearing and Thinking for Spoken Language Models Paper • 2510.06917 • Published Oct 8, 2025
B-score: Detecting biases in large language models using response history Paper • 2505.18545 • Published May 24, 2025
Understanding Generative AI Capabilities in Everyday Image Editing Tasks Paper • 2505.16181 • Published May 22, 2025
VideoGameQA-Bench: Evaluating Vision-Language Models for Video Game Quality Assurance Paper • 2505.15952 • Published May 21, 2025