CollabVR: Collaborative Video Reasoning with Vision-Language and Video Generation Models Paper • 2605.08735 • Published 5 days ago • 59
ReviewScore: Misinformed Peer Review Detection with Large Language Models Paper • 2509.21679 • Published Sep 25, 2025 • 64
VisAlign: Dataset for Measuring the Degree of Alignment between AI and Humans in Visual Perception Paper • 2308.01525 • Published Aug 3, 2023
Uncertainty-Aware Text-to-Program for Question Answering on Structured Electronic Health Records Paper • 2203.06918 • Published Mar 14, 2022
KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge Paper • 2402.13605 • Published Feb 21, 2024
Trans-EnV: A Framework for Evaluating the Linguistic Robustness of LLMs Against English Varieties Paper • 2505.20875 • Published May 27, 2025 • 3
Reasoning Model is Stubborn: Diagnosing Instruction Overriding in Reasoning Models Paper • 2505.17225 • Published May 22, 2025 • 64