Dr.V: A Hierarchical Perception-Temporal-Cognition Framework to Diagnose Video Hallucination by Fine-grained Spatial-Temporal Grounding • Paper • 2509.11866 • Published Sep 15, 2025 • 2
LMR-BENCH: Evaluating LLM Agent's Ability on Reproducing Language Modeling Research • Paper • 2506.17335 • Published Jun 19, 2025
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks • Paper • 2602.12670 • Published about 1 month ago • 55
A Unified Hallucination Mitigation Framework for Large Vision-Language Models • Paper • 2409.16494 • Published Sep 24, 2024
Multi-source Semantic Graph-based Multimodal Sarcasm Explanation Generation • Paper • 2306.16650 • Published Jun 29, 2023
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? • Paper • 2409.07703 • Published Sep 12, 2024 • 66
FAITHSCORE: Evaluating Hallucinations in Large Vision-Language Models • Paper • 2311.01477 • Published Nov 2, 2023 • 1