MedInsightBench: Evaluating Medical Analytics Agents Through Multi-Step Insight Discovery in Multimodal Medical Data Paper • 2512.13297 • Published Dec 15, 2025
InsightEval: An Expert-Curated Benchmark for Assessing Insight Discovery in LLM-Driven Data Agents Paper • 2511.22884 • Published Nov 28, 2025
Measuring Hong Kong Massive Multi-Task Language Understanding Paper • 2505.02177 • Published May 4, 2025