Claw-Eval: Toward Trustworthy Evaluation of Autonomous Agents • Paper 2604.06132 • Published Apr 7 • 119
Quran-MD: A Fine-Grained Multilingual Multimodal Dataset of the Quran • Paper 2601.17880 • Published Jan 25 • 3
Building Trust in Clinical LLMs: Bias Analysis and Dataset Transparency • Paper 2510.18556 • Published Oct 21, 2025 • 2