UrduMMLU: A Massive Multitask Benchmark for Urdu Language Understanding Paper • 2606.07167 • Published about 1 month ago
TABVERSE: Benchmarking Cross-Format Table Understanding in LLMs and VLMs Paper • 2606.09578 • Published 27 days ago
Cultural Benchmarking of LLMs in Standard and Dialectal Arabic Dialogues Paper • 2605.00119 • Published Apr 30
SAHM: A Benchmark for Arabic Financial and Shari'ah-Compliant Reasoning Paper • 2604.19098 • Published Apr 30
NeuralNexus at BEA 2025 Shared Task: Retrieval-Augmented Prompting for Mistake Identification in AI Tutors Paper • 2506.10627 • Published Jun 12, 2025
A Parallel Cross-Lingual Benchmark for Multimodal Idiomaticity Understanding Paper • 2601.08645 • Published Feb 24
iBitter-Stack: A Multi-Representation Ensemble Learning Model for Accurate Bitter Peptide Identification Paper • 2505.15730 • Published May 21, 2025
UrduFactCheck: An Agentic Fact-Checking Framework for Urdu with Evidence Boosting and Benchmarking Paper • 2505.15063 • Published May 21, 2025