Arabic Speech Datasets Collection Best Datasets for Arabic Speech Tasks • 16 items • Updated 25 days ago • 15
KITAB-Bench Collection A Comprehensive Multi-Domain Benchmark for Arabic OCR and Document Understanding • 24 items • Updated Feb 24, 2025 • 16
SARD: Synthetic Arabic Recognition Dataset Collection A large-scale synthetic Arabic OCR dataset comprising 843,622 book-style document images across 10 fonts, designed to advance VLMs for Arabic text • 2 items • Updated May 19, 2025 • 7
YouTube-SL-25: A Large-Scale, Open-Domain Multilingual Sign Language Parallel Corpus Paper • 2407.11144 • Published Jul 15, 2024 • 10
BAREC Shared Task 2025 Collection Sentence-level and Document-level readability datasets for the BAREC Shared Task 2025 • 4 items • Updated Jul 21, 2025 • 2
BiMediX2 Collection BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities • 7 items • Updated Oct 24, 2025 • 10
BiMediX2: Bio-Medical EXpert LMM for Diverse Medical Modalities Paper • 2412.07769 • Published Dec 10, 2024 • 30