Multilinguality at the Edge: Developing Language Models for the Global South Paper • 2604.21637 • Published 17 days ago
Polyglot Teachers: Evaluating Language Models for Multilingual Synthetic Data Generation Paper • 2604.11290 • Published 27 days ago • 2
FilBench Eval Collection FilBench-Eval is an Open LLM Evaluation Suite for Philippine Languages. The eval runner is integrated with HuggingFace's lighteval. • 5 items • Updated Jan 14 • 1
FilBench: Can LLMs Understand and Generate Filipino? Paper • 2508.03523 • Published Aug 5, 2025 • 1
Datasheets Aren't Enough: DataRubrics for Automated Quality Metrics and Accountability Paper • 2506.01789 • Published Jun 2, 2025 • 15
CaMMT: Benchmarking Culturally Aware Multimodal Machine Translation Paper • 2505.24456 • Published May 30, 2025
Universal Dependencies for Tagalog Collection Models and dependency parsers for Tagalog using the UD_NewsCrawl dataset • 8 items • Updated May 29, 2025
MMTEB: Massive Multilingual Text Embedding Benchmark Paper • 2502.13595 • Published Feb 19, 2025 • 48
Crowdsource, Crawl, or Generate? Creating SEA-VL, a Multicultural Vision-Language Dataset for Southeast Asia Paper • 2503.07920 • Published Mar 10, 2025 • 101
Establishing Baselines for Text Classification in Low-Resource Languages Paper • 2005.02068 • Published May 5, 2020