MixtureVitae: Open Web-Scale Pretraining Dataset With High Quality Instruction and Reasoning Data Built from Permissive-First Text Sources Paper • 2509.25531 • Published Sep 29, 2025 • 9
HKCanto-Eval: A Benchmark for Evaluating Cantonese Language Understanding and Cultural Comprehension in LLMs Paper • 2503.12440 • Published Mar 16, 2025
HKCanto-Eval: A Benchmark for Evaluating Cantonese Language Understanding and Cultural Comprehension in LLMs Paper • 2503.12440 • Published Mar 16, 2025