Aya 23: Open Weight Releases to Further Multilingual Progress Paper • 2405.15032 • Published May 23, 2024 • 34
Cohere Labs Aya Expanse Collection Aya Expanse is an open-weight research release of a model with highly advanced multilingual capabilities. • 4 items • Updated Jul 31, 2025 • 46
Aya Expanse: Combining Research Breakthroughs for a New Multilingual Frontier Paper • 2412.04261 • Published Dec 5, 2024 • 8
Cohere Labs Aya 23 Collection Aya 23 is an open weights research release of an instruction fine-tuned model with highly advanced multilingual capabilities. • 3 items • Updated Jul 31, 2025 • 58
Teaching Models to Understand (but not Generate) High-risk Data Paper • 2505.03052 • Published May 5, 2025 • 6
Article GaLore: Advancing Large Model Training on Consumer-grade Hardware +7 Titus-von-Koeller, jiaweizhao, mdouglas, hiyouga, ybelkada, muellerzr, amyeroberts, smangrul, BenjaminB • Mar 20, 2024 • 32
OmniDocBench: Benchmarking Diverse PDF Document Parsing with Comprehensive Annotations Paper • 2412.07626 • Published Dec 10, 2024 • 30
Article Make LLM Fine-tuning 2x faster with Unsloth and 🤗 TRL danielhanchen • Jan 10, 2024 • 77
RegMix: Data Mixture as Regression for Language Model Pre-training Paper • 2407.01492 • Published Jul 1, 2024 • 41
Article Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM +2 ariG23498, merve, pcuenq, reach-vb • Mar 12, 2025 • 497
Article Learn the Hugging Face Kernel Hub in 5 Minutes +5 drbh, danieldk, Narsil, pcuenq, pagezyhf, merve, reach-vb • Jun 12, 2025 • 163
Article From PyTorch DDP to Accelerate to Trainer, mastery of distributed training with ease muellerzr • Oct 21, 2022 • 44
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text Paper • 2506.05209 • Published Jun 5, 2025 • 61
Article How to train a new language model from scratch using Transformers and Tokenizers julien-c • Feb 14, 2020 • 61
Article Efficient LLM Pretraining: Packed Sequences and Masked Attention sirluk • Oct 7, 2024 • 71