Article: Fine-tuning SmolLM with Group Relative Policy Optimization (GRPO) by following the Methodologies • prithivMLmods • Feb 17, 2025 • 30
Article: Open-R1: a fully open reproduction of DeepSeek-R1 • +1 eliebak, lvwerra, lewtun • Jan 28, 2025 • 889
Paper: Aya Dataset: An Open-Access Collection for Multilingual Instruction Tuning • 2402.06619 • Published Feb 9, 2024 • 57