DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22, 2025 • 455
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text Paper • 2506.05209 • Published Jun 5, 2025 • 63
view article Article Beyond LoRA: Can you beat the most popular fine-tuning technique? +2 BenjaminB, sayakpaul, hubnemo, kashif • 17 days ago • 71
view article Article Unlocking Agentic RL Training for GPT-OSS: A Practical Retrospective LinkedIn • Jan 27 • 80