Article Keep the Tokens Flowing: Lessons from 16 Open-Source RL Libraries +7 4 days ago • 41
Article Ulysses Sequence Parallelism: Training with Million-Token Contexts 5 days ago • 15
Article Compute and Competition in AI: Different FlOPs for Different Folks 29 days ago • 12
Article 🪄 Interpreto: A Unified Toolkit for Interpretability of Transformer Models Jan 20 • 37
Scaling Laws for Code: Every Programming Language Matters Paper • 2512.13472 • Published Dec 15, 2025 • 15
Article Saving Memory Using Padding-Free Transformer Layers during Finetuning Jun 11, 2024 • 21
Article Nemotron 3 Nano - A new Standard for Efficient, Open, and Intelligent Agentic Models Dec 15, 2025 • 110
Article Tensor Parallelism (TP) in Transformers: 5 Minutes to Understand Dec 4, 2025 • 66
Fantastic Pretraining Optimizers and Where to Find Them Paper • 2509.02046 • Published Sep 2, 2025 • 14