ÜberWeb: Insights from Multilingual Curation for a 20-Trillion-Token Dataset Paper • 2602.15210 • Published Feb 25 • 1
Kakugo: Distillation of Low-Resource Languages into Small Language Models Paper • 2601.14051 • Published Jan 20 • 1
Make-it-Real: Unleashing Large Multimodal Model's Ability for Painting 3D Objects with Realistic Materials Paper • 2404.16829 • Published Apr 25, 2024 • 5
Gamayun's Path to Multilingual Mastery: Cost-Efficient Training of a 1.5B-Parameter LLM Paper • 2512.21580 • Published Dec 25, 2025 • 9
LiteReality: Graphics-Ready 3D Scene Reconstruction from RGB-D Scans Paper • 2507.02861 • Published Jul 3, 2025 • 3
SpaceTimePilot: Generative Rendering of Dynamic Scenes Across Space and Time Paper • 2512.25075 • Published Dec 31, 2025 • 16
Draft-Thinking: Learning Efficient Reasoning in Long Chain-of-Thought LLMs Paper • 2603.00578 • Published Feb 28 • 2
Distilling 100B+ Models 40x Faster with TRL • TRL distillation for 100B+ teachers, 40x faster • 77
DataDecide: How to Predict Best Pretraining Data with Small Experiments Paper • 2504.11393 • Published Apr 15, 2025 • 20
QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining Paper • 2504.16511 • Published Apr 23, 2025 • 23
BARE: Combining Base and Instruction-Tuned Language Models for Better Synthetic Data Generation Paper • 2502.01697 • Published Feb 3, 2025 • 2
LEAD: Iterative Data Selection for Efficient LLM Instruction Tuning Paper • 2505.07437 • Published May 12, 2025 • 2