Uniform Diffusion Models Revisited: Leave-One-Out Denoiser and Absorbing State Reformulation Paper • 2605.22765 • Published May 21 • 5
Article Beyond LoRA: Can you beat the most popular fine-tuning technique? +2 BenjaminB, sayakpaul, hubnemo, kashif • 8 days ago • 61
Article Introducing North Mini Code: Cohere’s First Model For Developers CohereLabs • 16 days ago • 75
Accelerating RL for LLM Reasoning with Optimal Advantage Regression Paper • 2505.20686 • Published May 27, 2025 • 3
GlucoFM: A Dual-Stream Foundation Model for Continuous Glucose Monitoring Paper • 2605.30865 • Published 28 days ago • 7
Article Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler +3 ariG23498, sayakpaul, sergiopaniego, ror, pcuenq • 28 days ago • 127
X-Token: Projection-Guided Cross-Tokenizer Knowledge Distillation Paper • 2605.21699 • Published May 20 • 1
Article Shipping a Trillion Parameters With a Hub Bucket: Delta Weight Sync in TRL +6 aminediroHF, qgallouedec, kashif, lewtun, edbeeching, albertvillanova, lvwerra, sergiopaniego • 30 days ago • 42
🧬 Carbon Collection Carbon 500M, 3B, 8B genomic models and GGUF variants for llama.cpp • 7 items • Updated 23 days ago • 43
AI Co-Mathematician: Accelerating Mathematicians with Agentic AI Paper • 2605.06651 • Published May 7 • 16
Triple Preference Optimization: Achieving Better Alignment with Less Data in a Single Step Optimization Paper • 2405.16681 • Published May 26, 2024 • 4
Article How I contributed a new model to the Transformers library using Codex nielsr • Mar 30 • 52