LongTraceRL: Learning Long-Context Reasoning from Search Agent Trajectories with Rubric Rewards Paper • 2605.31584 • Published 5 days ago • 36
IndexCache: Accelerating Sparse Attention via Cross-Layer Index Reuse Paper • 2603.12201 • Published Mar 12 • 53
Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards Paper • 2601.06021 • Published Jan 9 • 48
OmniAlpha: A Sequence-to-Sequence Framework for Unified Multi-Task RGBA Generation Paper • 2511.20211 • Published Nov 25, 2025 • 12
Glyph: Scaling Context Windows via Visual-Text Compression Paper • 2510.17800 • Published Oct 20, 2025 • 69
view article Article Welcome GPT OSS, the new open-source model family from OpenAI! +10 reach-vb, pcuenq, lewtun, clem, Rocketknight1, clefourrier, celinah, Wauplin, marcsun13, pagezyhf, ahadnagy, joaogante • Aug 5, 2025 • 513
view article Article Mixture of Experts Explained +4 osanseviero, lewtun, philschmid, smangrul, ybelkada, pcuenq • Dec 11, 2023 • 1.14k
view article Article KV Caching Explained: Optimizing Transformer Inference Efficiency not-lain • Jan 30, 2025 • 342
AdaptThink: Reasoning Models Can Learn When to Think Paper • 2505.13417 • Published May 19, 2025 • 83
view article Article Navigating the RLHF Landscape: From Policy Gradients to PPO, GAE, and DPO for LLM Alignment NormalUhr • Feb 11, 2025 • 125