DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning Paper • 2501.12948 • Published Jan 22, 2025 • 451
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published Jan 8 • 232
view article Article Welcome to Inference Providers on the Hub 🔥 +5 burkaygur, zeke, aton2006, hassanelmghari, sbrandeis, kramp, julien-c • Jan 28, 2025 • 495
Lightning Attention-2: A Free Lunch for Handling Unlimited Sequence Lengths in Large Language Models Paper • 2401.04658 • Published Jan 9, 2024 • 27