Chaining the Evidence: Robust Reinforcement Learning for Deep Search Agents with Citation-Aware Rubric Rewards Paper • 2601.06021 • Published 17 days ago • 43
StockBench: Can LLM Agents Trade Stocks Profitably In Real-world Markets? Paper • 2510.02209 • Published Oct 2, 2025 • 54
SIRI: Scaling Iterative Reinforcement Learning with Interleaved Compression Paper • 2509.25176 • Published Sep 29, 2025 • 14
LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning Paper • 2506.18841 • Published Jun 23, 2025 • 56
SuperWriter: Reflection-Driven Long-Form Generation with Large Language Models Paper • 2506.04180 • Published Jun 4, 2025 • 33
SageAttention3: Microscaling FP4 Attention for Inference and An Exploration of 8-Bit Training Paper • 2505.11594 • Published May 16, 2025 • 75
AdaptThink: Reasoning Models Can Learn When to Think Paper • 2505.13417 • Published May 19, 2025 • 83
An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes Paper • 2504.15270 • Published Apr 21, 2025 • 9 • 3
Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems Paper • 2502.19328 • Published Feb 26, 2025 • 23
NAVIG: Natural Language-guided Analysis with Vision Language Models for Image Geo-localization Paper • 2502.14638 • Published Feb 20, 2025 • 11
OpenSAE-LLaMA-3.1-8B Collection OpenSAE checkpoints for LLaMA 3.1 8B base model • 38 items • Updated Jan 29, 2025 • 5