Beyond Length Scaling: Synergizing Breadth and Depth for Generative Reward Models
Paper • 2603.01571 • Published • 33
Fundamental AI Methods; Perception & World Modeling; Reasoning & Generation; Action & Interaction
SIN-Bench: Tracing Native Evidence Chains in Long-Context Multimodal Scientific Interleaved Literature
ProFit: Leveraging High-Value Signals in SFT via Probability-Guided Token Selection