To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models Paper • 2602.12566 • Published Feb 13 • 1 • 1
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use Paper • 2510.05592 • Published Oct 7, 2025 • 111
Nemotron-Post-Training-v3 Collection Collection of datasets used in the post-training phase of Nemotron Nano and Super v3. • 28 items • Updated 18 days ago • 136
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published 25 days ago • 90 • 6
Lightning OPD: Efficient Post-Training for Large Reasoning Models with Offline On-Policy Distillation Paper • 2604.13010 • Published 25 days ago • 13 • 7
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 242
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos Paper • 2411.04923 • Published Nov 7, 2024 • 23
Analyzing Uncertainty of LLM-as-a-Judge: Interval Evaluations with Conformal Prediction Paper • 2509.18658 • Published Sep 23, 2025 • 1
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models Paper • 2510.05034 • Published Oct 6, 2025 • 51