💻 Qwopus-Coder Collection Reasoning-distilled coding models optimized for specialized domains like agentic workflows. • 10 items • Updated 2 days ago • 34
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published Apr 14 • 113
To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models Paper • 2602.12566 • Published Feb 13 • 1
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use Paper • 2510.05592 • Published Oct 7, 2025 • 112
Nemotron-Post-Training-v3 Collection Collection of datasets used in the post-training phase of Nemotron Nano, Super, and Ultra v3. • 50 items • Updated 21 days ago • 168
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 242
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos Paper • 2411.04923 • Published Nov 7, 2024 • 23
Analyzing Uncertainty of LLM-as-a-Judge: Interval Evaluations with Conformal Prediction Paper • 2509.18658 • Published Sep 23, 2025 • 1
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models Paper • 2510.05034 • Published Oct 6, 2025 • 51