To Mix or To Merge: Toward Multi-Domain Reinforcement Learning for Large Language Models Paper • 2602.12566 • Published Feb 13, 2026 • 1
In-the-Flow Agentic System Optimization for Effective Planning and Tool Use Paper • 2510.05592 • Published Oct 7, 2025 • 111
Nemotron-Post-Training-v3 Collection • Collection of datasets used in the post-training phase of Nemotron Nano and Super v3. • 28 items • Updated 3 days ago • 138
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6, 2025 • 242
VideoGLaMM: A Large Multimodal Model for Pixel-Level Visual Grounding in Videos Paper • 2411.04923 • Published Nov 7, 2024 • 23
Analyzing Uncertainty of LLM-as-a-Judge: Interval Evaluations with Conformal Prediction Paper • 2509.18658 • Published Sep 23, 2025 • 1
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models Paper • 2510.05034 • Published Oct 6, 2025 • 51