nvidia/nemotron-speech-streaming-en-0.6b Automatic Speech Recognition • Updated 13 days ago • 8.36k • 538
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published Nov 27, 2025 • 245
Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence Paper • 2505.23747 • Published May 29, 2025 • 69
view article Article Vision Language Models (Better, faster, stronger) +3 merve, sergiopaniego, ariG23498, pcuenq, andito • May 12, 2025 • 611
view article Article The N Implementation Details of RLHF with PPO +1 vwxyzjn, tianlinliu0121, lvwerra • Oct 24, 2023 • 72
O1 Replication Journey -- Part 3: Inference-time Scaling for Medical Reasoning Paper • 2501.06458 • Published Jan 11, 2025 • 31
2.5 Years in Class: A Multimodal Textbook for Vision-Language Pretraining Paper • 2501.00958 • Published Jan 1, 2025 • 110