Article Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models nvidia • May 23 • 34
Article Introducing SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding nvidia • Mar 19 • 47
nvidia/Llama-3_3-Nemotron-Super-49B-v1_5-FP8 Text Generation • 50B • Updated Oct 15, 2025 • 239k • 28
nvidia/Llama-3_3-Nemotron-Super-49B-v1_5 Text Generation • 50B • Updated Oct 15, 2025 • 653k • 234
nvidia/Llama-3_1-Nemotron-Ultra-253B-v1 Text Generation • 253B • Updated Oct 15, 2025 • 1.83k • 352
FFN Fusion: Rethinking Sequential Computation in Large Language Models Paper • 2503.18908 • Published Mar 24, 2025 • 20
nvidia/Llama-3.1-Nemotron-Nano-8B-v1 Text Generation • 8B • Updated Oct 15, 2025 • 20.1k • 221
nvidia/Llama-3_3-Nemotron-Super-49B-v1 Text Generation • 50B • Updated Oct 15, 2025 • 176k • 322
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs Paper • 2411.19146 • Published Nov 28, 2024 • 20
nvidia/Llama-3_1-Nemotron-51B-Instruct Text Generation • 52B • Updated Jul 6, 2025 • 767 • 210