Article • Introducing SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding • 4 days ago • 36
nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-FP8 Text Generation • 32B • Updated 8 days ago • 1.48M • 324
Llama Nemotron Collection Open, Production-ready Enterprise Models • 12 items • Updated 2 days ago • 77
nvidia/Llama-3_3-Nemotron-Super-49B-v1 Text Generation • 50B • Updated Oct 15, 2025 • 31.1k • 321
Puzzle: Distillation-Based NAS for Inference-Optimized LLMs Paper • 2411.19146 • Published Nov 28, 2024 • 17