unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF
Text Generation • 121B • Updated • 69.1k • 93
I run it on a Threadripper 3970X with 256 GB of system RAM, offloading the computation layers to a GTX 1660 with 6 GB of VRAM. I'm using llama.cpp with -nkvo -kvu and all MoE on CPU. I get a generation speed of 14 tokens/s using q8_0. I'm amazed.
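For anyone wondering why this fits in 256 GB, here is a rough back-of-the-envelope sketch. It assumes every tensor is stored as q8_0 (real GGUF files usually keep some tensors in other types, so the actual file size will differ), takes the 121B parameter count from the model listing, and treats GB as decimal gigabytes. The function name is just for illustration.

```python
# q8_0 packs weights in blocks of 32: 32 int8 values plus one fp16 scale.
BLOCK_WEIGHTS = 32
BLOCK_BYTES = 32 * 1 + 2  # 34 bytes per block, i.e. 8.5 bits per weight


def q8_0_bytes(n_params: float) -> float:
    """Approximate size in bytes if all n_params weights were q8_0 (assumption)."""
    return n_params / BLOCK_WEIGHTS * BLOCK_BYTES


n_params = 121e9   # parameter count from the model listing
ram_gb = 256       # system RAM from the post
vram_gb = 6        # GTX 1660 VRAM from the post

weights_gb = q8_0_bytes(n_params) / 1e9

print(f"approx. q8_0 weights: {weights_gb:.1f} GB")
print(f"fits in system RAM:   {weights_gb < ram_gb}")
print(f"fits in VRAM:         {weights_gb < vram_gb}")
```

So the weights alone take roughly half of the system RAM and are far too big for the 6 GB card, which is why the MoE weights stay on the CPU side in a setup like this.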