DeepSeek-R1-Distill-Qwen-32B-NVFP4 (Work in Progress)

This is a self-quantized version of DeepSeek-R1-Distill-Qwen-32B using the NVIDIA NVFP4 format.

Tech Specs & Hardware

  • System: Produced and tested on an Asus Ascent GX10 (NVIDIA Blackwell SM121).
  • Format: NVFP4 (4-bit Floating Point) with two-level micro-block scaling.
  • VRAM Footprint: Weights occupy approximately 20 GB.
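
As a rough illustration of the two-level micro-block scaling mentioned above, here is a hypothetical pure-Python sketch. It uses the FP4 (E2M1) magnitude set and a per-16-element block scale; in the real NVFP4 format that block scale is stored in FP8 E4M3 and a tensor-level FP32 scale sits above it (omitted here for brevity). This is a didactic sketch, not the actual quantization code used to produce these weights.

```python
import math

# Positive magnitudes representable in FP4 E2M1 (sign handled separately).
FP4_E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_to_fp4(x: float) -> float:
    """Round |x| to the nearest representable E2M1 magnitude, keep the sign."""
    mag = min(FP4_E2M1_VALUES, key=lambda v: abs(v - abs(x)))
    return math.copysign(mag, x)

def quantize_block(block: list[float]) -> tuple[float, list[float]]:
    """One 16-element micro-block: pick a scale so the max maps to 6.0
    (the FP4 E2M1 maximum), then quantize each element against it.
    Real NVFP4 stores this per-block scale in FP8 E4M3."""
    amax = max(abs(v) for v in block) or 1.0
    scale = amax / 6.0
    codes = [quantize_to_fp4(v / scale) for v in block]
    return scale, codes

def dequantize_block(scale: float, codes: list[float]) -> list[float]:
    return [scale * c for c in codes]

block = [0.1 * i for i in range(16)]   # one 16-element micro-block
scale, codes = quantize_block(block)
approx = dequantize_block(scale, codes)
```

The two-level scheme is why the footprint lands near 20 GB rather than 16 GB: 4 bits per weight plus one 8-bit scale per 16 weights adds modest overhead on top of the packed weights.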

Status: Work in Progress (WIP)

  • Current Performance: Functional, but exhibits "Blackwell stuttering" (intermittent throughput drops) on vLLM nv25.12.
  • Note: This is an experimental release. Throughput issues are likely due to early-stage kernel support for SM121 silicon.
  • Goal: This repository serves as a baseline for Blackwell performance testing. Performance is expected to stabilize as vLLM/SGLang native Blackwell support matures.
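
For anyone reproducing the baseline, a minimal serving invocation might look like the following. The repo id is taken from this card; the flags are illustrative defaults, not tuned values, and assume vLLM picks up the quantization config from the checkpoint itself.

```shell
# Sketch: serve this checkpoint with vLLM on a Blackwell (SM121) box.
# Flag values are illustrative, not benchmarked settings.
vllm serve vipertsniper/DeepSeek-R1-Distill-Qwen-32B-NVFP4 \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.90
```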

License

The original weights by DeepSeek-AI are released under the MIT License.
