DeepSeek-R1-Distill-Qwen-32B-NVFP4 (Work in Progress)
This is a self-quantized version of DeepSeek-R1-Distill-Qwen-32B using the NVIDIA NVFP4 format.
Tech Specs & Hardware
- System: Produced and tested on an Asus Ascent GX10 (NVIDIA Blackwell SM121).
- Format: NVFP4 (4-bit Floating Point) with two-level micro-block scaling.
- VRAM Footprint: Weights occupy approximately 20 GB.
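To make the "two-level micro-block scaling" above concrete, here is a minimal pure-Python sketch of NVFP4-style fake quantization. It is not the script used to produce this checkpoint. It assumes the commonly documented NVFP4 layout: 4-bit E2M1 values, one FP8 (E4M3) scale per block of 16 weights, and one FP32 scale per tensor. The function names are hypothetical, and for simplicity the block scale is kept as a Python float instead of being rounded to E4M3.

```python
import math

# Magnitudes representable by a 4-bit E2M1 value (a sign bit is stored separately).
E2M1_GRID = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)
E2M1_MAX = 6.0
E4M3_MAX = 448.0   # largest finite FP8 E4M3 value, the range available to a block scale
BLOCK = 16         # assumed NVFP4 micro-block size


def round_to_e2m1(x):
    """Snap x to the closest E2M1 magnitude, keeping its sign."""
    mag = min(E2M1_GRID, key=lambda g: abs(abs(x) - g))
    return math.copysign(mag, x)


def fake_quantize_nvfp4(values):
    """Quantize and immediately dequantize a flat list of floats.

    Level 2 (per tensor): one scale chosen so every block scale fits in E4M3.
    Level 1 (per block):  one scale per 16 values so the block maximum maps to 6.0.
    Simplification: block scales are NOT rounded to E4M3 here.
    """
    if len(values) % BLOCK:
        raise ValueError("length must be a multiple of the block size")
    amax = max(abs(v) for v in values)
    if amax == 0.0:
        return [0.0] * len(values)
    tensor_scale = amax / (E2M1_MAX * E4M3_MAX)
    out = []
    for start in range(0, len(values), BLOCK):
        block = values[start:start + BLOCK]
        block_amax = max(abs(v) for v in block)
        block_scale = (block_amax / E2M1_MAX) / tensor_scale  # at most 448
        step = block_scale * tensor_scale                      # effective scale
        if step == 0.0:
            out.extend(0.0 for _ in block)
            continue
        out.extend(round_to_e2m1(v / step) * step for v in block)
    return out


def nvfp4_weight_bytes(n_weights):
    """Storage estimate: 4 bits per weight plus one 8-bit scale per 16 weights."""
    bits_per_weight = 4 + 8 / BLOCK   # 4.5 bits; per-tensor scales are negligible
    return n_weights * bits_per_weight / 8
```

Under these assumptions, `nvfp4_weight_bytes(32e9)` gives 18e9 bytes, or about 18 GB for a nominal 32 billion weights. The gap to the roughly 20 GB reported above would be consistent with some tensors being kept in higher precision, though this card does not say which ones.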
Status: Work in Progress (WIP)
- Current Performance: Functional, but currently experiencing "Blackwell stuttering" on vLLM nv25.12.
- Note: This is an experimental release. Throughput issues are likely due to early-stage kernel support for SM121 silicon.
- Goal: This repository serves as a baseline for Blackwell performance testing. Performance is expected to stabilize as vLLM/SGLang native Blackwell support matures.
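For anyone using this repository as a performance baseline, the following is one possible way to put a number on stuttering. It is a sketch under an assumption: this card does not define "Blackwell stuttering", so the code treats it as irregular gaps between consecutive generated tokens. The function names and the `stall_factor` threshold are hypothetical, and the timestamps are expected to come from your own streaming client.

```python
import statistics


def inter_token_gaps(timestamps):
    """Seconds elapsed between consecutive token arrival times."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def stutter_report(timestamps, stall_factor=3.0):
    """Summarize throughput and stalls for one streamed generation.

    A gap counts as a stall when it exceeds stall_factor times the median gap.
    The threshold is an arbitrary illustrative choice, not a vLLM metric.
    """
    gaps = inter_token_gaps(timestamps)
    if not gaps:
        raise ValueError("need at least two timestamps")
    median_gap = statistics.median(gaps)
    stalls = [g for g in gaps if g > stall_factor * median_gap]
    return {
        "tokens_per_second": len(gaps) / (timestamps[-1] - timestamps[0]),
        "median_gap_s": median_gap,
        "max_gap_s": max(gaps),
        "stall_count": len(stalls),
    }
```

Recording `time.perf_counter()` each time a token arrives and passing the resulting list to `stutter_report` gives figures that can be compared across vLLM or SGLang versions as kernel support changes.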
License
Original weights by DeepSeek-AI are under the MIT License.
Model tree for vipertsniper/DeepSeek-R1-Distill-Qwen-32B-NVFP4
Base model
deepseek-ai/DeepSeek-R1-Distill-Qwen-32B