DeepSeek-R1-Distill-Qwen-32B-NVFP4 (Work in Progress)

This is a self-quantized version of DeepSeek-R1-Distill-Qwen-32B using the NVIDIA NVFP4 format.

Tech Specs & Hardware

  • System: Produced and tested on an Asus Ascent GX10 (NVIDIA Blackwell SM121).
  • Format: NVFP4 (4-bit Floating Point) with two-level micro-block scaling.
  • VRAM Footprint: Weights occupy approximately 20 GB.
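
As a rough illustration of the two-level micro-block scaling mentioned above, here is a hypothetical pure-Python sketch. It uses the FP4 (E2M1) magnitude set and a per-16-element block scale; in the real NVFP4 format that block scale is stored in FP8 E4M3 and a tensor-level FP32 scale sits above it (omitted here for brevity). This is a didactic sketch, not the actual quantization code used to produce these weights.

```python
import math

# Positive magnitudes representable in FP4 E2M1 (sign handled separately).
FP4_E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_to_fp4(x: float) -> float:
    """Round |x| to the nearest representable E2M1 magnitude, keep the sign."""
    mag = min(FP4_E2M1_VALUES, key=lambda v: abs(v - abs(x)))
    return math.copysign(mag, x)

def quantize_block(block: list[float]) -> tuple[float, list[float]]:
    """One 16-element micro-block: pick a scale so the max maps to 6.0
    (the FP4 E2M1 maximum), then quantize each element against it.
    Real NVFP4 stores this per-block scale in FP8 E4M3."""
    amax = max(abs(v) for v in block) or 1.0
    scale = amax / 6.0
    codes = [quantize_to_fp4(v / scale) for v in block]
    return scale, codes

def dequantize_block(scale: float, codes: list[float]) -> list[float]:
    return [scale * c for c in codes]

block = [0.1 * i for i in range(16)]   # one 16-element micro-block
scale, codes = quantize_block(block)
approx = dequantize_block(scale, codes)
```

The two-level scheme is why the footprint lands near 20 GB rather than 16 GB: 4 bits per weight plus one 8-bit scale per 16 weights adds modest overhead on top of the packed weights.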

Status: Work in Progress (WIP)

  • Current Performance: Functional, but exhibits "Blackwell stuttering" (intermittent throughput drops) on vLLM nv25.12.
  • Note: This is an experimental release. Throughput issues are likely due to early-stage kernel support for SM121 silicon.
  • Goal: This repository serves as a baseline for Blackwell performance testing. Performance is expected to stabilize as vLLM/SGLang native Blackwell support matures.
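
For anyone reproducing the baseline, a minimal serving invocation might look like the following. The repo id is taken from this card; the flags are illustrative defaults, not tuned values, and assume vLLM picks up the quantization config from the checkpoint itself.

```shell
# Sketch: serve this checkpoint with vLLM on a Blackwell (SM121) box.
# Flag values are illustrative, not benchmarked settings.
vllm serve vipertsniper/DeepSeek-R1-Distill-Qwen-32B-NVFP4 \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.90
```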

License

The original weights by DeepSeek-AI are released under the MIT License.
