DeepSeek-R1-Distill-Qwen-14B-NVFP4 (Work in Progress)

This repository contains a self-quantized NVFP4 build of DeepSeek-R1-Distill-Qwen-14B. It was produced on an Asus Ascent GX10 (NVIDIA GB10 Grace Blackwell) system following the NVIDIA TensorRT Model Optimizer playbook.

Hardware & Architecture

  • Host System: Asus Ascent GX10 (Desktop AI Supercomputer)
  • Accelerator: NVIDIA Blackwell (SM121 / GB10)
  • Memory: 128GB Coherent Unified Memory (LPDDR5X)
  • Format: NVFP4 (4-bit floating point, E2M1) with two-level micro-block scaling.
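The two-level micro-block scaling above can be illustrated with a small simulation. This is a teaching sketch under simplifying assumptions (the function name and the decision to keep scales in full precision are mine, and the FP8/FP32 quantization of the scales themselves is omitted), not NVIDIA's reference implementation:

```python
# Simplified simulation of NVFP4 two-level micro-block scaling.
# Real NVFP4 stores weights as FP4 (E2M1) in blocks of 16 elements, each
# block carrying an FP8 (E4M3) scale that is in turn scaled by a per-tensor
# FP32 factor. Here both scale levels stay in full precision, so only the
# 4-bit value grid is simulated.

E2M1_LEVELS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  # FP4 magnitudes
BLOCK_SIZE = 16

def quantize_nvfp4_sim(values):
    """Fake-quantize a flat list of floats, one 16-element block at a time."""
    out = []
    for i in range(0, len(values), BLOCK_SIZE):
        block = values[i:i + BLOCK_SIZE]
        amax = max(abs(v) for v in block) or 1.0
        block_scale = amax / 6.0  # map the block's max onto the FP4 max (6.0)
        for v in block:
            target = abs(v) / block_scale
            q = min(E2M1_LEVELS, key=lambda lvl: abs(lvl - target))
            out.append(q * block_scale * (1.0 if v >= 0 else -1.0))
    return out
```

Because each 16-element block gets its own scale, a single outlier only coarsens its own block rather than the whole tensor, which is the main reason micro-block formats hold up at 4 bits.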

Current Performance Status (Jan 2026)

Tested on vLLM, but performance on the GX10 is currently inconsistent.

  • Stuttering: There is a known rhythmic stutter in current vLLM builds when running NVFP4 on SM121.
  • Work in Progress: This is a byproduct of early-access software kernels. Native Blackwell acceleration in vLLM is expected to improve in future nv25.x releases.
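For reference, a typical vLLM launch for this checkpoint looks like the following. The flags are generic vLLM options rather than GX10-specific tuning, and vLLM normally auto-detects the ModelOpt NVFP4 format from the checkpoint's quantization config, so the explicit `--quantization` flag is optional:

```shell
# Serve the NVFP4 checkpoint via vLLM's OpenAI-compatible server.
vllm serve vipertsniper/DeepSeek-R1-Distill-Qwen-14B-NVFP4 \
    --quantization modelopt_fp4 \
    --max-model-len 8192 \
    --gpu-memory-utilization 0.90
```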

Deployment Details

The model was quantized using the standard NVIDIA NVFP4 playbook. NVFP4 is designed for hardware-level acceleration on Blackwell's 5th-generation Tensor Cores.
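A quantization run along the lines used here can be sketched with the post-training quantization example from the TensorRT Model Optimizer repository. This is an approximation of the playbook, not the exact command used for this checkpoint; the script name and flags come from the `llm_ptq` example and may differ between releases:

```shell
# Post-training NVFP4 quantization following the Model Optimizer llm_ptq example.
git clone https://github.com/NVIDIA/TensorRT-Model-Optimizer.git
cd TensorRT-Model-Optimizer/examples/llm_ptq
python hf_ptq.py \
    --pyt_ckpt_path deepseek-ai/DeepSeek-R1-Distill-Qwen-14B \
    --qformat nvfp4 \
    --export_path ./DeepSeek-R1-Distill-Qwen-14B-NVFP4
```

The exported directory contains Hugging Face-style safetensors plus the quantization config that inference engines such as vLLM use to detect the NVFP4 format.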

License

Original weights by DeepSeek-AI are under the MIT License.
