vipertsniper committed on
Commit
344e229
·
verified ·
1 Parent(s): c9f85dd

Update README.md

Files changed (1)
  1. README.md +1 -7
README.md CHANGED
@@ -13,8 +13,7 @@ tags:
  library_name: transformers
  ---
 
- # DeepSeek-R1-Distill-Qwen-14B-NVFP4 (Blackwell Optimized)
-
+ # DeepSeek-R1-Distill-Qwen-14B-NVFP4 (Work in Progress)
  This repository contains a self-quantized version of **DeepSeek-R1-Distill-Qwen-14B** using the **NVIDIA NVFP4** format. This was produced on an **Asus Ascent GX10** (NVIDIA GB10 Grace Blackwell) system using the NVIDIA ModelOptimizer playbook.
 
  ## Hardware & Architecture
@@ -23,11 +22,6 @@ This repository contains a self-quantized version of **DeepSeek-R1-Distill-Qwen-
  - **Memory:** 128GB Coherent Unified Memory (LPDDR5X)
  - **Format:** NVFP4 (4-bit Floating Point) with two-level micro-block scaling.
 
- ## The 14B "Reasoning" Advantage
- The 14B Qwen-distill is widely considered the sweet spot for Blackwell users.
- - **Efficiency:** In NVFP4, this model consumes approximately **9GB-10GB** of VRAM, leaving massive room for KV cache (context memory) on the GX10.
- - **IQ vs Size:** Distilled from the massive 671B R1, the 14B version significantly outperforms the 8B Llama variant in complex coding and mathematical reasoning benchmarks.
-
  ## Current Performance Status (Jan 2026)
  Tested on vLLM, but performance on the GX10 is currently inconsistent.
  - **Stuttering:** There is a known rhythmic stutter in current vLLM builds when running NVFP4 on SM121.
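
A rough sketch of what the NVFP4 format line implies for weight storage, useful as a sanity check on the "9GB-10GB" figure in the removed "Efficiency" bullet. It rests on assumptions not stated in this README: that NVFP4 stores 4-bit values in blocks of 16 sharing one 8-bit (FP8) scale, that the second-level per-tensor scale is negligible in size, and that the model has about 14.8 billion parameters. The function names are hypothetical, and the estimate ignores any layers kept in higher precision as well as runtime buffers and KV cache.

```python
def bits_per_weight(value_bits=4, block_size=16, block_scale_bits=8):
    """Effective bits per weight: the 4-bit value plus its share of the block scale.

    Assumed layout: one 8-bit scale per block of 16 values. The per-tensor
    second-level scale is a single number per tensor and is ignored here.
    """
    return value_bits + block_scale_bits / block_size


def weight_footprint_gb(num_params, bpw=None):
    """Approximate size of the quantized weights in decimal gigabytes."""
    if bpw is None:
        bpw = bits_per_weight()
    return num_params * bpw / 8 / 1e9


if __name__ == "__main__":
    # Assumption: ~14.8e9 parameters for the 14B distill.
    params = 14.8e9
    print(f"bits per weight: {bits_per_weight():.2f}")
    print(f"quantized weights: {weight_footprint_gb(params):.2f} GB")
```

Under these assumptions the quantized weights alone come to a little over 8 GB. Any gap between that and a measured footprint would come from tensors left unquantized and from runtime overhead, neither of which this sketch models.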