Update README.md
README.md
@@ -13,8 +13,7 @@ tags:
 library_name: transformers
 ---
 
-# DeepSeek-R1-Distill-Qwen-14B-NVFP4 (
-
+# DeepSeek-R1-Distill-Qwen-14B-NVFP4 (Work in Progress)
 This repository contains a self-quantized version of **DeepSeek-R1-Distill-Qwen-14B** using the **NVIDIA NVFP4** format. This was produced on an **Asus Ascent GX10** (NVIDIA GB10 Grace Blackwell) system using the NVIDIA ModelOptimizer playbook.
 
 ## Hardware & Architecture
@@ -23,11 +22,6 @@ This repository contains a self-quantized version of **DeepSeek-R1-Distill-Qwen-
 - **Memory:** 128GB Coherent Unified Memory (LPDDR5X)
 - **Format:** NVFP4 (4-bit Floating Point) with two-level micro-block scaling.
 
-## The 14B "Reasoning" Advantage
-The 14B Qwen-distill is widely considered the sweet spot for Blackwell users.
-- **Efficiency:** In NVFP4, this model consumes approximately **9GB-10GB** of VRAM, leaving massive room for KV cache (context memory) on the GX10.
-- **IQ vs Size:** Distilled from the massive 671B R1, the 14B version significantly outperforms the 8B Llama variant in complex coding and mathematical reasoning benchmarks.
-
 ## Current Performance Status (Jan 2026)
 Tested on vLLM, but performance on the GX10 is currently inconsistent.
 - **Stuttering:** There is a known rhythmic stutter in current vLLM builds when running NVFP4 on SM121.
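The **Format** entry in the README describes NVFP4 only as "two-level micro-block scaling." The NumPy sketch below shows what that phrase commonly refers to. Each element is an FP4 (E2M1) value, every block of 16 elements shares one FP8 (E4M3) scale, and the whole tensor carries one FP32 scale. The block size, the scale formulas, and the round-to-nearest choices are assumptions based on NVIDIA's public description of the format. They are not taken from ModelOptimizer or from the quantization run behind this repository. All function names are hypothetical. The code only simulates the numerics ("fake quantization") and does not pack bits.

```python
import numpy as np

# Assumed NVFP4 parameters (based on NVIDIA's public description of the format).
BLOCK = 16                 # elements sharing one FP8 block scale
FP4_MAX = 6.0              # largest magnitude representable in FP4 E2M1
FP8_E4M3_MAX = 448.0       # largest finite FP8 E4M3 value
# Non-negative magnitudes representable in FP4 E2M1 (the sign bit is separate).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0], dtype=np.float32)


def round_to_e2m1(x):
    """Round each value to the nearest E2M1 magnitude (ties go to the smaller one)."""
    mag = np.minimum(np.abs(x), FP4_MAX)  # saturate instead of overflowing
    idx = np.abs(mag[..., None] - E2M1_GRID).argmin(axis=-1)
    return np.sign(x) * E2M1_GRID[idx]


def round_to_e4m3(s):
    """Round non-negative scales to FP8 E4M3 (3 mantissa bits, round-half-even)."""
    s = np.clip(np.asarray(s, dtype=np.float32), 0.0, FP8_E4M3_MAX)
    out = np.zeros_like(s)
    nz = s > 0
    _, exp = np.frexp(s[nz])        # s = m * 2**exp with m in [0.5, 1)
    e = np.maximum(exp - 1, -6)     # below 2**-6 the spacing stays fixed (subnormals)
    step = np.exp2(e - 3)           # spacing between neighbouring E4M3 values
    out[nz] = np.round(s[nz] / step) * step
    return np.minimum(out, FP8_E4M3_MAX)


def nvfp4_quantize(x):
    """Fake-quantize a tensor: return FP4 codes, FP8 block scales, FP32 tensor scale."""
    x = np.asarray(x, dtype=np.float32)
    if x.size % BLOCK:
        raise ValueError(f"size must be a multiple of {BLOCK}")
    blocks = x.reshape(-1, BLOCK)
    amax = float(np.abs(blocks).max())
    # Level 2: one FP32 scale chosen so the largest block scale fits in E4M3.
    tensor_scale = amax / (FP8_E4M3_MAX * FP4_MAX) if amax > 0 else 1.0
    # Level 1: one E4M3 scale per block, mapping the block's amax near FP4_MAX.
    block_scale = round_to_e4m3(np.abs(blocks).max(axis=1) / FP4_MAX / tensor_scale)
    denom = np.where(block_scale > 0, block_scale, 1.0)[:, None] * tensor_scale
    codes = round_to_e2m1(blocks / denom)
    return codes, block_scale, tensor_scale


def nvfp4_dequantize(codes, block_scale, tensor_scale):
    """Reconstruct values as code * block_scale * tensor_scale."""
    return codes * block_scale[:, None] * tensor_scale


# Usage: quantize a random weight vector and measure the reconstruction error.
rng = np.random.default_rng(0)
w = rng.standard_normal(4096).astype(np.float32)
codes, bs, ts = nvfp4_quantize(w)
w_hat = nvfp4_dequantize(codes, bs, ts).reshape(-1)
rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
bits_per_weight = 4 + 8 / BLOCK  # FP4 code + amortized FP8 block scale
print(f"relative error: {rel_err:.3f}, bits/weight: {bits_per_weight}")
```

Under these assumptions, storage comes to 4 bits per element plus one 8-bit scale per 16 elements, or 4.5 bits per weight. This ignores the single per-tensor scale and any layers a real checkpoint may keep in higher precision.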