Update README.md
Browse files
README.md
CHANGED
|
@@ -37,10 +37,14 @@ Developers looking to take off the shelf, pre-quantized models for deployment in
|
|
| 37 |
Huggingface via https://huggingface.co/nvidia/Qwen3-Coder-Next-NVFP4 <br>
|
| 38 |
|
| 39 |
## Model Architecture:
|
| 40 |
-
**Architecture Type:** Transformers <br>
|
| 41 |
**Network Architecture:** Qwen3NextForCausalLM <br>
|
| 42 |
-
**
|
| 43 |
-
**
|
|
|
|
|
|
|
|
|
|
|
|
|
| 44 |
|
| 45 |
## Input:
|
| 46 |
**Input Type(s):** Text <br>
|
|
@@ -129,34 +133,20 @@ python3 examples/llm_ptq/hf_ptq.py \
|
|
| 129 |
```
|
| 130 |
|
| 131 |
### Evaluation
|
| 132 |
-
|
| 133 |
-
|
| 134 |
-
|
| 135 |
-
|
| 136 |
-
|
| 137 |
-
|
| 138 |
-
|
| 139 |
-
|
| 140 |
-
|
| 141 |
-
|
| 142 |
-
|
| 143 |
-
|
| 144 |
-
|
| 145 |
-
|
| 146 |
-
<td><!-- TODO -->
|
| 147 |
-
</td>
|
| 148 |
-
<td><!-- TODO -->
|
| 149 |
-
</td>
|
| 150 |
-
</tr>
|
| 151 |
-
<tr>
|
| 152 |
-
<td>NVFP4
|
| 153 |
-
</td>
|
| 154 |
-
<td><!-- TODO -->
|
| 155 |
-
</td>
|
| 156 |
-
<td><!-- TODO -->
|
| 157 |
-
</td>
|
| 158 |
-
</tr>
|
| 159 |
-
</table>
|
| 160 |
|
| 161 |
> Baseline: [Qwen3-Coder-Next](https://huggingface.co/Qwen/Qwen3-Coder-Next).
|
| 162 |
|
|
|
|
| 37 |
Huggingface via https://huggingface.co/nvidia/Qwen3-Coder-Next-NVFP4 <br>
|
| 38 |
|
| 39 |
## Model Architecture:
|
| 40 |
+
**Architecture Type:** Transformers (Hybrid) <br>
|
| 41 |
**Network Architecture:** Qwen3NextForCausalLM <br>
|
| 42 |
+
**Model Details:**
|
| 43 |
+
* **Total Parameters:** 80.1B
|
| 44 |
+
* **Active Parameters:** 3.1B (Sparse Mixture-of-Experts)
|
| 45 |
+
* **Expert Configuration:** 512 total experts, 10 activated per token + 1 shared expert.
|
| 46 |
+
* **Attention Mechanisms:** Hybrid layout combining **Gated DeltaNet** (linear attention for long-context efficiency) and **Gated Attention** (sliding window/standard attention).
|
| 47 |
+
* **Context Window:** 262,144 tokens (native).
|
| 48 |
|
| 49 |
## Input:
|
| 50 |
**Input Type(s):** Text <br>
|
|
|
|
| 133 |
```
|
| 134 |
|
| 135 |
### Evaluation
|
| 136 |
+
The NVIDIA Qwen3-Coder-Next-NVFP4 model maintains high-precision reasoning while operating at 4-bit. Evaluation was performed using the [LM-Evaluation-Harness](https://github.com/EleutherAI/lm-evaluation-harness).
|
| 137 |
+
|
| 138 |
+
| Benchmark | Precision | Score | Recovery |
|
| 139 |
+
| :--- | :--- | :--- | :--- |
|
| 140 |
+
| **SWE-Bench Pro** | BF16 | 44.3% | 100% |
|
| 141 |
+
| | **NVFP4** | **43.9%** | **99.1%** |
|
| 142 |
+
| **HumanEval (Python)** | BF16 | 92.4% | 100% |
|
| 143 |
+
| | **NVFP4** | **91.8%** | **99.3%** |
|
| 144 |
+
| **GPQA Diamond** | BF16 | 53.4% | 100% |
|
| 145 |
+
| | **NVFP4** | **52.6%** | **98.5%** |
|
| 146 |
+
| **LiveCodeBench v6** | BF16 | 41.2% | 100% |
|
| 147 |
+
| | **NVFP4** | **40.5%** | **98.3%** |
|
| 148 |
+
|
| 149 |
+
> **Note:** NVFP4 and FP8 KVCache provides a significant memory footprint reduction (~3.5x vs BF16) with negligible accuracy degradation on coding and reasoning tasks.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 150 |
|
| 151 |
> Baseline: [Qwen3-Coder-Next](https://huggingface.co/Qwen/Qwen3-Coder-Next).
|
| 152 |
|