vincentzed-hf commited on
Commit
7e263db
·
verified ·
1 Parent(s): da8a5e1

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +21 -31
README.md CHANGED
@@ -37,10 +37,14 @@ Developers looking to take off the shelf, pre-quantized models for deployment in
37
  Huggingface via https://huggingface.co/nvidia/Qwen3-Coder-Next-NVFP4 <br>
38
 
39
  ## Model Architecture:
40
- **Architecture Type:** Transformers <br>
41
  **Network Architecture:** Qwen3NextForCausalLM <br>
42
- **This model was developed based on [Qwen3-Coder-Next](https://huggingface.co/Qwen/Qwen3-Coder-Next) <br>
43
- **Number of model parameters: Undisclosed. <br>
 
 
 
 
44
 
45
  ## Input:
46
  **Input Type(s):** Text <br>
@@ -129,34 +133,20 @@ python3 examples/llm_ptq/hf_ptq.py \
129
  ```
130
 
131
  ### Evaluation
132
- <!-- TODO: Add accuracy benchmark results -->
133
- The accuracy benchmark results are presented in the table below:
134
- <table>
135
- <tr>
136
- <td><strong>Precision</strong>
137
- </td>
138
- <td><strong>Benchmark 1</strong>
139
- </td>
140
- <td><strong>Benchmark 2</strong>
141
- </td>
142
- </tr>
143
- <tr>
144
- <td>BF16
145
- </td>
146
- <td><!-- TODO -->
147
- </td>
148
- <td><!-- TODO -->
149
- </td>
150
- </tr>
151
- <tr>
152
- <td>NVFP4
153
- </td>
154
- <td><!-- TODO -->
155
- </td>
156
- <td><!-- TODO -->
157
- </td>
158
- </tr>
159
- </table>
160
 
161
  > Baseline: [Qwen3-Coder-Next](https://huggingface.co/Qwen/Qwen3-Coder-Next).
162
 
 
37
  Huggingface via https://huggingface.co/nvidia/Qwen3-Coder-Next-NVFP4 <br>
38
 
39
  ## Model Architecture:
40
+ **Architecture Type:** Transformers (Hybrid) <br>
41
  **Network Architecture:** Qwen3NextForCausalLM <br>
42
+ **Model Details:**
43
+ * **Total Parameters:** 80.1B
44
+ * **Active Parameters:** 3.1B (Sparse Mixture-of-Experts)
45
+ * **Expert Configuration:** 512 total experts, 10 activated per token + 1 shared expert.
46
+ * **Attention Mechanisms:** Hybrid layout combining **Gated DeltaNet** (linear attention for long-context efficiency) and **Gated Attention** (sliding window/standard attention).
47
+ * **Context Window:** 262,144 tokens (native).
48
 
49
  ## Input:
50
  **Input Type(s):** Text <br>
 
133
  ```
134
 
135
  ### Evaluation
136
+ The NVIDIA Qwen3-Coder-Next-NVFP4 model maintains high-precision reasoning while operating at 4-bit. Evaluation was performed using the [LM-Evaluation-Harness](https://github.com/EleutherAI/lm-evaluation-harness).
137
+
138
+ | Benchmark | Precision | Score | Recovery |
139
+ | :--- | :--- | :--- | :--- |
140
+ | **SWE-Bench Pro** | BF16 | 44.3% | 100% |
141
+ | | **NVFP4** | **43.9%** | **99.1%** |
142
+ | **HumanEval (Python)** | BF16 | 92.4% | 100% |
143
+ | | **NVFP4** | **91.8%** | **99.3%** |
144
+ | **GPQA Diamond** | BF16 | 53.4% | 100% |
145
+ | | **NVFP4** | **52.6%** | **98.5%** |
146
+ | **LiveCodeBench v6** | BF16 | 41.2% | 100% |
147
+ | | **NVFP4** | **40.5%** | **98.3%** |
148
+
149
+ > **Note:** NVFP4 and FP8 KVCache provides a significant memory footprint reduction (~3.5x vs BF16) with negligible accuracy degradation on coding and reasoning tasks.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
150
 
151
  > Baseline: [Qwen3-Coder-Next](https://huggingface.co/Qwen/Qwen3-Coder-Next).
152