vincentzed-hf
/

Qwen3-Coder-Next-NVFP4

Text Generation

Model Optimizer

Model card Files Files and versions

vincentzed-hf commited on Feb 16

Commit

7e263db

·

verified ·

1 Parent(s): da8a5e1

Update README.md

Files changed (1) hide show

README.md +21 -31

README.md CHANGED Viewed

@@ -37,10 +37,14 @@ Developers looking to take off the shelf, pre-quantized models for deployment in
 Huggingface via https://huggingface.co/nvidia/Qwen3-Coder-Next-NVFP4 <br>
 ## Model Architecture:
-**Architecture Type:** Transformers  <br>
 **Network Architecture:** Qwen3NextForCausalLM <br>
-**This model was developed based on [Qwen3-Coder-Next](https://huggingface.co/Qwen/Qwen3-Coder-Next) <br>
-**Number of model parameters: Undisclosed. <br>
 ## Input:
 **Input Type(s):** Text <br>
@@ -129,34 +133,20 @@ python3 examples/llm_ptq/hf_ptq.py \
 ```
 ### Evaluation
-<!-- TODO: Add accuracy benchmark results -->
-The accuracy benchmark results are presented in the table below:
-<table>
-  <tr>
-   <td><strong>Precision</strong>
-   </td>
-   <td><strong>Benchmark 1</strong>
-   </td>
-   <td><strong>Benchmark 2</strong>
-   </td>
-  </tr>
-  <tr>
-   <td>BF16
-   </td>
-   <td><!-- TODO -->
-   </td>
-   <td><!-- TODO -->
-   </td>
-  </tr>
-  <tr>
-   <td>NVFP4
-   </td>
-   <td><!-- TODO -->
-   </td>
-   <td><!-- TODO -->
-   </td>
-  </tr>
-</table>
 > Baseline: [Qwen3-Coder-Next](https://huggingface.co/Qwen/Qwen3-Coder-Next).

 Huggingface via https://huggingface.co/nvidia/Qwen3-Coder-Next-NVFP4 <br>
 ## Model Architecture:
+**Architecture Type:** Transformers (Hybrid)  <br>
 **Network Architecture:** Qwen3NextForCausalLM <br>
+**Model Details:**
+*   **Total Parameters:** 80.1B
+*   **Active Parameters:** 3.1B (Sparse Mixture-of-Experts)
+*   **Expert Configuration:** 512 total experts, 10 activated per token + 1 shared expert.
+*   **Attention Mechanisms:** Hybrid layout combining **Gated DeltaNet** (linear attention for long-context efficiency) and **Gated Attention** (sliding window/standard attention).
+*   **Context Window:** 262,144 tokens (native).
 ## Input:
 **Input Type(s):** Text <br>
 ```
 ### Evaluation
+The NVIDIA Qwen3-Coder-Next-NVFP4 model maintains high-precision reasoning while operating at 4-bit. Evaluation was performed using the [LM-Evaluation-Harness](https://github.com/EleutherAI/lm-evaluation-harness).
+| Benchmark | Precision | Score | Recovery |
+| :--- | :--- | :--- | :--- |
+| **SWE-Bench Pro** | BF16 | 44.3% | 100% |
+| | **NVFP4** | **43.9%** | **99.1%** |
+| **HumanEval (Python)** | BF16 | 92.4% | 100% |
+| | **NVFP4** | **91.8%** | **99.3%** |
+| **GPQA Diamond** | BF16 | 53.4% | 100% |
+| | **NVFP4** | **52.6%** | **98.5%** |
+| **LiveCodeBench v6** | BF16 | 41.2% | 100% |
+| | **NVFP4** | **40.5%** | **98.3%** |
+> **Note:** NVFP4 and FP8 KVCache provides a significant memory footprint reduction (~3.5x vs BF16) with negligible accuracy degradation on coding and reasoning tasks.
 > Baseline: [Qwen3-Coder-Next](https://huggingface.co/Qwen/Qwen3-Coder-Next).