meenchen committed on
Commit 7cd997a · verified · 1 Parent(s): 93a46fa

Update README.md

Files changed (1): README.md +5 -3
README.md CHANGED
@@ -16,7 +16,7 @@ tags:
 # Model Overview
 
 ## Description:
-The NVIDIA Qwen3-Coder-480B-A35B-Instruct NVFP4 model is the quantized version of Alibaba's Qwen3-Coder-480B-A35B-Instruct model, which is an auto-regressive language model that uses an optimized transformer architecture. For more information, please check [here](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct). The NVIDIA Qwen3-Coder-480B-A35B-Instruct FP4 model is quantized with [TensorRT Model Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer).
+The NVIDIA Qwen3-Coder-480B-A35B-Instruct NVFP4 model is the quantized version of Alibaba's Qwen3-Coder-480B-A35B-Instruct model, which is an auto-regressive language model that uses an optimized transformer architecture. For more information, please check [here](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct). The NVIDIA Qwen3-Coder-480B-A35B-Instruct FP4 model is quantized with [Model Optimizer](https://github.com/NVIDIA/Model-Optimizer).
 
 This model is ready for commercial/non-commercial use. <br>
 
@@ -98,7 +98,7 @@ The model is quantized with nvidia-modelopt **v0.41.0** <br>
 **Test Hardware:** B200 <br>
 
 ## Post Training Quantization
-This model was obtained by quantizing the weights and activations of Qwen3-Coder-480B-A35B-Instruct to FP4 data type, ready for inference with TensorRT-LLM. Only the weights and activations of the linear operators within transformer blocks are quantized. This optimization reduces the number of bits per parameter from 16 to 4, reducing the disk size and GPU memory requirements by approximately 3.3x.
+This model was obtained by quantizing the weights and activations of Qwen3-Coder-480B-A35B-Instruct to FP4 data type, ready for inference with TensorRT-LLM. Only the weights and activations of the linear operators within transformer blocks are quantized. This optimization reduces the number of bits per parameter from 16 to 4, reducing the disk size and GPU memory requirements by approximately 3.5x.
 
 ## Usage
 
@@ -150,7 +150,7 @@ The accuracy benchmark results are presented in the table below:
 </td>
 </tr>
 <tr>
-<td>BF16 (AA Ref)
+<td>BF16
 </td>
 <td>0.486
 </td>
@@ -168,6 +168,8 @@ The accuracy benchmark results are presented in the table below:
 <tr>
 </table>
 
+> Baseline: [Qwen/Qwen3-Coder-480B-A35B-Instruct](https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct).
+> Benchmarked with temperature=0.0, top_p=1.0e-05, max num tokens 16384
 
 
 ## Ethical Considerations
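
The commit revises the claimed compression from 3.3x to 3.5x rather than the naive 16/4 = 4x. A plausible (but assumed, not stated in the diff) accounting is NVFP4's block-scaled layout, where each group of 16 four-bit values shares one FP8 scale factor, adding 0.5 effective bits per parameter:

```python
# Back-of-envelope check of the ~3.5x reduction figure in the edited text.
# Assumption (not from the diff): NVFP4 stores 4-bit values plus one FP8
# scale per 16-element micro-block, i.e. 4 + 8/16 = 4.5 effective bits/param.
BF16_BITS = 16
NVFP4_VALUE_BITS = 4
BLOCK_SIZE = 16   # elements sharing one scale factor
SCALE_BITS = 8    # FP8 (E4M3) per-block scale

effective_bits = NVFP4_VALUE_BITS + SCALE_BITS / BLOCK_SIZE  # 4.5
ratio = BF16_BITS / effective_bits                           # ~3.56

print(f"effective bits/param: {effective_bits}")
print(f"compression vs BF16:  {ratio:.2f}x")
```

The result (~3.56x) lines up with the corrected 3.5x claim; unquantized layers outside the transformer blocks would pull the real on-disk ratio down slightly, which may be why the README rounds down.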
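
The added baseline note pins the benchmark sampling settings (temperature=0.0, top_p=1.0e-05, max 16384 tokens). A minimal sketch of reproducing those settings, assuming the quantized model is served behind an OpenAI-compatible endpoint; the model name and prompt here are illustrative, not taken from the diff:

```python
import json

# Hypothetical request body mirroring the benchmark sampling settings from
# the added README lines. Only the three sampling fields come from the diff;
# everything else is an illustrative assumption.
payload = {
    "model": "nvidia/Qwen3-Coder-480B-A35B-Instruct-NVFP4",  # assumed name
    "messages": [
        {"role": "user", "content": "Write a binary search in Python."}
    ],
    "temperature": 0.0,    # greedy decoding
    "top_p": 1.0e-05,      # effectively disables nucleus sampling
    "max_tokens": 16384,   # max num tokens from the benchmark note
}

print(json.dumps(payload, indent=2))
```

Temperature 0 with a near-zero top_p makes decoding deterministic, which is the usual choice for reproducible accuracy benchmarks.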