wip readme #3
opened by vincentzed-hf

README.md (CHANGED)
@@ -74,24 +74,24 @@ The integration of foundation and fine-tuned models into AI systems requires add
 ## Training, Testing, and Evaluation Datasets:

 ## Calibration Dataset:
-
-
-

 ## Training Datasets:
-
-
-

 ## Testing Dataset:
-
-
-

 ## Evaluation Dataset:
 * Datasets: MMLU Pro, GPQA Diamond, LiveCodeBench V6, SciCode, AIME 2025 <br>
-
-


 ## Inference:

@@ -99,7 +99,7 @@ The integration of foundation and fine-tuned models into AI systems requires add
 **Test Hardware:** B300 <br>

 ## Post Training Quantization
-This model was obtained by quantizing the weights and activations of Qwen3-Coder-Next to NVFP4 data type, ready for inference with

 ## Usage

@@ -110,8 +110,12 @@ To serve the quantized NVFP4 checkpoint with [SGLang](https://github.com/sgl-pro
 ```bash
 sglang serve --model-path vincentzed-hf/Qwen3-Coder-Next-NVFP4 --quantization modelopt_fp4
 ```
-Please
-

 ### Reproduce with ModelOpt
 ## Training, Testing, and Evaluation Datasets:

 ## Calibration Dataset:
+* Link: [Nemotron-Post-Training-Dataset-v2](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v2) <br>
+* Data collection method: Automated <br>
+* Labeling method: Automated <br>

 ## Training Datasets:
+* Data collection method by dataset: Undisclosed <br>
+* Labeling method by dataset: Undisclosed <br>
+* Properties: Undisclosed <br>

 ## Testing Dataset:
+* Data collection method by dataset: Undisclosed <br>
+* Labeling method by dataset: Undisclosed <br>
+* Properties: Undisclosed <br>

 ## Evaluation Dataset:
 * Datasets: MMLU Pro, GPQA Diamond, LiveCodeBench V6, SciCode, AIME 2025 <br>
+* Data collection method: Hybrid (Automated, Human) <br>
+* Labeling method: Hybrid (Human, Automated) <br>

 ## Inference:

 **Test Hardware:** B300 <br>

 ## Post Training Quantization
+This model was obtained by quantizing the weights and activations of Qwen3-Coder-Next to the NVFP4 data type, ready for inference with SGLang. Only the weights and activations of the linear operators within the transformer blocks are quantized, and the KV cache is quantized to FP8. This optimization reduces the number of bits per parameter from 16 to 4, cutting disk size and GPU memory requirements by approximately 4x.
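The ~4x figure follows directly from the bits-per-parameter arithmetic. A minimal back-of-envelope sketch; the parameter count below is an assumed placeholder for illustration, not a size stated on this card:

```python
def checkpoint_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate weight-storage size in GiB for a given precision."""
    return num_params * bits_per_param / 8 / 1024**3

params = 80e9  # assumption for illustration, not the actual Qwen3-Coder-Next size
print(f"BF16 (16-bit):  {checkpoint_gib(params, 16):.1f} GiB")
print(f"NVFP4 (4-bit):  {checkpoint_gib(params, 4):.1f} GiB")
print(f"reduction:      {checkpoint_gib(params, 16) / checkpoint_gib(params, 4):.0f}x")
```

Whatever the true parameter count, the ratio of the two sizes is 16/4 = 4, which is where the "approximately 4x" comes from (activation buffers and the FP8 KV cache shift the in-practice number slightly).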

 ## Usage

 ```bash
 sglang serve --model-path vincentzed-hf/Qwen3-Coder-Next-NVFP4 --quantization modelopt_fp4
 ```
+Please install SGLang from source first:
+`git clone git@github.com:sgl-project/sglang.git`
+Once the repo is cloned, run `uv pip install -e "python[all]"` from the repo root, then run the serve command above.
+When a release is cut with the bugfix for this model's launch, we will update this model card.
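Once the server is up, it can be queried over HTTP. A hypothetical client sketch; the OpenAI-compatible chat endpoint and default port 30000 are assumptions about SGLang's serving defaults, not details from this card:

```python
import json
import urllib.request

def build_request(prompt: str, base_url: str = "http://localhost:30000"):
    """Build an OpenAI-style chat-completions request for the served model.

    The endpoint path and port are assumed SGLang defaults.
    """
    payload = {
        "model": "vincentzed-hf/Qwen3-Coder-Next-NVFP4",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    # Requires a running `sglang serve` instance on localhost:30000.
    with urllib.request.urlopen(build_request("Write hello world in Python.")) as r:
        print(json.loads(r.read())["choices"][0]["message"]["content"])
```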

 ### Reproduce with ModelOpt