wip readme #3
opened by vincentzed-hf

README.md (CHANGED)
@@ -74,24 +74,24 @@ The integration of foundation and fine-tuned models into AI systems requires add
 ## Training, Testing, and Evaluation Datasets:

 ## Calibration Dataset:
-
-
-

 ## Training Datasets:
-
-
-

 ## Testing Dataset:
-
-
-

 ## Evaluation Dataset:
 * Datasets: MMLU Pro, GPQA Diamond, LiveCodeBench V6, SciCode, AIME 2025 <br>
-
-


 ## Inference:

@@ -99,7 +99,7 @@ The integration of foundation and fine-tuned models into AI systems requires add
 **Test Hardware:** B300 <br>

 ## Post Training Quantization
-This model was obtained by quantizing the weights and activations of Qwen3-Coder-Next to NVFP4 data type, ready for inference with

 ## Usage

@@ -110,8 +110,12 @@ To serve the quantized NVFP4 checkpoint with [SGLang](https://github.com/sgl-pro
 ```bash
 sglang serve --model-path vincentzed-hf/Qwen3-Coder-Next-NVFP4 --quantization modelopt_fp4
 ```
-Please
-

 ### Reproduce with ModelOpt
 ## Training, Testing, and Evaluation Datasets:

 ## Calibration Dataset:
+* Link: [Nemotron-Post-Training-Dataset-v2](https://huggingface.co/datasets/nvidia/Nemotron-Post-Training-Dataset-v2) <br>
+* Data collection method: Automated <br>
+* Labeling method: Automated <br>

 ## Training Datasets:
+* Data collection method by dataset: Undisclosed <br>
+* Labeling method by dataset: Undisclosed <br>
+* Properties: Undisclosed <br>

 ## Testing Dataset:
+* Data collection method by dataset: Undisclosed <br>
+* Labeling method by dataset: Undisclosed <br>
+* Properties: Undisclosed <br>

 ## Evaluation Dataset:
 * Datasets: MMLU Pro, GPQA Diamond, LiveCodeBench V6, SciCode, AIME 2025 <br>
+* Data collection method: Hybrid (Automated, Human) <br>
+* Labeling method: Hybrid (Human, Automated) <br>

 ## Inference:

 **Test Hardware:** B300 <br>

 ## Post Training Quantization
+This model was obtained by quantizing the weights and activations of Qwen3-Coder-Next to the NVFP4 data type, ready for inference with SGLang. Only the weights and activations of the linear operators within the transformer blocks are quantized, and the KV cache is quantized to FP8. This optimization reduces the number of bits per parameter from 16 to 4, cutting disk size and GPU memory requirements by approximately 4x.
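The ~4x figure follows directly from the bits-per-parameter arithmetic. A minimal back-of-envelope sketch; the parameter count below is an assumed placeholder for illustration, not a size stated on this card:

```python
def checkpoint_gib(num_params: float, bits_per_param: float) -> float:
    """Approximate weight-storage size in GiB for a given precision."""
    return num_params * bits_per_param / 8 / 1024**3

params = 80e9  # assumption for illustration, not the actual Qwen3-Coder-Next size
print(f"BF16 (16-bit):  {checkpoint_gib(params, 16):.1f} GiB")
print(f"NVFP4 (4-bit):  {checkpoint_gib(params, 4):.1f} GiB")
print(f"reduction:      {checkpoint_gib(params, 16) / checkpoint_gib(params, 4):.0f}x")
```

Whatever the true parameter count, the ratio of the two sizes is 16/4 = 4, which is where the "approximately 4x" comes from (activation buffers and the FP8 KV cache shift the in-practice number slightly).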

 ## Usage

 ```bash
 sglang serve --model-path vincentzed-hf/Qwen3-Coder-Next-NVFP4 --quantization modelopt_fp4
 ```
+Please install SGLang from source first:
+`git clone git@github.com:sgl-project/sglang.git`
+Once the repo is cloned, run `uv pip install -e "python[all]"` from the repo root, then run the serve command above.
+When a release is cut with the bugfix for this model's launch, we will update this model card.
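Once the server is up, it can be queried over HTTP. A hypothetical client sketch; the OpenAI-compatible chat endpoint and default port 30000 are assumptions about SGLang's serving defaults, not details from this card:

```python
import json
import urllib.request

def build_request(prompt: str, base_url: str = "http://localhost:30000"):
    """Build an OpenAI-style chat-completions request for the served model.

    The endpoint path and port are assumed SGLang defaults.
    """
    payload = {
        "model": "vincentzed-hf/Qwen3-Coder-Next-NVFP4",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }
    return urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

if __name__ == "__main__":
    # Requires a running `sglang serve` instance on localhost:30000.
    with urllib.request.urlopen(build_request("Write hello world in Python.")) as r:
        print(json.loads(r.read())["choices"][0]["message"]["content"])
```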

 ### Reproduce with ModelOpt