nvidia/GLM-5.2-NVFP4 · Add vLLM commands, remove calibration datasets

Add vLLM commands, remove calibration datasets

by frida-a - opened 6 days ago

base: refs/heads/main

←

from: refs/pr/3

Discussion Files changed

+16

-6

Files changed (1) hide show

README.md +16 -6

README.md CHANGED Viewed

@@ -75,12 +75,6 @@ The model version is NVFP4 1.0 version and is quantized with nvidia-modelopt **v
 We calibrated the model using the dataset noted below, and performed evaluation using the benchmarks noted under Evaluation Datasets.
 We did not perform training or testing for this Model Optimizer release. The methods noted under Training and Testing Datasets below represent the data collection and labeling methods used by the third-party to train and test the underlying model.<br>
-## Calibration Dataset:
-**Link:** [Nemotron-SFT-Instruction-Following-Chat-v2](https://huggingface.co/datasets/nvidia/Nemotron-SFT-Instruction-Following-Chat-v2), [Nemotron-Science-v1](https://huggingface.co/datasets/nvidia/Nemotron-Science-v1), [Nemotron-Competitive-Programming-v1](https://huggingface.co/datasets/nvidia/Nemotron-Competitive-Programming-v1), [Nemotron-SFT-Agentic-v2](https://huggingface.co/datasets/nvidia/Nemotron-SFT-Agentic-v2), [Nemotron-Math-v2](https://huggingface.co/datasets/nvidia/Nemotron-Math-v2), [Nemotron-SFT-SWE-v2](https://huggingface.co/datasets/nvidia/Nemotron-SFT-SWE-v2), [Nemotron-SFT-Multilingual-v1](https://huggingface.co/datasets/nvidia/Nemotron-SFT-Multilingual-v1) <br>
-**Data Collection Method by dataset:** Hybrid: Human, Synthetic, Automated. <br>
-**Labeling method:** Hybrid: Human, Automated. <br>
-**Properties:** Nemotron-SFT-Instruction-Following-Chat-v2 contains ~2M synthetic chat samples designed to strengthen open-ended chat and precise instruction following capabilities. Nemotron-Science-v1 is a synthetic science reasoning dataset with ~226K samples covering GPQA-style science questions and chemistry problems to enhance LLM reasoning in scientific domains. Nemotron-Competitive-Programming-v1 is a large-scale synthetic coding dataset with 2M+ Python and 1M+ C++ samples spanning 34K+ competitive programming questions for code completion and critique. Nemotron-SFT-Agentic-v2 contains ~992K samples of tool-calling trajectories, customer service conversations, and web-search trajectories to train interactive, tool-using agents. Nemotron-Math-v2 is a large-scale mathematical reasoning dataset with ~347K problems and 7M model-generated reasoning trajectories across multiple reasoning modes and tool-use configurations. Nemotron-SFT-SWE-v2 contains ~256K software engineering samples including agentic SWE trajectories and agentless code localization, repair, and test generation samples for SWE-Bench style tasks. Nemotron-SFT-Multilingual-v1 contains ~3M multilingual reasoning samples translated from math, code, and STEM data into German, French, Japanese, Italian, Chinese, and Spanish. <br>
 ## Training Dataset:
 **Data Modality:** Undisclosed <br>
 **Data Collection Method by dataset:** Undisclosed <br>
@@ -125,6 +119,22 @@ python3 -m sglang.launch_server \
     --mem-fraction-static 0.80
 ```
 ## Evaluation
 The accuracy benchmark results are presented in the table below. AA-LCR was measured with SGLang; all other benchmarks were measured with vLLM.
 <table>

 We calibrated the model using the dataset noted below, and performed evaluation using the benchmarks noted under Evaluation Datasets.
 We did not perform training or testing for this Model Optimizer release. The methods noted under Training and Testing Datasets below represent the data collection and labeling methods used by the third-party to train and test the underlying model.<br>
 ## Training Dataset:
 **Data Modality:** Undisclosed <br>
 **Data Collection Method by dataset:** Undisclosed <br>
     --mem-fraction-static 0.80
 ```
+### vLLM
+To serve this checkpoint with [vLLM](https://github.com/vllm-project/vllm), use the `vllm/vllm-openai:v0.23.0` image and run:
+```sh
+vllm serve nvidia/GLM-5.2-NVFP4 \
+    --tensor-parallel-size 8 \
+    --enable-expert-parallel \
+    --trust-remote-code \
+    --reasoning-parser glm45 \
+    --tool-call-parser glm47 \
+    --enable-auto-tool-choice \
+    --kv-cache-dtype fp8_e4m3 \
+    --host 0.0.0.0 --port 8000
+```
 ## Evaluation
 The accuracy benchmark results are presented in the table below. AA-LCR was measured with SGLang; all other benchmarks were measured with vLLM.
 <table>