GadflyII/GLM-4.6V-NVFP4

Image-Text-to-Text

vision-language-model

Mixture of Experts

8-bit precision

compressed-tensors


GadflyII committed 9 days ago

Commit 6a18ef6 · verified · 1 Parent(s): 9dc7907

Update README.md

Files changed (1)

README.md +2 -10


@@ -45,7 +45,7 @@ NVFP4 (4-bit floating point) quantized version of [zai-org/GLM-4.6V](https://hug
 ### Launch Command
 ```bash
-# Single GPU (96GB VRAM - full 128K context)
 python -m vllm.entrypoints.openai.api_server \
   --model GadflyII/GLM-4.6V-NVFP4 \
   --tensor-parallel-size 1 \
@@ -53,15 +53,7 @@ python -m vllm.entrypoints.openai.api_server \
   --max-model-len 131072 \
   --port 8000
-# Single GPU (80GB VRAM - reduced context)
-python -m vllm.entrypoints.openai.api_server \
-  --model GadflyII/GLM-4.6V-NVFP4 \
-  --tensor-parallel-size 1 \
-  --trust-remote-code \
-  --max-model-len 131072 \
-  --port 8000
-# Two GPUs (for 48GB cards)
 python -m vllm.entrypoints.openai.api_server \
   --model GadflyII/GLM-4.6V-NVFP4 \
   --tensor-parallel-size 2 \

 ### Launch Command
 ```bash
+# Single GPU (full 128K context)
 python -m vllm.entrypoints.openai.api_server \
   --model GadflyII/GLM-4.6V-NVFP4 \
   --tensor-parallel-size 1 \
   --max-model-len 131072 \
   --port 8000
+# Two GPUs
 python -m vllm.entrypoints.openai.api_server \
   --model GadflyII/GLM-4.6V-NVFP4 \
   --tensor-parallel-size 2 \