jingyux-nv committed on
Commit e687a84 (verified)
1 parent: de6d036

Update README.md

Files changed (1)
  1. README.md +29 -2
README.md CHANGED
@@ -61,7 +61,7 @@ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated sys

  ## Software Integration:
  **Supported Runtime Engine(s):** <br>
- * TRTLLM <br>
+ * TRTLLM, SGLang <br>

  **Supported Hardware Microarchitecture Compatibility:** <br>
  * NVIDIA Blackwell <br>
@@ -101,7 +101,7 @@ The model is quantized with nvidia-modelopt **v0.42.0** <br>


  ## Inference:
- **Acceleration Engine:** TRTLLM <br>
+ **Acceleration Engine:** TRTLLM, SGLang <br>
  **Test Hardware:** B200 <br>

  ## Post Training Quantization
@@ -112,7 +112,34 @@ This model was obtained by quantizing the weights and activations of Wan2.2-T2V-
  To serve this checkpoint with [TRTLLM](https://github.com/NVIDIA/TensorRT-LLM):

  ```sh
+ # TRTLLM
  trtllm-serve nvidia/Wan2.2-T2V-A14B-Diffusers-FP8 --extra_visual_gen_options ./examples/visual_gen/serve/configs/wan.yml
+
+ # SGLang
+
+ PROMPT='A cat and a dog baking a cake together in a cozy kitchen. The cat carefully measures flour while the dog stirs batter in a glass bowl, sunlight through the window, smooth cinematic camera motion.'
+
+ export FLASHINFER_DISABLE_VERSION_CHECK=1
+ python -m sglang.multimodal_gen.runtime.entrypoints.cli.main generate \
+   --model-path nvidia/Wan2.2-T2V-A14B-Diffusers-FP8 \
+   --backend sglang \
+   --attention-backend torch_sdpa \
+   --performance-mode speed \
+   --dit-cpu-offload false \
+   --dit-layerwise-offload false \
+   --text-encoder-cpu-offload false \
+   --image-encoder-cpu-offload false \
+   --vae-cpu-offload false \
+   --pin-cpu-memory false \
+   --width 832 \
+   --height 480 \
+   --num-frames 81 \
+   --fps 16 \
+   --num-inference-steps 50 \
+   --guidance-scale 5.0 \
+   --seed 0 \
+   --warmup false \
+   --prompt "$PROMPT"
  ```

  ### Model Characteristics
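The SGLang invocation added in this commit can also be written with a bash array, which keeps each flag and value as a separate quoted element instead of relying on trailing `\` line continuations. The script below is a hypothetical wrapper, not part of the README: it assumes bash and that the `sglang.multimodal_gen` CLI accepts exactly the flags listed in the diff; the `RUN` variable is an illustrative name introduced here. By default it only prints the command; with `RUN=1` it executes it, passing `FLASHINFER_DISABLE_VERSION_CHECK=1` as a prefix assignment so the variable lands in the Python process's environment.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not from the README): the SGLang generate command from
# this commit, with its flags collected in a bash array. Assumes bash.
set -euo pipefail

PROMPT='A cat and a dog baking a cake together in a cozy kitchen. The cat carefully measures flour while the dog stirs batter in a glass bowl, sunlight through the window, smooth cinematic camera motion.'

# One array element per word, so the prompt (which contains spaces) stays intact.
args=(
  --model-path nvidia/Wan2.2-T2V-A14B-Diffusers-FP8
  --backend sglang
  --attention-backend torch_sdpa
  --performance-mode speed
  --dit-cpu-offload false
  --dit-layerwise-offload false
  --text-encoder-cpu-offload false
  --image-encoder-cpu-offload false
  --vae-cpu-offload false
  --pin-cpu-memory false
  --width 832
  --height 480
  --num-frames 81
  --fps 16
  --num-inference-steps 50
  --guidance-scale 5.0
  --seed 0
  --warmup false
  --prompt "$PROMPT"
)

cmd=(python -m sglang.multimodal_gen.runtime.entrypoints.cli.main generate "${args[@]}")

if [[ "${RUN:-0}" == 1 ]]; then
  # Prefix assignment: the variable is placed in this one python process's environment.
  FLASHINFER_DISABLE_VERSION_CHECK=1 "${cmd[@]}"
else
  # Dry run: print the env prefix literally, then each argument shell-quoted with %q.
  printf 'FLASHINFER_DISABLE_VERSION_CHECK=1 '
  printf '%q ' "${cmd[@]}"
  printf '\n'
fi
```

For example, saving this as a (hypothetical) `run_wan_sglang.sh` and running `RUN=1 bash run_wan_sglang.sh` would launch generation, while plain `bash run_wan_sglang.sh` only shows the command it would run.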