nvidia/Wan2.2-T2V-A14B-Diffusers-FP8

jingyux-nv committed 8 days ago

Commit e687a84 (verified) · 1 parent: de6d036

Update README.md

Files changed (1)

README.md +29 -2

@@ -61,7 +61,7 @@ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated sys
 ## Software Integration:
 **Supported Runtime Engine(s):** <br>
-* TRTLLM <br>
 **Supported Hardware Microarchitecture Compatibility:** <br>
 * NVIDIA Blackwell <br>
@@ -101,7 +101,7 @@ The model is quantized with nvidia-modelopt **v0.42.0**  <br>
 ## Inference:
-**Acceleration Engine:** TRTLLM <br>
 **Test Hardware:** B200 <br>
 ## Post Training Quantization
@@ -112,7 +112,34 @@ This model was obtained by quantizing the weights and activations of Wan2.2-T2V-
 To serve this checkpoint with [TRTLLM](https://github.com/NVIDIA/TensorRT-LLM):
 ```sh
 trtllm-serve nvidia/Wan2.2-T2V-A14B-Diffusers-FP8 --extra_visual_gen_options ./examples/visual_gen/serve/configs/wan.yml
 ```
 ### Model Characteristics

 ## Software Integration:
 **Supported Runtime Engine(s):** <br>
+* TRTLLM, SGLang <br>
 **Supported Hardware Microarchitecture Compatibility:** <br>
 * NVIDIA Blackwell <br>
 ## Inference:
+**Acceleration Engine:** TRTLLM, SGLang <br>
 **Test Hardware:** B200 <br>
 ## Post Training Quantization
 To serve this checkpoint with [TRTLLM](https://github.com/NVIDIA/TensorRT-LLM):
 ```sh
+# TRTLLM
 trtllm-serve nvidia/Wan2.2-T2V-A14B-Diffusers-FP8 --extra_visual_gen_options ./examples/visual_gen/serve/configs/wan.yml
+# SGLang
+PROMPT='A cat and a dog baking a cake together in a cozy kitchen. The cat carefully measures flour while the dog stirs batter in a glass bowl, sunlight through the window, smooth cinematic camera motion.'
+FLASHINFER_DISABLE_VERSION_CHECK=1 \
+python -m sglang.multimodal_gen.runtime.entrypoints.cli.main generate \
+  --model-path nvidia/Wan2.2-T2V-A14B-Diffusers-FP8 \
+  --backend sglang \
+  --attention-backend torch_sdpa \
+  --performance-mode speed \
+  --dit-cpu-offload false \
+  --dit-layerwise-offload false \
+  --text-encoder-cpu-offload false \
+  --image-encoder-cpu-offload false \
+  --vae-cpu-offload false \
+  --pin-cpu-memory false \
+  --width 832 \
+  --height 480 \
+  --num-frames 81 \
+  --fps 16 \
+  --num-inference-steps 50 \
+  --guidance-scale 5.0 \
+  --seed 0 \
+  --warmup false \
+  --prompt "$PROMPT"
 ```
 ### Model Characteristics
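
A note on the SGLang invocation: `FLASHINFER_DISABLE_VERSION_CHECK=1` reaches the Python process only if it is exported or written as a prefix on the same command as `python`. The sketch below illustrates the difference in plain `sh`. `DEMO_FLAG` and the `sh -c` child are stand-ins chosen for illustration and are not part of the README; the same mechanics are assumed to apply to the real variable.

```sh
# Stand-in variable (hypothetical); the mechanics are the same for
# FLASHINFER_DISABLE_VERSION_CHECK.

# Case 1: plain assignment on its own line. The variable stays local to
# this shell and is not passed to child processes.
unset DEMO_FLAG
DEMO_FLAG=1
plain=$(sh -c 'printf %s "${DEMO_FLAG:-unset}"')

# Case 2: assignment as a prefix of the command. The variable is placed
# in the environment of that one child process.
unset DEMO_FLAG
prefixed=$(DEMO_FLAG=1 sh -c 'printf %s "${DEMO_FLAG:-unset}"')

echo "plain assignment -> child sees: $plain"
echo "command prefix   -> child sees: $prefixed"
```

In the first case the child reports `unset`; in the second it reports `1`. When the assignment and the `python` command are split over several lines, a trailing `\` on each line keeps them one command.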