|
|
--- |
|
|
license: apache-2.0 |
|
|
tags: |
|
|
- diffusion-single-file |
|
|
- comfyui |
|
|
- distillation |
|
|
- video |
|
|
- video-generation

- text-to-video
|
|
base_model: |
|
|
- tencent/HunyuanVideo-1.5 |
|
|
|
|
library_name: diffusers |
|
|
pipeline_tag: image-to-video |
|
|
--- |
|
|
|
|
|
# 🎬 Hy1.5-Quantized-Models
|
|
|
|
|
<img src="https://raw.githubusercontent.com/ModelTC/LightX2V/main/assets/img_lightx2v.png" width="75%" /> |
|
|
|
|
|
--- |
|
|
|
|
|
🤗 [HuggingFace](https://huggingface.co/lightx2v/Hy1.5-Quantized-Models) | [GitHub](https://github.com/ModelTC/LightX2V) | [License](https://opensource.org/licenses/Apache-2.0)
|
|
|
|
|
--- |
|
|
|
|
|
This repository contains quantized HunyuanVideo-1.5 models optimized for LightX2V. The quantized checkpoints significantly reduce VRAM usage while preserving video generation quality.
|
|
|
|
|
## 📋 Model List
|
|
|
|
|
### DiT (Diffusion Transformer) Models
|
|
|
|
|
* **`hy15_720p_i2v_fp8_e4m3_lightx2v.safetensors`** - 720p Image-to-Video quantized DiT model

* **`hy15_720p_t2v_fp8_e4m3_lightx2v.safetensors`** - 720p Text-to-Video quantized DiT model
|
|
|
|
|
### Encoder Models |
|
|
|
|
|
* **`hy15_qwen25vl_llm_encoder_fp8_e4m3_lightx2v.safetensors`** - Quantized text encoder (Qwen2.5-VL LLM Encoder) |
|
|
|
|
|
## 🚀 Quick Start
|
|
|
|
|
### Installation |
|
|
|
|
|
First, install LightX2V: |
|
|
|
|
|
```bash |
|
|
pip install -v git+https://github.com/ModelTC/LightX2V.git |
|
|
``` |
|
|
|
|
|
Or build from source: |
|
|
|
|
|
```bash |
|
|
git clone https://github.com/ModelTC/LightX2V.git |
|
|
cd LightX2V |
|
|
pip install -v -e . |
|
|
``` |
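
After installing, you can verify that the package imports cleanly (it exposes the same `LightX2VPipeline` entry point used in the examples below):

```bash
python -c "from lightx2v import LightX2VPipeline; print('LightX2V OK')"
```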
|
|
|
|
|
### Download Models |
|
|
|
|
|
Download the quantized models from this repository: |
|
|
|
|
|
```bash |
|
|
# Using git-lfs |
|
|
git lfs install |
|
|
git clone https://huggingface.co/lightx2v/Hy1.5-Quantized-Models |
|
|
|
|
|
# Or download individual files using huggingface-hub |
|
|
pip install huggingface-hub |
|
|
python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='lightx2v/Hy1.5-Quantized-Models', filename='hy15_720p_i2v_fp8_e4m3_lightx2v.safetensors', local_dir='./models')" |
|
|
``` |
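
Alternatively, `huggingface_hub` ships the `huggingface-cli` tool, which can fetch a single file from the command line (same repository and filename as above):

```bash
huggingface-cli download lightx2v/Hy1.5-Quantized-Models \
    hy15_720p_i2v_fp8_e4m3_lightx2v.safetensors --local-dir ./models
```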
|
|
|
|
|
## 💻 Usage in LightX2V
|
|
|
|
|
### Text-to-Video (T2V) Example |
|
|
|
|
|
```python |
|
|
from lightx2v import LightX2VPipeline |
|
|
|
|
|
# Initialize pipeline |
|
|
pipe = LightX2VPipeline( |
|
|
model_path="/path/to/hunyuanvideo-1.5/", # Original model path |
|
|
model_cls="hunyuan_video_1.5", |
|
|
transformer_model_name="720p_t2v", |
|
|
task="t2v", |
|
|
) |
|
|
|
|
|
# Enable quantization |
|
|
pipe.enable_quantize( |
|
|
quant_scheme='fp8-sgl', |
|
|
dit_quantized=True, |
|
|
dit_quantized_ckpt="/path/to/hy15_720p_t2v_fp8_e4m3_lightx2v.safetensors", |
|
|
text_encoder_quantized=True, |
|
|
text_encoder_quantized_ckpt="/path/to/hy15_qwen25vl_llm_encoder_fp8_e4m3_lightx2v.safetensors", |
|
|
image_encoder_quantized=False, |
|
|
) |
|
|
|
|
|
# Optional: Enable offloading for lower VRAM usage |
|
|
pipe.enable_offload( |
|
|
cpu_offload=True, |
|
|
offload_granularity="block", # For HunyuanVideo-1.5, only "block" is supported |
|
|
text_encoder_offload=True, |
|
|
image_encoder_offload=False, |
|
|
vae_offload=False, |
|
|
) |
|
|
|
|
|
# Optional: use lighttae, a lightweight tiny autoencoder (TAE), in place of the full VAE
|
|
pipe.enable_lightvae( |
|
|
use_tae=True, |
|
|
tae_path="/path/to/lighttaehy1_5.safetensors", |
|
|
use_lightvae=False, |
|
|
vae_path=None, |
|
|
) |
|
|
|
|
|
# Create generator |
|
|
pipe.create_generator( |
|
|
attn_mode="sage_attn2", |
|
|
infer_steps=50, |
|
|
num_frames=121, |
|
|
guidance_scale=6.0, |
|
|
sample_shift=9.0, |
|
|
aspect_ratio="16:9", |
|
|
fps=24, |
|
|
) |
|
|
|
|
|
# Generate video |
|
|
seed = 123 |
|
|
prompt = "A beautiful sunset over the ocean with waves gently crashing on the shore." |
|
|
negative_prompt = "" |
|
|
save_result_path="/path/to/output.mp4" |
|
|
|
|
|
pipe.generate( |
|
|
seed=seed, |
|
|
prompt=prompt, |
|
|
negative_prompt=negative_prompt, |
|
|
save_result_path=save_result_path, |
|
|
) |
|
|
``` |
|
|
|
|
|
### Image-to-Video (I2V) Example |
|
|
|
|
|
```python |
|
|
from lightx2v import LightX2VPipeline |
|
|
|
|
|
# Initialize pipeline |
|
|
pipe = LightX2VPipeline( |
|
|
model_path="/path/to/hunyuanvideo-1.5/", # Original model path |
|
|
model_cls="hunyuan_video_1.5", |
|
|
transformer_model_name="720p_i2v", |
|
|
task="i2v", |
|
|
) |
|
|
|
|
|
# Enable quantization |
|
|
pipe.enable_quantize( |
|
|
quant_scheme='fp8-sgl', |
|
|
dit_quantized=True, |
|
|
dit_quantized_ckpt="/path/to/hy15_720p_i2v_fp8_e4m3_lightx2v.safetensors", |
|
|
text_encoder_quantized=True, |
|
|
text_encoder_quantized_ckpt="/path/to/hy15_qwen25vl_llm_encoder_fp8_e4m3_lightx2v.safetensors", |
|
|
image_encoder_quantized=False, |
|
|
) |
|
|
|
|
|
# Optional: use lighttae, a lightweight tiny autoencoder (TAE), in place of the full VAE
|
|
pipe.enable_lightvae( |
|
|
use_tae=True, |
|
|
tae_path="/path/to/lighttaehy1_5.safetensors", |
|
|
use_lightvae=False, |
|
|
vae_path=None, |
|
|
) |
|
|
|
|
|
# Optional: Enable offloading for lower VRAM usage |
|
|
pipe.enable_offload( |
|
|
cpu_offload=True, |
|
|
offload_granularity="block", |
|
|
text_encoder_offload=True, |
|
|
image_encoder_offload=False, |
|
|
vae_offload=False, |
|
|
) |
|
|
|
|
|
# Create generator |
|
|
pipe.create_generator( |
|
|
attn_mode="sage_attn2", |
|
|
infer_steps=50, |
|
|
num_frames=121, |
|
|
guidance_scale=6.0, |
|
|
sample_shift=7.0, |
|
|
fps=24, |
|
|
) |
|
|
|
|
|
# Generate video |
|
|
seed = 42 |
|
|
prompt = "The image comes to life with smooth motion and natural transitions." |
|
|
negative_prompt = "" |
|
|
save_result_path="/path/to/output.mp4" |
|
|
|
|
|
pipe.generate( |
|
|
seed=seed, |
|
|
image_path="/path/to/input_image.jpg", |
|
|
prompt=prompt, |
|
|
negative_prompt=negative_prompt, |
|
|
save_result_path=save_result_path, |
|
|
) |
|
|
``` |
|
|
|
|
|
## ⚙️ Quantization Scheme
|
|
|
|
|
These models use **FP8-E4M3** quantization with the **SGL (SGLang) kernel** scheme (`fp8-sgl`). This quantization format provides: |
|
|
|
|
|
* **Significant memory reduction**: FP8 weights take half the storage of 16-bit weights, cutting VRAM usage by up to 50%
|
|
* **Maintained quality**: Minimal quality degradation compared to full precision models |
|
|
* **Faster inference**: Optimized kernels for accelerated computation |
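
To make the storage format concrete, below is a minimal per-tensor FP8-E4M3 weight quantization sketch in plain PyTorch. It illustrates the numeric format only; the function names are ours, and this is not LightX2V's actual kernel path, which relies on the fused FP8 matmuls provided by the kernels described under Requirements.

```python
import torch

def quantize_fp8_e4m3(w: torch.Tensor):
    """Per-tensor FP8-E4M3 weight quantization (illustrative only)."""
    finfo = torch.finfo(torch.float8_e4m3fn)             # E4M3 representable range is +-448
    scale = w.abs().max().clamp(min=1e-12) / finfo.max   # fit the tensor into that range
    w_fp8 = (w / scale).clamp(finfo.min, finfo.max).to(torch.float8_e4m3fn)
    return w_fp8, scale                                  # fp8 payload + fp32 per-tensor scale

def dequantize_fp8_e4m3(w_fp8: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return w_fp8.to(torch.float32) * scale

w = torch.randn(4096, 4096)
w_fp8, scale = quantize_fp8_e4m3(w)
print(w_fp8.dtype, w_fp8.element_size())                  # torch.float8_e4m3fn, 1 byte vs 2 for bf16
print((dequantize_fp8_e4m3(w_fp8, scale) - w).abs().max())  # small quantization error
```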
|
|
|
|
|
### Requirements |
|
|
|
|
|
To use these quantized models, you need to install the SGL kernel: |
|
|
|
|
|
```bash |
|
|
# sgl-kernel currently requires torch == 2.8.0
|
|
pip install sgl-kernel --upgrade |
|
|
``` |
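
A quick sanity check after installation (the Python module name is assumed to be `sgl_kernel`):

```bash
python -c "import torch; print(torch.__version__)"     # expect 2.8.0
python -c "import sgl_kernel; print('sgl-kernel OK')"  # module name assumed
```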
|
|
|
|
|
Alternatively, you can use the vLLM kernels:
|
|
|
|
|
```bash |
|
|
pip install vllm |
|
|
``` |
|
|
|
|
|
For more details on quantization schemes, please refer to the [LightX2V Quantization Documentation](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/quantization.html). |
|
|
|
|
|
## 📊 Performance Benefits
|
|
|
|
|
Using quantized models provides: |
|
|
|
|
|
* **Lower VRAM Requirements**: Enables running on GPUs with less memory (e.g., RTX 4090 24GB) |
|
|
* **Faster Inference**: Optimized quantized kernels accelerate computation |
|
|
* **Quality Preservation**: FP8 quantization maintains high visual quality |
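
To confirm the savings on your own GPU, you can bracket generation with PyTorch's peak-memory counters. Only standard `torch.cuda` APIs are used here; the `pipe.generate(...)` call from the examples above stands in at the commented line:

```python
import torch

torch.cuda.reset_peak_memory_stats()

# ... run generation here, e.g. pipe.generate(seed=seed, prompt=prompt, ...)

peak_gib = torch.cuda.max_memory_allocated() / 1024**3
print(f"Peak VRAM during generation: {peak_gib:.2f} GiB")
```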
|
|
|
|
|
## 🔗 Related Resources
|
|
|
|
|
* [LightX2V GitHub Repository](https://github.com/ModelTC/LightX2V) |
|
|
* [LightX2V Documentation](https://lightx2v-en.readthedocs.io/en/latest/) |
|
|
* [HunyuanVideo-1.5 Original Model](https://huggingface.co/tencent/HunyuanVideo-1.5) |
|
|
* [LightX2V Examples](https://github.com/ModelTC/LightX2V/tree/main/examples) |
|
|
|
|
|
## 📝 Notes
|
|
|
|
|
* **Important**: All advanced configurations (including `enable_quantize()`) must be called **before** `create_generator()`, otherwise they will not take effect. |
|
|
* The original HunyuanVideo-1.5 model weights are still required; the quantized checkpoints are loaded on top of the original model structure, not as a standalone replacement.
|
|
* For best performance, we recommend using SageAttention 2 (`sage_attn2`) as the attention mode. |
|
|
|
|
|
## 🤗 Citation
|
|
|
|
|
If you use these quantized models in your research, please cite: |
|
|
|
|
|
```bibtex |
|
|
@misc{lightx2v, |
|
|
author = {LightX2V Contributors}, |
|
|
title = {LightX2V: Light Video Generation Inference Framework}, |
|
|
year = {2025}, |
|
|
publisher = {GitHub}, |
|
|
journal = {GitHub repository}, |
|
|
howpublished = {\url{https://github.com/ModelTC/lightx2v}}, |
|
|
} |
|
|
``` |
|
|
|
|
|
## 📄 License
|
|
|
|
|
This model is released under the Apache 2.0 License, same as the original HunyuanVideo-1.5 model. |
|
|
|