---
license: apache-2.0
tags:
- diffusion-single-file
- comfyui
- distillation
- video
- video generation
base_model:
  - tencent/HunyuanVideo-1.5
library_name: diffusers
pipeline_tag: text-to-video
---

# 🎬 Hy1.5-Distill-Models

<img src="https://raw.githubusercontent.com/ModelTC/LightX2V/main/assets/img_lightx2v.png" width="75%" />

---

πŸ€— [HuggingFace](https://huggingface.co/lightx2v/Hy1.5-Distill-Models) | [GitHub](https://github.com/ModelTC/LightX2V) | [License](https://opensource.org/licenses/Apache-2.0)

---

This repository contains 4-step distilled models for HunyuanVideo-1.5 optimized for use with LightX2V. These distilled models enable **ultra-fast 4-step inference** without CFG (Classifier-Free Guidance), significantly reducing generation time while maintaining high-quality video output.

## πŸ“‹ Model List

### 4-Step Distilled Models

* **`hy1.5_t2v_480p_lightx2v_4step.safetensors`** - 480p Text-to-Video 4-step distilled model (16.7 GB)
* **`hy1.5_t2v_480p_scaled_fp8_e4m3_lightx2v_4step.safetensors`** - 480p Text-to-Video 4-step distilled model with FP8 quantization (8.85 GB)

## πŸš€ Quick Start

### Installation

First, install LightX2V:

```bash
pip install -v git+https://github.com/ModelTC/LightX2V.git
```

Or build from source:

```bash
git clone https://github.com/ModelTC/LightX2V.git
cd LightX2V
pip install -v -e .
```

### Download Models

Download the distilled models from this repository:

```bash
# Using git-lfs
git lfs install
git clone https://huggingface.co/lightx2v/Hy1.5-Distill-Models

# Or download individual files using huggingface-hub
pip install huggingface-hub
python -c "from huggingface_hub import hf_hub_download; hf_hub_download(repo_id='lightx2v/Hy1.5-Distill-Models', filename='hy1.5_t2v_480p_lightx2v_4step.safetensors', local_dir='./models')"
```

## πŸ’» Usage in LightX2V

### 4-Step Distilled Model (Base Version)

```python
"""
HunyuanVideo-1.5 text-to-video generation example.
This example demonstrates how to use LightX2V with HunyuanVideo-1.5 4-step distilled model for T2V generation.
"""

from lightx2v import LightX2VPipeline

# Initialize pipeline for HunyuanVideo-1.5
pipe = LightX2VPipeline(
    model_path="/path/to/hunyuanvideo-1.5/",  # Original model path
    model_cls="hunyuan_video_1.5",
    transformer_model_name="480p_t2v",
    task="t2v",
    # 4-step distilled model ckpt
    dit_original_ckpt="/path/to/hy1.5_t2v_480p_lightx2v_4step.safetensors"
)

# Alternative: create generator from config JSON file
# pipe.create_generator(config_json="../configs/hunyuan_video_15/hunyuan_video_t2v_480p.json")

# Enable offloading to significantly reduce VRAM usage with minimal speed impact
# Suitable for RTX 30/40/50 consumer GPUs
pipe.enable_offload(
    cpu_offload=True,
    offload_granularity="block",  # For HunyuanVideo-1.5, only "block" is supported
    text_encoder_offload=True,
    image_encoder_offload=False,
    vae_offload=False,
)

# Optional: Use lighttae
# pipe.enable_lightvae(
#     use_tae=True,
#     tae_path="/path/to/lighttaehy1_5.safetensors",
#     use_lightvae=False,
#     vae_path=None,
# )

# Create generator with specified parameters
# Note: 4-step distillation requires infer_steps=4, guidance_scale=1, and denoising_step_list
pipe.create_generator(
    attn_mode="sage_attn2",
    infer_steps=4,  # 4-step inference
    num_frames=81,
    guidance_scale=1,  # No CFG needed for distilled models
    sample_shift=9.0,
    aspect_ratio="16:9",
    fps=16,
    denoising_step_list=[1000, 750, 500, 250]  # Required for 4-step distillation
)

# Generation parameters
seed = 123
prompt = "A close-up shot captures a scene on a polished, light-colored granite kitchen counter, illuminated by soft natural light from an unseen window. Initially, the frame focuses on a tall, clear glass filled with golden, translucent apple juice standing next to a single, shiny red apple with a green leaf still attached to its stem. The camera moves horizontally to the right. As the shot progresses, a white ceramic plate smoothly enters the frame, revealing a fresh arrangement of about seven or eight more apples, a mix of vibrant reds and greens, piled neatly upon it. A shallow depth of field keeps the focus sharply on the fruit and glass, while the kitchen backsplash in the background remains softly blurred. The scene is in a realistic style."
negative_prompt = ""
save_result_path = "/path/to/save_results/output.mp4"

# Generate video
pipe.generate(
    seed=seed,
    prompt=prompt,
    negative_prompt=negative_prompt,
    save_result_path=save_result_path,
)
```

### 4-Step Distilled Model with FP8 Quantization

For even lower memory usage, use the FP8 quantized version:

```python
from lightx2v import LightX2VPipeline

# Initialize pipeline
pipe = LightX2VPipeline(
    model_path="/path/to/hunyuanvideo-1.5/",  # Original model path
    model_cls="hunyuan_video_1.5",
    transformer_model_name="480p_t2v",
    task="t2v",
    # 4-step distilled model ckpt
    dit_original_ckpt="/path/to/hy1.5_t2v_480p_lightx2v_4step.safetensors"
)

# Enable FP8 quantization for the distilled model
pipe.enable_quantize(
    quant_scheme='fp8-sgl',
    dit_quantized=True,
    dit_quantized_ckpt="/path/to/hy1.5_t2v_480p_scaled_fp8_e4m3_lightx2v_4step.safetensors",
    text_encoder_quantized=False,  # Optional: can also quantize text encoder
    text_encoder_quantized_ckpt="/path/to/hy15_qwen25vl_llm_encoder_fp8_e4m3_lightx2v.safetensors",  # Optional
    image_encoder_quantized=False,
)

# Enable offloading for lower VRAM usage
pipe.enable_offload(
    cpu_offload=True,
    offload_granularity="block",
    text_encoder_offload=True,
    image_encoder_offload=False,
    vae_offload=False,
)

# Create generator
pipe.create_generator(
    attn_mode="sage_attn2",
    infer_steps=4,
    num_frames=81,
    guidance_scale=1,
    sample_shift=9.0,
    aspect_ratio="16:9",
    fps=16,
    denoising_step_list=[1000, 750, 500, 250]
)

# Generate video
pipe.generate(
    seed=123,
    prompt="Your prompt here",
    negative_prompt="",
    save_result_path="/path/to/output.mp4",
)
```

## βš™οΈ Key Features

### 4-Step Distillation

These models use **step distillation** to compress the original 50-step inference process into just **4 steps**, providing:

* **πŸš€ Ultra-Fast Inference**: Generate videos in a fraction of the time
* **πŸ’‘ No CFG Required**: Set `guidance_scale=1` (no classifier-free guidance needed)
* **πŸ“Š Quality Preservation**: Maintains high visual quality despite fewer steps
* **πŸ’Ύ Lower Memory**: Reduced computational requirements

### FP8 Quantization (Optional)

The FP8 quantized version (`hy1.5_t2v_480p_scaled_fp8_e4m3_lightx2v_4step.safetensors`) provides additional benefits:

* **50% Memory Reduction**: Further reduces VRAM usage
* **Faster Computation**: Optimized quantized kernels
* **Maintained Quality**: FP8 quantization preserves visual quality
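
The quoted memory reduction can be sanity-checked against the checkpoint sizes listed in the Model List above (16.7 GB for the BF16 model, 8.85 GB for the FP8 model); this is a rough file-size comparison, not a runtime VRAM measurement:

```python
# Rough check of the memory-reduction claim using the published
# checkpoint sizes from this repository's Model List.
bf16_size_gb = 16.7  # hy1.5_t2v_480p_lightx2v_4step.safetensors
fp8_size_gb = 8.85   # hy1.5_t2v_480p_scaled_fp8_e4m3_lightx2v_4step.safetensors

reduction = 1 - fp8_size_gb / bf16_size_gb
print(f"{reduction:.0%}")  # roughly 47%
```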

### Requirements

For FP8 quantized models, you need to install the SGL kernel:

```bash
# Requires torch == 2.8.0
pip install sgl-kernel --upgrade
```

Alternatively, you can use vLLM kernels:

```bash
pip install vllm
```

## πŸ“Š Performance Benefits

Using 4-step distilled models provides:

* **~25x Speedup**: Compared to standard 50-step inference with CFG
* **Lower VRAM Requirements**: Enables running on GPUs with less memory
* **No CFG Overhead**: Eliminates the need for classifier-free guidance computation
* **Production Ready**: Fast enough for real-time or near-real-time applications
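
The ~25x figure follows from counting transformer forward passes, under the assumption that they dominate total runtime:

```python
# Back-of-the-envelope derivation of the ~25x speedup: the baseline runs
# 50 denoising steps with CFG (conditional + unconditional pass per step),
# while the distilled model runs 4 steps with guidance_scale=1 (one pass).
baseline_passes = 50 * 2   # 50 steps x 2 passes (CFG)
distilled_passes = 4 * 1   # 4 steps x 1 pass (no CFG)

speedup = baseline_passes / distilled_passes
print(speedup)  # 25.0
```

In practice the realized speedup also depends on text encoding, VAE decoding, and offloading overhead, so treat this as an upper-bound estimate.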

## πŸ”— Related Resources

* [LightX2V GitHub Repository](https://github.com/ModelTC/LightX2V)
* [LightX2V Documentation](https://lightx2v-en.readthedocs.io/en/latest/)
* [HunyuanVideo-1.5 Original Model](https://huggingface.co/tencent/HunyuanVideo-1.5)
* [Hy1.5-Quantized-Models](https://huggingface.co/lightx2v/Hy1.5-Quantized-Models) - For quantized inference without distillation
* [LightX2V Examples](https://github.com/ModelTC/LightX2V/tree/main/examples)
* [Step Distillation Documentation](https://lightx2v-en.readthedocs.io/en/latest/method_tutorials/step_distill.html)

## πŸ“ Important Notes

* **Critical Configuration**: 
  - Must set `infer_steps=4` (not the default 50)
  - Must set `guidance_scale=1` (CFG is not used in distilled models)
  - Must provide `denoising_step_list=[1000, 750, 500, 250]`
  
* **Model Loading**: All advanced configurations (including `enable_quantize()` and `enable_offload()`) must be called **before** `create_generator()`, otherwise they will not take effect.

* **Original Model Required**: The original HunyuanVideo-1.5 model weights are still required. The distilled model is used in conjunction with the original model structure.

* **Attention Mode**: For best performance, we recommend using SageAttention 2 (`sage_attn2`) as the attention mode.

* **Resolution**: Currently supports 480p resolution. Higher resolutions may be available in future releases.
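
As a small consistency check (assuming a uniform spacing convention over the 1000-timestep schedule), the required `denoising_step_list` steps down by 250 at each of the four steps:

```python
# Reconstruct the required denoising_step_list by spacing 4 steps
# uniformly over the 1000-timestep schedule.
num_steps, max_t = 4, 1000
stride = max_t // num_steps  # 250

steps = [max_t - stride * i for i in range(num_steps)]
print(steps)  # [1000, 750, 500, 250]
```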

## 🀝 Citation

If you use these distilled models in your research, please cite:

```bibtex
@misc{lightx2v,
  author = {LightX2V Contributors},
  title = {LightX2V: Light Video Generation Inference Framework},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ModelTC/lightx2v}},
}
```

## πŸ“„ License

This model is released under the Apache 2.0 License, same as the original HunyuanVideo-1.5 model.