KevinNg99 committed
Commit 27b1927 · 1 Parent(s): 28324bf

update README

Files changed (2):
  1. README.md +14 -3
  2. README_CN.md +14 -4
README.md CHANGED
@@ -42,9 +42,11 @@ HunyuanVideo-1.5 is a video generation model that delivers top-tier quality with
 <a href=https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5 target="_blank"><img src=https://img.shields.io/badge/Page-bb8a2e.svg?logo=github height=22px></a>
 <a href="https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/report/HunyuanVideo_1_5.pdf" target="_blank"><img src=https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv height=22px></a>
 <a href=https://x.com/TencentHunyuan target="_blank"><img src=https://img.shields.io/badge/Hunyuan-black.svg?logo=x height=22px></a>
-<a href="https://doc.weixin.qq.com/doc/w3_AXcAcwZSAGgCNACVygLxeQjyn4FYS?scode=AJEAIQdfAAoSfXnTj0AAkA-gaeACk" target="_blank"><img src=https://img.shields.io/badge/📚-PromptHandBook-blue.svg?logo=book height=22px></a> <br/>
+<a href="https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/main/assets/HunyuanVideo_1_5_Prompt_Handbook_EN.md" target="_blank"><img src=https://img.shields.io/badge/📚-PromptHandBook-blue.svg?logo=book height=22px></a> <br/>
 <a href="./ComfyUI/README.md" target="_blank"><img src=https://img.shields.io/badge/ComfyUI-blue.svg?logo=book height=22px></a>
 <a href="https://github.com/ModelTC/LightX2V" target="_blank"><img src=https://img.shields.io/badge/LightX2V-yellow.svg?logo=book height=22px></a>
+<a href="https://tusi.cn/models/933574988890423836" target="_blank"><img src=https://img.shields.io/badge/吐司-purple.svg?logo=book height=22px></a>
+<a href="https://tensor.art/models/933574988890423836" target="_blank"><img src=https://img.shields.io/badge/TensorArt-cyan.svg?logo=book height=22px></a>
 
 </div>
 
@@ -56,6 +58,7 @@ HunyuanVideo-1.5 is a video generation model that delivers top-tier quality with
 
 ## 🔥🔥🔥 News
 👋 Nov 20, 2025: We release the inference code and model weights of HunyuanVideo-1.5.
+🚀 Latest: We now support cache inference, achieving approximately 2x speedup! Pull the latest code to experience it.
 
 
 ## 🎥 Demo
@@ -168,6 +171,7 @@ pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-s
 ```bash
 git clone https://github.com/Tencent-Hunyuan/flex-block-attn.git
 cd flex-block-attn
+git submodule update --init --recursive
 python3 setup.py install
 ```
 
@@ -191,7 +195,7 @@ Download the pretrained models before generating videos. Detailed instructions a
 ### Prompt Writing Handbook
 Prompt enhancement plays a crucial role in enabling our model to generate high-quality videos. Writing longer, more detailed prompts significantly improves the generated video, so we encourage you to craft comprehensive, descriptive prompts for the best possible quality. We recommend community partners consult our official guide on how to write effective prompts.
 
-**Reference:** **[HunyuanVideo-1.5 Prompt Handbook](https://doc.weixin.qq.com/doc/w3_AXcAcwZSAGgCNACVygLxeQjyn4FYS?scode=AJEAIQdfAAoSfXnTj0AAkA-gaeACk)**
+**Reference:** **[HunyuanVideo-1.5 Prompt Handbook](https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/main/assets/HunyuanVideo_1_5_Prompt_Handbook_EN.md)**
 
 ### System Prompts for Automatic Prompt Enhancement
 For users seeking to optimize prompts for other large models, it is recommended to consult the definition of `t2v_rewrite_system_prompt` in the file `hyvideo/utils/rewrite/t2v_prompt.py` to guide text-to-video rewriting. Similarly, for image-to-video rewriting, refer to the definition of `i2v_rewrite_system_prompt` in `hyvideo/utils/rewrite/i2v_prompt.py`.
@@ -229,9 +233,10 @@ OUTPUT_PATH=./outputs/output.mp4
 N_INFERENCE_GPU=8 # Parallel inference GPU count
 CFG_DISTILLED=true # Inference with CFG distilled model, 2x speedup
 SPARSE_ATTN=false # Inference with sparse attention (only 720p models are equipped with sparse attention). Please ensure flex-block-attn is installed
-SAGE_ATTN=false # Inference with SageAttention
+SAGE_ATTN=true # Inference with SageAttention
 REWRITE=true # Enable prompt rewriting. Please ensure rewrite vLLM server is deployed and configured.
 OVERLAP_GROUP_OFFLOADING=true # Only valid when group offloading is enabled, significantly increases CPU memory usage but speeds up inference
+ENABLE_CACHE=true # Enable feature cache during inference. Significantly speeds up inference.
 MODEL_PATH=ckpts # Path to pretrained model
 
 torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
@@ -243,6 +248,7 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
 --cfg_distilled $CFG_DISTILLED \
 --sparse_attn $SPARSE_ATTN \
 --use_sageattn $SAGE_ATTN \
+--enable_cache $ENABLE_CACHE \
 --rewrite $REWRITE \
 --output_path $OUTPUT_PATH \
 --overlap_group_offloading $OVERLAP_GROUP_OFFLOADING \
@@ -287,6 +293,11 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
 | `--use_sageattn` | bool | No | `false` | Enable SageAttention (use `--use_sageattn` or `--use_sageattn true/1` to enable, `--use_sageattn false/0` to disable) |
 | `--sage_blocks_range` | str | No | `0-53` | SageAttention blocks range (e.g., `0-5` or `0,1,2,3,4,5`) |
 | `--enable_torch_compile` | bool | No | `false` | Enable torch compile for transformer (use `--enable_torch_compile` or `--enable_torch_compile true/1` to enable, `--enable_torch_compile false/0` to disable) |
+| `--enable_cache` | bool | No | `false` | Enable cache for transformer (use `--enable_cache` or `--enable_cache true/1` to enable, `--enable_cache false/0` to disable) |
+| `--cache_start_step` | int | No | `11` | Start step to skip when using cache |
+| `--cache_end_step` | int | No | `45` | End step to skip when using cache |
+| `--total_steps` | int | No | `50` | Total inference steps |
+| `--cache_step_interval` | int | No | `4` | Step interval to skip when using cache |
 
 **Note:** Use `--nproc_per_node` to specify the number of GPUs. For example, `--nproc_per_node=8` uses 8 GPUs.
 
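The new cache flags (`--cache_start_step`, `--cache_end_step`, `--cache_step_interval`, `--total_steps`) interact in a way the flag table leaves implicit. A minimal sketch — a hypothetical reconstruction, not the repository's actual implementation — assuming steps inside `[cache_start_step, cache_end_step)` reuse cached transformer features except every `cache_step_interval`-th step, which recomputes to refresh the cache:

```python
def cached_steps(total_steps=50, start=11, end=45, interval=4):
    """Hypothetical cache schedule: steps in [start, end) reuse the
    cached transformer output, except every `interval`-th step, which
    recomputes and refreshes the cache. Defaults mirror the flag table."""
    reuse = []
    for step in range(total_steps):
        in_window = start <= step < end
        refresh = in_window and (step - start) % interval == 0
        if in_window and not refresh:
            reuse.append(step)
    return reuse

# Under these assumptions, 25 of the 50 steps skip recomputation, which
# lines up with the advertised ~2x speedup if cached steps are cheap.
print(len(cached_steps()))  # → 25
```

Under this reading, a larger `cache_step_interval` trades fidelity for speed by reusing the cache across longer runs of steps before refreshing.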
README_CN.md CHANGED
@@ -26,10 +26,11 @@ HunyuanVideo-1.5作为一款轻量级视频生成模型,仅需83亿参数即
 <a href=https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5 target="_blank"><img src=https://img.shields.io/badge/Page-bb8a2e.svg?logo=github height=22px></a>
 <a href="https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/report/HunyuanVideo_1_5.pdf" target="_blank"><img src=https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv height=22px></a>
 <a href=https://x.com/TencentHunyuan target="_blank"><img src=https://img.shields.io/badge/Hunyuan-black.svg?logo=x height=22px></a>
-<a href="https://doc.weixin.qq.com/doc/w3_AXcAcwZSAGgCNACVygLxeQjyn4FYS?scode=AJEAIQdfAAoSfXnTj0AAkA-gaeACk" target="_blank"><img src=https://img.shields.io/badge/📚-PromptHandBook-blue.svg?logo=book height=22px></a> <br/>
+<a href="https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/main/assets/HunyuanVideo_1_5_Prompt_Handbook_EN.md" target="_blank"><img src=https://img.shields.io/badge/📚-PromptHandBook-blue.svg?logo=book height=22px></a> <br/>
 <a href="./ComfyUI/README.md" target="_blank"><img src=https://img.shields.io/badge/ComfyUI-blue.svg?logo=book height=22px></a>
 <a href="https://github.com/ModelTC/LightX2V" target="_blank"><img src=https://img.shields.io/badge/LightX2V-yellow.svg?logo=book height=22px></a>
-
+<a href="https://tusi.cn/models/933574988890423836" target="_blank"><img src=https://img.shields.io/badge/吐司-purple.svg?logo=book height=22px></a>
+<a href="https://tensor.art/models/933574988890423836" target="_blank"><img src=https://img.shields.io/badge/TensorArt-cyan.svg?logo=book height=22px></a>
 </div>
 
 
@@ -40,6 +41,7 @@ HunyuanVideo-1.5作为一款轻量级视频生成模型,仅需83亿参数即
 
 ## 🔥🔥🔥 最新动态
 👋 2025年11月20日: 我们开源了 HunyuanVideo-1.5的代码和推理权重
+🚀 最新: 我们现已支持 cache 推理,可实现约两倍加速!请 pull 最新代码体验。
 
 ## 🎥 演示视频
 <div align="center">
@@ -151,6 +153,7 @@ pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-s
 ```bash
 git clone https://github.com/Tencent-Hunyuan/flex-block-attn.git
 cd flex-block-attn
+git submodule update --init --recursive
 python3 setup.py install
 ```
 
@@ -175,7 +178,7 @@ pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-s
 提示词增强在我们的模型生成高质量视频方面起着至关重要的作用。通过撰写更长、更详细的提示词,生成的视频质量将得到显著改善。我们鼓励您编写全面且描述性的提示词,以获得最佳的视频质量。我们建议社区伙伴参考我们的官方指南,了解如何撰写有效的提示词。
 
 
-**参考:** **[HunyuanVideo-1.5 提示词手册](https://doc.weixin.qq.com/doc/w3_AXcAcwZSAGgCNhei2zzNUS8O4mKop?scode=AJEAIQdfAAoE1dhviFAAkA-gaeACk)**
+**参考:** **[HunyuanVideo-1.5 提示词手册](https://github.com/Tencent-Hunyuan/HunyuanVideo-1.5/blob/main/assets/HunyuanVideo_1_5_Prompt_Handbook_EN.md)**
 
 
 ### 自动提示词增强的系统提示词
@@ -216,9 +219,10 @@ OUTPUT_PATH=./outputs/output.mp4
 N_INFERENCE_GPU=8 # 并行推理 GPU 数量
 CFG_DISTILLED=true # 使用 CFG 蒸馏模型进行推理,2倍加速
 SPARSE_ATTN=false # 使用稀疏注意力进行推理(仅 720p 模型配备了稀疏注意力)。请确保 flex-block-attn 已安装
-SAGE_ATTN=false # 使用 SageAttention 进行推理
+SAGE_ATTN=true # 使用 SageAttention 进行推理
 REWRITE=true # 启用提示词重写。请确保 rewrite vLLM server 已部署和配置。
 OVERLAP_GROUP_OFFLOADING=true # 仅在组卸载启用时有效,会显著增加 CPU 内存占用,但能够提速
+ENABLE_CACHE=true # 启用特征缓存进行推理。显著提升推理速度
 MODEL_PATH=ckpts # 预训练模型路径
 
 torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
@@ -230,6 +234,7 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
 --cfg_distilled $CFG_DISTILLED \
 --sparse_attn $SPARSE_ATTN \
 --use_sageattn $SAGE_ATTN \
+--enable_cache $ENABLE_CACHE \
 --rewrite $REWRITE \
 --output_path $OUTPUT_PATH \
 --overlap_group_offloading $OVERLAP_GROUP_OFFLOADING \
@@ -273,6 +278,11 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
 | `--use_sageattn` | bool | 否 | `false` | 启用 SageAttention(使用 `--use_sageattn` 或 `--use_sageattn true/1` 来启用,`--use_sageattn false/0` 来禁用) |
 | `--sage_blocks_range` | str | 否 | `0-53` | SageAttention 块范围(例如:`0-5` 或 `0,1,2,3,4,5`) |
 | `--enable_torch_compile` | bool | 否 | `false` | 启用 torch compile 以优化 transformer(使用 `--enable_torch_compile` 或 `--enable_torch_compile true/1` 来启用,`--enable_torch_compile false/0` 来禁用) |
+| `--enable_cache` | bool | 否 | `false` | 启用 transformer 缓存(使用 `--enable_cache` 或 `--enable_cache true/1` 来启用,`--enable_cache false/0` 来禁用) |
+| `--cache_start_step` | int | 否 | `11` | 使用缓存时跳过的起始步数 |
+| `--cache_end_step` | int | 否 | `45` | 使用缓存时跳过的结束步数 |
+| `--total_steps` | int | 否 | `50` | 总推理步数 |
+| `--cache_step_interval` | int | 否 | `4` | 使用缓存时跳过的步数间隔 |
 
 **注意:** 使用 `--nproc_per_node` 指定使用的 GPU 数量。例如,`--nproc_per_node=8` 表示使用 8 个 GPU。
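Both flag tables describe the same boolean convention: a bare flag enables an option, and explicit `true/1` / `false/0` values are also accepted. The repository's actual parser is not shown in this diff; a minimal argparse sketch matching that description (the `str2bool` helper is an assumption, not the repo's code) could look like:

```python
import argparse

def str2bool(v):
    """Parse true/1 and false/0 as the flag tables describe (hypothetical helper)."""
    if isinstance(v, bool):
        return v
    if v.lower() in ("true", "1"):
        return True
    if v.lower() in ("false", "0"):
        return False
    raise argparse.ArgumentTypeError(f"boolean value expected, got {v!r}")

parser = argparse.ArgumentParser()
# nargs="?" plus const=True lets a bare `--enable_cache` enable the feature,
# while `--enable_cache false` still works via the str2bool converter.
parser.add_argument("--enable_cache", type=str2bool, nargs="?",
                    const=True, default=False)

print(parser.parse_args(["--enable_cache"]).enable_cache)       # True
print(parser.parse_args(["--enable_cache", "0"]).enable_cache)  # False
```

The same pattern would cover `--use_sageattn`, `--enable_torch_compile`, and the other boolean flags in the tables.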