| <!-- ## **HunyuanVideo** --> | |
| [English](./README.md) | |
| <p align="center"> | |
| <img src="https://raw.githubusercontent.com/Tencent/HunyuanVideo/refs/heads/main/assets/logo.png" height=100> | |
| </p> | |
| # HunyuanVideo: A Systematic Framework For Large Video Generation Model | |
| <div align="center"> | |
| <a href="https://github.com/Tencent/HunyuanVideo"><img src="https://img.shields.io/static/v1?label=HunyuanVideo Code&message=Github&color=blue"></a>   | |
| <a href="https://aivideo.hunyuan.tencent.com"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Web&color=green"></a>   | |
| <a href="https://video.hunyuan.tencent.com"><img src="https://img.shields.io/static/v1?label=Playground&message=Web&color=green"></a> | |
| </div> | |
| <div align="center"> | |
| <a href="https://arxiv.org/abs/2412.03603"><img src="https://img.shields.io/static/v1?label=Tech Report&message=Arxiv&color=red"></a>   | |
| <a href="https://aivideo.hunyuan.tencent.com/hunyuanvideo.pdf"><img src="https://img.shields.io/static/v1?label=Tech Report&message=High-Quality Version (~350M)&color=red"></a> | |
| </div> | |
| <div align="center"> | |
| <a href="https://huggingface.co/tencent/HunyuanVideo"><img src="https://img.shields.io/static/v1?label=HunyuanVideo&message=HuggingFace&color=yellow"></a>   | |
| <a href="https://huggingface.co/docs/diffusers/main/api/pipelines/hunyuan_video"><img src="https://img.shields.io/static/v1?label=HunyuanVideo&message=Diffusers&color=yellow"></a>   | |
| <a href="https://huggingface.co/tencent/HunyuanVideo-PromptRewrite"><img src="https://img.shields.io/static/v1?label=HunyuanVideo-PromptRewrite&message=HuggingFace&color=yellow"></a> | |
| [Replicate](https://replicate.com/zsxkib/hunyuan-video) | |
| </div> | |
| <p align="center"> | |
| 👋 Join our <a href="assets/WECHAT.md" target="_blank">WeChat</a> and <a href="https://discord.gg/GpARqvrh" target="_blank">Discord</a> | |
| </p> | |
| ----- | |
| This repository contains the PyTorch model definitions, pre-trained weights, and inference/sampling code for the HunyuanVideo project. See our [project page](https://aivideo.hunyuan.tencent.com) for more content. | |
| > [**HunyuanVideo: A Systematic Framework For Large Video Generation Model**](https://arxiv.org/abs/2412.03603) <br> | |
| ## 🔥🔥🔥 News!! | |
| * Dec 18, 2024: 🏃‍♂️ Released the HunyuanVideo [FP8 model weights](https://huggingface.co/tencent/HunyuanVideo/blob/main/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.pt) to save more GPU memory. | |
| * Dec 17, 2024: 🤗 HunyuanVideo has been integrated into [Diffusers](https://huggingface.co/docs/diffusers/main/api/pipelines/hunyuan_video). | |
| * Dec 3, 2024: 🚀 Released the HunyuanVideo multi-GPU parallel inference code, powered by [xDiT](https://github.com/xdit-project/xDiT). | |
| * Dec 3, 2024: 👋 Released the HunyuanVideo text-to-video inference code and model weights. | |
| ## 🎥 Demo | |
| <div align="center"> | |
| <video width="70%" src="https://github.com/user-attachments/assets/22440764-0d7e-438e-a44d-d0dad1006d3d" poster="./assets/video_poster.png"> </video> | |
| </div> | |
| ## 🧩 Community Contributions | |
| If you develop or use HunyuanVideo in your projects, please let us know. | |
| - ComfyUI (supports FP8 inference, V2V, and IP2V generation): [ComfyUI-HunyuanVideoWrapper](https://github.com/kijai/ComfyUI-HunyuanVideoWrapper) by [Kijai](https://github.com/kijai) | |
| - FastVideo (consistency-distilled model): [FastVideo](https://github.com/hao-ai-lab/FastVideo) by [Hao AI Lab](https://hao-ai-lab.github.io/) | |
| - HunyuanVideo-gguf (GGUF, quantization): [HunyuanVideo-gguf](https://huggingface.co/city96/HunyuanVideo-gguf) by [city96](https://huggingface.co/city96) | |
| - Enhance-A-Video (higher-quality video generation): [Enhance-A-Video](https://github.com/NUS-HPC-AI-Lab/Enhance-A-Video) by [NUS-HPC-AI-Lab](https://ai.comp.nus.edu.sg/) | |
| - TeaCache (cache-based accelerated sampling): [TeaCache](https://github.com/LiewFeng/TeaCache) by [Feng Liu](https://github.com/LiewFeng) | |
| ## 📑 Open-source Plan | |
| - HunyuanVideo (text-to-video model) | |
|   - [x] Inference code | |
|   - [x] Model weights | |
|   - [x] Multi-GPU sequence-parallel inference (the more GPUs, the faster the inference) | |
|   - [x] Web Demo (Gradio) | |
|   - [x] Diffusers | |
|   - [x] FP8 quantized version | |
|   - [ ] Penguin Video benchmark | |
|   - [ ] ComfyUI | |
|   - [ ] Multi-GPU PipeFusion parallel inference (lower GPU memory requirement) | |
| - HunyuanVideo (image-to-video model) | |
|   - [ ] Inference code | |
|   - [ ] Model weights | |
| ## Contents | |
| - [HunyuanVideo: A Systematic Framework For Large Video Generation Model](#hunyuanvideo-a-systematic-framework-for-large-video-generation-model) | |
|   - [🎥 Demo](#-demo) | |
|   - [🔥🔥🔥 News!!](#-news) | |
|   - [🧩 Community Contributions](#-community-contributions) | |
|   - [📑 Open-source Plan](#-open-source-plan) | |
|   - [Contents](#contents) | |
|   - [**Abstract**](#abstract) | |
|   - [**HunyuanVideo Architecture**](#hunyuanvideo-architecture) | |
|   - [🎉 **Key Features**](#-key-features) | |
|     - [**Unified Image and Video Generative Architecture**](#unified-image-and-video-generative-architecture) | |
|     - [**MLLM Text Encoder**](#mllm-text-encoder) | |
|     - [**3D VAE**](#3d-vae) | |
|     - [**Prompt Rewrite**](#prompt-rewrite) | |
|   - [📈 Evaluation](#-evaluation) | |
|   - [📜 Requirements](#-requirements) | |
|   - [🛠️ Dependencies and Installation](#️-dependencies-and-installation) | |
|     - [Installation Guide for Linux](#installation-guide-for-linux) | |
|   - [🧱 Download Pretrained Models](#-download-pretrained-models) | |
|   - [🔑 Single-GPU Inference](#-single-gpu-inference) | |
|     - [Using Command Line](#using-command-line) | |
|     - [Run a Gradio Server](#run-a-gradio-server) | |
|     - [More Configurations](#more-configurations) | |
|   - [🚀 Parallel Inference on Multiple GPUs by xDiT](#-parallel-inference-on-multiple-gpus-by-xdit) | |
|     - [Using Command Line](#using-command-line-1) | |
|   - [🚀 FP8 Inference](#-fp8-inference) | |
|     - [Using Command Line](#using-command-line-2) | |
|   - [🔗 BibTeX](#-bibtex) | |
|   - [Acknowledgements](#acknowledgements) | |
|   - [Star History](#star-history) | |
| --- | |
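| ## **Abstract** | |
| HunyuanVideo is a new open-source video generation foundation model whose video generation performance is comparable to, or even better than, that of leading closed-source models. To train HunyuanVideo, we adopt a comprehensive framework that integrates data curation, joint image-video model training, and an efficient infrastructure that supports large-scale model training and inference. In addition, through effective strategies for scaling the model architecture and the dataset, we successfully trained a video generation model with over 13 billion parameters, making it one of the largest open-source video generation models. | |
| We ran extensive experiments on the design of the model structure to ensure high visual quality, diverse motion, text-video alignment, and generation stability. According to evaluations by professional reviewers, HunyuanVideo outperforms previous state-of-the-art models on the overall metric, including Runway Gen-3, Luma 1.6, and the 3 top-performing video generation models in the Chinese community. **By open-sourcing the code and weights of the foundation model and its applications, we aim to bridge the gap between closed-source and open-source video foundation models, help everyone in the community experiment with their own ideas, and foster a more dynamic and vibrant video generation ecosystem.** | |
| ## **HunyuanVideo Architecture** | |
| HunyuanVideo is a latent-space model. During training, it uses a 3D VAE to compress features along the temporal and spatial dimensions. Text prompts are encoded by a large language model and fed to the model as conditions, guiding it to denoise Gaussian noise over multiple steps and output the latent representation of a video. Finally, at inference time, the 3D VAE decoder decodes the latent representation into a video. | |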
| <p align="center"> | |
| <img src="https://raw.githubusercontent.com/Tencent/HunyuanVideo/refs/heads/main/assets/overall.png" height=300> | |
| </p> | |
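| ## 🎉 **Key Features** | |
| ### **Unified Image and Video Generative Architecture** | |
| HunyuanVideo adopts a Transformer design with a Full Attention mechanism for video generation. Specifically, we use a "dual-stream to single-stream" hybrid model design. In the dual-stream phase, video and text tokens are processed independently by parallel Transformer blocks, so that each modality can learn its own modulation mechanism without interference. In the single-stream phase, we concatenate the video and text tokens and feed them into the subsequent Transformer blocks for effective multimodal information fusion. This design captures the complex interactions between visual and semantic information and improves overall model performance. | |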
| <p align="center"> | |
| <img src="https://raw.githubusercontent.com/Tencent/HunyuanVideo/refs/heads/main/assets/backbone.png" height=350> | |
| </p> | |
| ### **MLLM Text Encoder** | |
| Earlier video generation models typically use pre-trained CLIP and T5-XXL as text encoders, where CLIP uses a Transformer Encoder and T5 uses an Encoder-Decoder structure. HunyuanVideo instead uses a pre-trained Multimodal Large Language Model (MLLM) as its text encoder, which has the following advantages: | |
| * Compared with T5, an MLLM that has been instruction-tuned on image-text data has better image-text alignment in the feature space, which eases image-text alignment in the diffusion model; | |
| * Compared with CLIP, the MLLM shows stronger abilities in detailed image description and complex reasoning; | |
| * The MLLM can act as a zero-shot learner by following system instructions, which helps the text features focus more on key information. | |
| However, the MLLM is based on causal attention, whereas T5-XXL uses bidirectional attention, which provides better text guidance for diffusion models. We therefore introduce an extra token refiner to enhance the text features. | |
| <p align="center"> | |
| <img src="https://raw.githubusercontent.com/Tencent/HunyuanVideo/refs/heads/main/assets/text_encoder.png" height=275> | |
| </p> | |
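To make the causal-versus-bidirectional distinction concrete, here is a minimal sketch in plain Python. It is not HunyuanVideo code, and the function names are illustrative only. It builds both kinds of attention mask for a short token sequence and counts how many tokens each position is allowed to see.

```python
def attention_mask(num_tokens: int, causal: bool) -> list[list[bool]]:
    """mask[i][j] is True when token i may attend to token j."""
    return [
        [(j <= i) if causal else True for j in range(num_tokens)]
        for i in range(num_tokens)
    ]


def visible_counts(mask: list[list[bool]]) -> list[int]:
    """Number of tokens each position can attend to."""
    return [sum(row) for row in mask]


if __name__ == "__main__":
    n = 5
    print("causal:       ", visible_counts(attention_mask(n, causal=True)))
    print("bidirectional:", visible_counts(attention_mask(n, causal=False)))
```

Under the causal mask the first token sees only itself, so the features of early prompt tokens carry no information about later words. With a bidirectional mask every token sees the whole prompt, which is the property an extra refinement stage on top of causal features would aim to recover.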
| ### **3D VAE** | |
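| Our VAE uses CausalConv3D for the HunyuanVideo encoder and decoder to compress videos along the temporal and spatial dimensions: the temporal dimension is compressed by a factor of 4 and the spatial dimensions by a factor of 8, with 16 latent channels. This significantly reduces the number of tokens for the subsequent Transformer model and lets us train video generation models at the original resolution and frame rate. | |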
| <p align="center"> | |
| <img src="https://raw.githubusercontent.com/Tencent/HunyuanVideo/refs/heads/main/assets/3dvae.png" height=150> | |
| </p> | |
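The compression ratios above can be turned into a quick latent-shape calculation. The sketch below is illustrative and not taken from the repository. It assumes that the causal VAE encodes the first frame on its own, so that `T` input frames map to `(T - 1) / 4 + 1` latent frames; the README does not state this formula, although the recommended length of 129 frames (4 × 32 + 1) is consistent with it. The function name `latent_shape` is hypothetical.

```python
def latent_shape(frames: int, height: int, width: int,
                 t_ratio: int = 4, s_ratio: int = 8, channels: int = 16):
    """Return (channels, latent_frames, latent_height, latent_width).

    Assumption: the first frame is encoded separately, so T frames map to
    (T - 1) // t_ratio + 1 latent frames.
    """
    if (frames - 1) % t_ratio != 0:
        raise ValueError("frames - 1 must be divisible by the temporal ratio")
    if height % s_ratio or width % s_ratio:
        raise ValueError("height and width must be divisible by the spatial ratio")
    return (channels,
            (frames - 1) // t_ratio + 1,
            height // s_ratio,
            width // s_ratio)


if __name__ == "__main__":
    print(latent_shape(129, 720, 1280))
    print(latent_shape(129, 544, 960))
```

Under these assumptions, a 720 × 1280 video with 129 frames becomes a latent with 16 channels, 33 latent frames, and a 90 × 160 spatial grid.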
| ### **Prompt Rewrite** | |
| To handle the variability in style and length of user-provided prompts, we fine-tuned the [Hunyuan-Large model](https://github.com/Tencent/Tencent-Hunyuan-Large) as our prompt rewrite model, which rewrites the user's prompt into a form the video model prefers. | |
| We provide two rewrite modes: Normal mode and Master mode. The prompts for both modes are available [here](hyvideo/prompt_rewrite.py). Normal mode is designed to improve the video generation model's understanding of user intent, so that the given instructions are interpreted more accurately. Master mode enriches the description of aspects such as composition, lighting, and camera movement, and leans towards videos of higher visual quality. Note that this enrichment can occasionally cause some semantic details to be lost. | |
| The prompt rewrite model can be deployed and run directly with [Hunyuan-Large](https://github.com/Tencent/Tencent-Hunyuan-Large). We have open-sourced the weights of the prompt rewrite model [here](https://huggingface.co/Tencent/HunyuanVideo-PromptRewrite). | |
| ## 📈 Evaluation | |
| To evaluate HunyuanVideo, we selected four closed-source video generation models as baselines. We used 1,533 prompts in total and generated the same number of video samples for each prompt in a single inference run. For a fair comparison, we ran inference only once to avoid any cherry-picking. When comparing with the other methods, we kept the default settings of every selected model and ensured consistent video resolution. Videos were assessed on three criteria: text alignment, motion quality, and visual quality. After evaluation by more than 60 professional reviewers, HunyuanVideo achieved the best overall result and stood out in motion quality in particular. | |
| <p align="center"> | |
| <table> | |
| <thead> | |
| <tr> | |
| <th rowspan="2">Model</th> <th rowspan="2">Open Source</th> <th>Duration</th> <th>Text Alignment</th> <th>Motion Quality</th> <th rowspan="2">Visual Quality</th> <th rowspan="2">Overall</th> <th rowspan="2">Ranking</th> | |
| </tr> | |
| </thead> | |
| <tbody> | |
| <tr> | |
| <td>HunyuanVideo (Ours)</td> <td> ✔ </td> <td>5s</td> <td>61.8%</td> <td>66.5%</td> <td>95.7%</td> <td>41.3%</td> <td>1</td> | |
| </tr> | |
| <tr> | |
| <td>Chinese Model A (API)</td> <td> ✘ </td> <td>5s</td> <td>62.6%</td> <td>61.7%</td> <td>95.6%</td> <td>37.7%</td> <td>2</td> | |
| </tr> | |
| <tr> | |
| <td>Chinese Model B (Web)</td> <td> ✘</td> <td>5s</td> <td>60.1%</td> <td>62.9%</td> <td>97.7%</td> <td>37.5%</td> <td>3</td> | |
| </tr> | |
| <tr> | |
| <td>GEN-3 alpha (Web)</td> <td>✘</td> <td>6s</td> <td>47.7%</td> <td>54.7%</td> <td>97.5%</td> <td>27.4%</td> <td>4</td> | |
| </tr> | |
| <tr> | |
| <td>Luma1.6 (API)</td><td>✘</td> <td>5s</td> <td>57.6%</td> <td>44.2%</td> <td>94.1%</td> <td>24.8%</td> <td>5</td> | |
| </tr> | |
| </tbody> | |
| </table> | |
| </p> | |
| ## 📜 Requirements | |
| The following table lists the recommended configuration for running the HunyuanVideo model for text-to-video generation (batch size = 1): | |
| | Model | Resolution<br/>(height/width/frame) | Peak GPU Memory | | |
| |:--------------:|:--------------------------------:|:----------------:| | |
| | HunyuanVideo | 720px1280px129f | 60G | | |
| | HunyuanVideo | 544px960px129f | 45G | | |
| * An NVIDIA GPU with CUDA support is required. | |
| * The model was tested on a single 80G GPU. | |
| * The minimum GPU memory required is 60GB for 720px1280px129f and 45GB for 544px960px129f. | |
| * Tested operating system: Linux | |
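As a small illustration of how the memory figures above might be used when planning a run, the sketch below looks up which listed settings fit a given amount of GPU memory. The dictionary simply copies the two rows of the table; the names `PEAK_MEMORY_GB` and `settings_that_fit` are hypothetical. It does not model the effect of options such as CPU offload or FP8 weights, which may lower the real requirement.

```python
# Peak-memory figures copied from the table above (GB, batch size 1).
# Keys are (height, width, frames).
PEAK_MEMORY_GB = {
    (720, 1280, 129): 60,
    (544, 960, 129): 45,
}


def settings_that_fit(available_gb: float) -> list[tuple[int, int, int]]:
    """Return the listed (height, width, frames) settings whose peak memory fits."""
    return sorted(
        setting
        for setting, needed_gb in PEAK_MEMORY_GB.items()
        if needed_gb <= available_gb
    )


if __name__ == "__main__":
    for gpu_gb in (24, 48, 80):
        print(gpu_gb, "GB ->", settings_that_fit(gpu_gb))
```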
| ## 🛠️ Dependencies and Installation | |
| First, clone the git repository: | |
| ```shell | |
| git clone https://github.com/tencent/HunyuanVideo | |
| cd HunyuanVideo | |
| ``` | |
| ### Installation Guide for Linux | |
| We recommend CUDA version 12.4 or 11.8. | |
| Conda installation instructions are available [here](https://docs.anaconda.com/free/miniconda/index.html). | |
| ```shell | |
| # 1. Create conda environment | |
| conda create -n HunyuanVideo python==3.10.9 | |
| # 2. Activate the environment | |
| conda activate HunyuanVideo | |
| # 3. Install PyTorch and other dependencies using conda | |
| # For CUDA 11.8 | |
| conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=11.8 -c pytorch -c nvidia | |
| # For CUDA 12.4 | |
| conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia | |
| # 4. Install pip dependencies | |
| python -m pip install -r requirements.txt | |
| # 5. Install flash attention v2 for acceleration (requires CUDA 11.8 or above) | |
| python -m pip install ninja | |
| python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3 | |
| # 6. Install xDiT for parallel inference (It is recommended to use torch 2.4.0 and flash-attn 2.6.3) | |
| python -m pip install xfuser==0.4.0 | |
| ``` | |
| If you run into a floating point exception (core dump) on specific GPU models, you can try the following fixes: | |
| ```shell | |
| # Option 1: Make sure CUDA 12.4, CUBLAS>=12.4.5.8, and CUDNN>=9.00 are installed correctly (or simply use our CUDA 12 Docker image) | |
| pip install nvidia-cublas-cu12==12.4.5.8 | |
| export LD_LIBRARY_PATH=/opt/conda/lib/python3.8/site-packages/nvidia/cublas/lib/ | |
| # Option 2: Force the explicit use of a PyTorch build compiled for CUDA 11.8, along with all other packages | |
| pip uninstall -r requirements.txt # make sure all dependency packages are uninstalled | |
| pip uninstall -y xfuser | |
| pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu118 | |
| pip install -r requirements.txt | |
| pip install ninja | |
| pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3 | |
| pip install xfuser==0.4.0 | |
| ``` | |
| We also provide a pre-built Docker image, which you can pull and run with the following commands. | |
| ```shell | |
| # For CUDA 12.4 (updated to avoid the floating point exception) | |
| docker pull hunyuanvideo/hunyuanvideo:cuda_12 | |
| docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_12 | |
| # For CUDA 11.8 | |
| docker pull hunyuanvideo/hunyuanvideo:cuda_11 | |
| docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_11 | |
| ``` | |
| ## 🧱 Download Pretrained Models | |
| Instructions for downloading the pretrained models are available [here](ckpts/README.md). | |
| ## 🔑 Single-GPU Inference | |
| The following table lists the supported height/width/frame settings. | |
| | Resolution | h/w=9:16 | h/w=16:9 | h/w=4:3 | h/w=3:4 | h/w=1:1 | | |
| |:---------------------:|:----------------------------:|:---------------:|:---------------:|:---------------:|:---------------:| | |
| | 540p | 544px960px129f | 960px544px129f | 832px624px129f | 624px832px129f | 720px720px129f | | |
| | 720p (recommended) | 720px1280px129f | 1280px720px129f | 1104px832px129f | 832px1104px129f | 960px960px129f | | |
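A wrapper script may want to check a requested size against this table before launching a long generation job. The following is a minimal sketch of such a check, not part of the repository. The constants copy the table, and the names `SUPPORTED_SIZES` and `resolution_tier` are hypothetical. Settings that are not listed may still run; the function only reports whether a setting appears in the table.

```python
# (height, width) pairs copied from the table above; every entry uses 129 frames.
SUPPORTED_SIZES = {
    "540p": [(544, 960), (960, 544), (624, 832), (832, 624), (720, 720)],
    "720p": [(720, 1280), (1280, 720), (1104, 832), (832, 1104), (960, 960)],
}
SUPPORTED_FRAMES = 129


def resolution_tier(height: int, width: int, frames: int = SUPPORTED_FRAMES):
    """Return '540p' or '720p' if the setting is listed in the table, else None."""
    if frames != SUPPORTED_FRAMES:
        return None
    for tier, sizes in SUPPORTED_SIZES.items():
        if (height, width) in sizes:
            return tier
    return None


if __name__ == "__main__":
    print(resolution_tier(720, 1280))   # listed under 720p
    print(resolution_tier(1024, 1024))  # not listed
```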
| ### Using Command Line | |
| ```bash | |
| cd HunyuanVideo | |
| python3 sample_video.py \ | |
| --video-size 720 1280 \ | |
| --video-length 129 \ | |
| --infer-steps 50 \ | |
| --prompt "A cat walks on the grass, realistic style." \ | |
| --flow-reverse \ | |
| --use-cpu-offload \ | |
| --save-path ./results | |
| ``` | |
| ### Run a Gradio Server | |
| ```bash | |
| python3 gradio_server.py --flow-reverse | |
| # set SERVER_NAME and SERVER_PORT manually | |
| # SERVER_NAME=0.0.0.0 SERVER_PORT=8081 python3 gradio_server.py --flow-reverse | |
| ``` | |
| ### More Configurations | |
| Further key configuration options are listed below: | |
| | Argument | Default | Description | | |
| |:----------------------:|:---------:|:-----------------------------------------:| | |
| | `--prompt` | None | The text prompt for video generation | | |
| | `--video-size` | 720 1280 | The height and width of the generated video | | |
| | `--video-length` | 129 | The number of frames in the generated video | | |
| | `--infer-steps` | 50 | The number of sampling steps | | |
| | `--embedded-cfg-scale` | 6.0 | The strength of the text guidance | | |
| | `--flow-shift` | 7.0 | Shift factor for the timesteps at inference; the larger the value, the more sampling steps are spent in the high-noise region | | |
| | `--flow-reverse` | False | If set, learning/sampling runs from t=1 to t=0 | | |
| | `--neg-prompt` | None | The negative prompt | | |
| | `--seed` | 0 | The random seed | | |
| | `--use-cpu-offload` | False | Enable CPU offload to save GPU memory | | |
| | `--save-path` | ./results | The path where results are saved | | |
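The effect of `--flow-shift` described in the table can be illustrated numerically. The sketch below assumes the SD3-style time shift `t' = s·t / (1 + (s − 1)·t)`; this README does not state the formula, so treat it as an assumption rather than the repository's exact scheduler. All function names are illustrative. It counts how many of the sampling steps fall above a noise level of 0.5 with and without the shift.

```python
def shift_timestep(t: float, shift: float) -> float:
    """Assumed SD3-style shift: maps t in [0, 1] to shift*t / (1 + (shift-1)*t)."""
    return shift * t / (1.0 + (shift - 1.0) * t)


def schedule(num_steps: int, shift: float) -> list[float]:
    """Noise levels from t=1 (pure noise) down towards t=0, after shifting."""
    uniform = [1.0 - i / num_steps for i in range(num_steps)]
    return [shift_timestep(t, shift) for t in uniform]


def steps_in_high_noise(num_steps: int, shift: float, threshold: float = 0.5) -> int:
    """Count sampling steps whose shifted noise level is above `threshold`."""
    return sum(t > threshold for t in schedule(num_steps, shift))


if __name__ == "__main__":
    for s in (1.0, 7.0):
        print(f"shift={s}: {steps_in_high_noise(50, s)} of 50 steps above t=0.5")
```

Under this assumed formula, a shift of 1.0 leaves the uniform schedule unchanged, while the default of 7.0 moves most of the 50 steps into the high-noise half of the trajectory, which matches the behaviour the table describes.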
| ## 🚀 Parallel Inference on Multiple GPUs by xDiT | |
| [xDiT](https://github.com/xdit-project/xDiT) is a scalable inference engine for Diffusion Transformers (DiTs) on multi-GPU clusters. | |
| It has successfully provided low-latency parallel inference solutions for a variety of DiT models, including mochi-1, CogVideoX, Flux.1, and SD3. This repository uses the [Unified Sequence Parallelism (USP)](https://arxiv.org/abs/2405.07719) APIs for parallel inference of the HunyuanVideo model. | |
| ### Using Command Line | |
| For example, the following command runs inference on 8 GPUs: | |
| ```bash | |
| cd HunyuanVideo | |
| torchrun --nproc_per_node=8 sample_video_parallel.py \ | |
| --video-size 1280 720 \ | |
| --video-length 129 \ | |
| --infer-steps 50 \ | |
| --prompt "A cat walks on the grass, realistic style." \ | |
| --flow-reverse \ | |
| --seed 42 \ | |
| --ulysses-degree 8 \ | |
| --ring-degree 1 \ | |
| --save-path ./results | |
| ``` | |
| You can set `--ulysses-degree` and `--ring-degree` to control the parallel configuration. The supported options are listed below. | |
| <details> | |
| <summary>Supported parallel configurations (click to expand)</summary> | |
| | --video-size | --video-length | --ulysses-degree x --ring-degree | --nproc_per_node | | |
| |----------------------|----------------|----------------------------------|------------------| | |
| | 1280 720 or 720 1280 | 129 | 8x1,4x2,2x4,1x8 | 8 | | |
| | 1280 720 or 720 1280 | 129 | 1x5 | 5 | | |
| | 1280 720 or 720 1280 | 129 | 4x1,2x2,1x4 | 4 | | |
| | 1280 720 or 720 1280 | 129 | 3x1,1x3 | 3 | | |
| | 1280 720 or 720 1280 | 129 | 2x1,1x2 | 2 | | |
| | 1104 832 or 832 1104 | 129 | 4x1,2x2,1x4 | 4 | | |
| | 1104 832 or 832 1104 | 129 | 3x1,1x3 | 3 | | |
| | 1104 832 or 832 1104 | 129 | 2x1,1x2 | 2 | | |
| | 960 960 | 129 | 6x1,3x2,2x3,1x6 | 6 | | |
| | 960 960 | 129 | 4x1,2x2,1x4 | 4 | | |
| | 960 960 | 129 | 3x1,1x3 | 3 | | |
| | 960 960 | 129 | 1x2,2x1 | 2 | | |
| | 960 544 or 544 960 | 129 | 6x1,3x2,2x3,1x6 | 6 | | |
| | 960 544 or 544 960 | 129 | 4x1,2x2,1x4 | 4 | | |
| | 960 544 or 544 960 | 129 | 3x1,1x3 | 3 | | |
| | 960 544 or 544 960 | 129 | 1x2,2x1 | 2 | | |
| | 832 624 or 624 832 | 129 | 4x1,2x2,1x4 | 4 | | |
| | 832 624 or 624 832 | 129 | 3x1,1x3 | 3 | | |
| | 832 624 or 624 832 | 129 | 2x1,1x2 | 2 | | |
| | 720 720 | 129 | 1x5 | 5 | | |
| | 720 720 | 129 | 3x1,1x3 | 3 | | |
| </details> | |
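In every row of the table, the Ulysses degree multiplied by the ring degree equals `--nproc_per_node`. The sketch below enumerates the degree pairs that satisfy this product rule for a given GPU count. It is illustrative only: the function name is hypothetical, and the product rule is a necessary condition inferred from the table, not a sufficient one. The table lists only `1x5` for 5 GPUs, for example, so some pairs are excluded by constraints that this sketch does not model.

```python
def degree_pairs(nproc_per_node: int) -> list[tuple[int, int]]:
    """All (ulysses_degree, ring_degree) pairs whose product is nproc_per_node."""
    return [
        (ulysses, nproc_per_node // ulysses)
        for ulysses in range(nproc_per_node, 0, -1)
        if nproc_per_node % ulysses == 0
    ]


if __name__ == "__main__":
    for gpus in (2, 4, 6, 8):
        pairs = ",".join(f"{u}x{r}" for u, r in degree_pairs(gpus))
        print(f"{gpus} GPUs: {pairs}")
```

Treat the result as a list of candidates and check each one against the table before launching `torchrun`.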
| <p align="center"> | |
| <table align="center"> | |
| <thead> | |
| <tr> | |
| <th colspan="4">Latency (sec) for 1280x720 (129 frames, 50 steps) on 8xGPU</th> | |
| </tr> | |
| <tr> | |
| <th>1</th> | |
| <th>2</th> | |
| <th>4</th> | |
| <th>8</th> | |
| </tr> | |
| </thead> | |
| <tbody> | |
| <tr> | |
| <th>1904.08</th> | |
| <th>934.09 (2.04x)</th> | |
| <th>514.08 (3.70x)</th> | |
| <th>337.58 (5.64x)</th> | |
| </tr> | |
| </tbody> | |
| </table> | |
| </p> | |
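The speedups in parentheses follow directly from the latencies: each is the single-GPU latency divided by the multi-GPU latency. The short sketch below reproduces them and also derives the parallel efficiency (speedup divided by GPU count), which the table does not report. The latencies are copied from the table; the names are illustrative.

```python
# Latency in seconds per GPU count, copied from the table above.
LATENCY_SEC = {1: 1904.08, 2: 934.09, 4: 514.08, 8: 337.58}


def speedup(num_gpus: int) -> float:
    """Single-GPU latency divided by the latency on `num_gpus` GPUs."""
    return LATENCY_SEC[1] / LATENCY_SEC[num_gpus]


def efficiency(num_gpus: int) -> float:
    """Fraction of ideal linear scaling achieved on `num_gpus` GPUs."""
    return speedup(num_gpus) / num_gpus


if __name__ == "__main__":
    for n in sorted(LATENCY_SEC):
        print(f"{n} GPU(s): speedup {speedup(n):.2f}x, efficiency {efficiency(n):.0%}")
```

Scaling is close to linear at 2 GPUs, while at 8 GPUs the efficiency drops to roughly 70%.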
| ## 🚀 FP8 Inference | |
| Using the FP8-quantized HunyuanVideo model saves about 10GB of GPU memory. Before using it, download the [FP8 weights](https://huggingface.co/tencent/HunyuanVideo/blob/main/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.pt) and the per-layer quantization [scale parameters](https://huggingface.co/tencent/HunyuanVideo/blob/main/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8_map.pt) from Huggingface. | |
| ### Using Command Line | |
| Here, you must explicitly specify the path to the FP8 weights. For example, the following command runs inference with the FP8 model: | |
| ```bash | |
| cd HunyuanVideo | |
| DIT_CKPT_PATH={PATH_TO_FP8_WEIGHTS}/{WEIGHT_NAME}_fp8.pt | |
| python3 sample_video.py \ | |
| --dit-weight ${DIT_CKPT_PATH} \ | |
| --video-size 1280 720 \ | |
| --video-length 129 \ | |
| --infer-steps 50 \ | |
| --prompt "A cat walks on the grass, realistic style." \ | |
| --seed 42 \ | |
| --embedded-cfg-scale 6.0 \ | |
| --flow-shift 7.0 \ | |
| --flow-reverse \ | |
| --use-cpu-offload \ | |
| --use-fp8 \ | |
| --save-path ./results | |
| ``` | |
| ## 🔗 BibTeX | |
| If you find [HunyuanVideo](https://arxiv.org/abs/2412.03603) useful for your research and applications, please cite it using this BibTeX: | |
| ```BibTeX | |
| @misc{kong2024hunyuanvideo, | |
| title={HunyuanVideo: A Systematic Framework For Large Video Generative Models}, | |
| author={Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang, Kathrina Wu, Qin Lin, Aladdin Wang, Andong Wang, Changlin Li, Duojun Huang, Fang Yang, Hao Tan, Hongmei Wang, Jacob Song, Jiawang Bai, Jianbing Wu, Jinbao Xue, Joey Wang, Junkun Yuan, Kai Wang, Mengyang Liu, Pengyu Li, Shuai Li, Weiyan Wang, Wenqing Yu, Xinchi Deng, Yang Li, Yanxin Long, Yi Chen, Yutao Cui, Yuanbo Peng, Zhentao Yu, Zhiyu He, Zhiyong Xu, Zixiang Zhou, Zunnan Xu, Yangyu Tao, Qinglin Lu, Songtao Liu, Dax Zhou, Hongfa Wang, Yong Yang, Di Wang, Yuhong Liu, and Jie Jiang, along with Caesar Zhong}, | |
| year={2024}, | |
| archivePrefix={arXiv preprint arXiv:2412.03603}, | |
| primaryClass={cs.CV} | |
| } | |
| ``` | |
| ## Acknowledgements | |
| The open-sourcing of HunyuanVideo builds on many open-source projects. We especially thank [SD3](https://huggingface.co/stabilityai/stable-diffusion-3-medium), [FLUX](https://github.com/black-forest-labs/flux), [Llama](https://github.com/meta-llama/llama), [LLaVA](https://github.com/haotian-liu/LLaVA), [Xtuner](https://github.com/InternLM/xtuner), [diffusers](https://github.com/huggingface/diffusers), and [HuggingFace](https://huggingface.co) for their open research and exploration. We also thank the Tencent Hunyuan Multimodal team for their help in adapting HunyuanVideo to multiple text encoders. | |
| ## Star History | |
| <a href="https://star-history.com/#Tencent/HunyuanVideo&Date"> | |
| <picture> | |
| <source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=Tencent/HunyuanVideo&type=Date&theme=dark" /> | |
| <source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=Tencent/HunyuanVideo&type=Date" /> | |
| <img alt="Star History Chart" src="https://api.star-history.com/svg?repos=Tencent/HunyuanVideo&type=Date" /> | |
| </picture> | |
| </a> | |