<!-- ## **HunyuanVideo** -->
[English](./README.md)
<p align="center">
<img src="https://raw.githubusercontent.com/Tencent/HunyuanVideo/refs/heads/main/assets/logo.png" height=100>
</p>
# HunyuanVideo: A Systematic Framework For Large Video Generation Model
<div align="center">
<a href="https://github.com/Tencent/HunyuanVideo"><img src="https://img.shields.io/static/v1?label=HunyuanVideo Code&message=Github&color=blue"></a> &ensp;
<a href="https://aivideo.hunyuan.tencent.com"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Web&color=green"></a> &ensp;
<a href="https://video.hunyuan.tencent.com"><img src="https://img.shields.io/static/v1?label=Playground&message=Web&color=green"></a>
</div>
<div align="center">
<a href="https://arxiv.org/abs/2412.03603"><img src="https://img.shields.io/static/v1?label=Tech Report&message=Arxiv&color=red"></a> &ensp;
<a href="https://aivideo.hunyuan.tencent.com/hunyuanvideo.pdf"><img src="https://img.shields.io/static/v1?label=Tech Report&message=High-Quality Version (~350M)&color=red"></a>
</div>
<div align="center">
<a href="https://huggingface.co/tencent/HunyuanVideo"><img src="https://img.shields.io/static/v1?label=HunyuanVideo&message=HuggingFace&color=yellow"></a> &ensp;
<a href="https://huggingface.co/docs/diffusers/main/api/pipelines/hunyuan_video"><img src="https://img.shields.io/static/v1?label=HunyuanVideo&message=Diffusers&color=yellow"></a> &ensp;
<a href="https://huggingface.co/tencent/HunyuanVideo-PromptRewrite"><img src="https://img.shields.io/static/v1?label=HunyuanVideo-PromptRewrite&message=HuggingFace&color=yellow"></a>
[![Replicate](https://replicate.com/zsxkib/hunyuan-video/badge)](https://replicate.com/zsxkib/hunyuan-video)
</div>
<p align="center">
👋 Join our <a href="assets/WECHAT.md" target="_blank">WeChat</a> and <a href="https://discord.gg/GpARqvrh" target="_blank">Discord</a>
</p>
-----
This repository contains the PyTorch model definitions, pre-trained weights, and inference/sampling code for HunyuanVideo. See our [project page](https://aivideo.hunyuan.tencent.com) for more.
> [**HunyuanVideo: A Systematic Framework For Large Video Generation Model**](https://arxiv.org/abs/2412.03603) <br>
## 🔥🔥🔥 News!!
* Dec 18, 2024: 🏃‍♂️ We release the [FP8 model weights](https://huggingface.co/tencent/HunyuanVideo/blob/main/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.pt) of HunyuanVideo to save more GPU memory.
* Dec 17, 2024: 🤗 HunyuanVideo has been integrated into [Diffusers](https://huggingface.co/docs/diffusers/main/api/pipelines/hunyuan_video).
* Dec 3, 2024: 🚀 We release the multi-GPU parallel inference code for HunyuanVideo, powered by [xDiT](https://github.com/xdit-project/xDiT).
* Dec 3, 2024: 👋 We release the inference code and model weights of HunyuanVideo text-to-video.
## 🎥 Demo
<div align="center">
<video width="70%" src="https://github.com/user-attachments/assets/22440764-0d7e-438e-a44d-d0dad1006d3d" poster="./assets/video_poster.png"> </video>
</div>
## 🧩 Community Contributions
If you develop or use HunyuanVideo in your projects, please let us know.
- ComfyUI (with support for FP8 inference, V2V, and IP2V generation): [ComfyUI-HunyuanVideoWrapper](https://github.com/kijai/ComfyUI-HunyuanVideoWrapper) by [Kijai](https://github.com/kijai)
- FastVideo (consistency-distilled model): [FastVideo](https://github.com/hao-ai-lab/FastVideo) by [Hao AI Lab](https://hao-ai-lab.github.io/)
- HunyuanVideo-gguf (GGUF quantization): [HunyuanVideo-gguf](https://huggingface.co/city96/HunyuanVideo-gguf) by [city96](https://huggingface.co/city96)
- Enhance-A-Video (generates higher-quality videos): [Enhance-A-Video](https://github.com/NUS-HPC-AI-Lab/Enhance-A-Video) by [NUS-HPC-AI-Lab](https://ai.comp.nus.edu.sg/)
- TeaCache (cache-based accelerated sampling): [TeaCache](https://github.com/LiewFeng/TeaCache) by [Feng Liu](https://github.com/LiewFeng)
## 📑 Open-source Plan
- HunyuanVideo (Text-to-Video Model)
  - [x] Inference code
  - [x] Model weights
  - [x] Multi-GPU sequence parallel inference (the more GPUs, the faster the inference)
  - [x] Web Demo (Gradio)
  - [x] Diffusers
  - [x] FP8 quantized version
  - [ ] Penguin Video benchmark
  - [ ] ComfyUI
  - [ ] Multi-GPU PipeFusion parallel inference (lower memory requirements)
- HunyuanVideo (Image-to-Video Model)
  - [ ] Inference code
  - [ ] Model weights
## Contents
- [HunyuanVideo: A Systematic Framework For Large Video Generation Model](#hunyuanvideo-a-systematic-framework-for-large-video-generation-model)
  - [🔥🔥🔥 News!!](#-news)
  - [🎥 Demo](#-demo)
  - [🧩 Community Contributions](#-community-contributions)
  - [📑 Open-source Plan](#-open-source-plan)
  - [Contents](#contents)
  - [**Abstract**](#abstract)
  - [**HunyuanVideo Architecture**](#hunyuanvideo-architecture)
  - [🎉 **Highlights**](#-highlights)
    - [**Unified Image and Video Generative Architecture**](#unified-image-and-video-generative-architecture)
    - [**MLLM Text Encoder**](#mllm-text-encoder)
    - [**3D VAE**](#3d-vae)
    - [**Prompt Rewrite**](#prompt-rewrite)
  - [📈 Evaluation](#-evaluation)
  - [📜 Requirements](#-requirements)
  - [🛠️ Installation and Dependencies](#️-installation-and-dependencies)
    - [Installation Guide for Linux](#installation-guide-for-linux)
  - [🧱 Download Pretrained Models](#-download-pretrained-models)
  - [🔑 Single-GPU Inference](#-single-gpu-inference)
    - [Using Command Line](#using-command-line)
    - [Run a Gradio Server](#run-a-gradio-server)
    - [More Configurations](#more-configurations)
  - [🚀 Parallel Inference on Multiple GPUs by xDiT](#-parallel-inference-on-multiple-gpus-by-xdit)
    - [Using Command Line](#using-command-line-1)
  - [🚀 FP8 Inference](#-fp8-inference)
    - [Using Command Line](#using-command-line-2)
  - [🔗 BibTeX](#-bibtex)
  - [Acknowledgements](#acknowledgements)
  - [Star History](#star-history)
---
## **Abstract**
HunyuanVideo is a new open-source video generation foundation model whose video generation performance is comparable to, if not better than, that of leading closed-source models. To train HunyuanVideo, we adopt a comprehensive framework that integrates data curation, joint image-video model training, and an efficient infrastructure to support large-scale model training and inference. In addition, through an effective strategy for scaling the model architecture and the dataset, we successfully trained a video generation model with over 13 billion parameters, making it one of the largest open-source video generation models.
We ran extensive experiments on the model design to ensure high visual quality, diverse motion, text-video alignment, and generation stability. According to professional human evaluation, HunyuanVideo outperforms previous state-of-the-art models on the overall metric, including Runway Gen-3, Luma 1.6, and three top-performing Chinese video generation models. **By releasing the code and weights of the foundation model and its applications, we aim to bridge the gap between closed-source and open-source video foundation models, help everyone in the community experiment with their own ideas, and foster a more dynamic and vibrant video generation ecosystem.**
## **HunyuanVideo Architecture**
HunyuanVideo is a latent-space model. During training, it uses a 3D VAE to compress features along the temporal and spatial dimensions. Text prompts are encoded by a large language model and fed to the model as conditions, guiding it to denoise Gaussian noise over multiple steps into the latent representation of a video. Finally, at inference time, the 3D VAE decoder decodes the latent representation into a video.
<p align="center">
<img src="https://raw.githubusercontent.com/Tencent/HunyuanVideo/refs/heads/main/assets/overall.png" height=300>
</p>
## 🎉 **Highlights**
### **Unified Image and Video Generative Architecture**
HunyuanVideo adopts a Transformer design with full attention for video generation. Specifically, we use a "dual-stream to single-stream" hybrid model design. In the dual-stream phase, video and text tokens are processed independently by parallel Transformer blocks, so that each modality can learn its own modulation mechanism without interfering with the other. In the single-stream phase, we concatenate the video and text tokens and feed them into the subsequent Transformer blocks for effective multimodal information fusion. This design captures the complex interactions between visual and semantic information and improves overall model performance.
<p align="center">
<img src="https://raw.githubusercontent.com/Tencent/HunyuanVideo/refs/heads/main/assets/backbone.png" height=350>
</p>
### **MLLM Text Encoder**
Previous video generation models typically use pre-trained CLIP and T5-XXL as text encoders, where CLIP uses a Transformer encoder and T5 uses an encoder-decoder structure. HunyuanVideo instead uses a pre-trained Multimodal Large Language Model (MLLM) as its text encoder, which has the following advantages:
* Compared with T5, an MLLM that has been instruction-tuned on image-text data has better image-text alignment in the feature space, which eases image-text alignment in the diffusion model;
* Compared with CLIP, an MLLM shows stronger capability in detailed image description and complex reasoning;
* An MLLM can generate zero-shot by following system instructions, which helps the text features focus more on key information.
However, the MLLM is based on causal attention, whereas T5-XXL uses bidirectional attention, which provides better text guidance for diffusion models. We therefore introduce an extra token refiner to enhance the text features.
<p align="center">
<img src="https://raw.githubusercontent.com/Tencent/HunyuanVideo/refs/heads/main/assets/text_encoder.png" height=275>
</p>
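
To make the causal-versus-bidirectional distinction concrete, here is a minimal sketch in plain Python that builds both attention masks for a short token sequence. The function names are illustrative only and are not taken from the HunyuanVideo code base.

```python
def causal_mask(n):
    """mask[i][j] is True when token i may attend to token j (only j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every token may attend to every other token."""
    return [[True] * n for _ in range(n)]


def visible_counts(mask):
    """Number of tokens each position can attend to."""
    return [sum(row) for row in mask]


if __name__ == "__main__":
    n = 4
    print(visible_counts(causal_mask(n)))         # [1, 2, 3, 4]
    print(visible_counts(bidirectional_mask(n)))  # [4, 4, 4, 4]
```

Under a causal mask the early tokens never see the rest of the prompt, which is the limitation the extra token refiner is introduced to compensate for.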
### **3D VAE**
Our VAE uses CausalConv3D for the HunyuanVideo encoder and decoder to compress videos along the temporal and spatial dimensions: the temporal dimension is compressed by 4x, the spatial dimensions by 8x, and the latent has 16 channels. This significantly reduces the number of tokens for the subsequent Transformer model, allowing us to train the video generation model at the original resolution and frame rate.
<p align="center">
<img src="https://raw.githubusercontent.com/Tencent/HunyuanVideo/refs/heads/main/assets/3dvae.png" height=150>
</p>
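
As a rough illustration of these compression ratios, the sketch below computes the latent shape for a given video size. The `(frames - 1) // 4 + 1` rule for the temporal axis is an assumption that follows the usual causal-convolution convention (the first frame is encoded on its own), and the function name is hypothetical.

```python
def latent_shape(frames, height, width,
                 time_ratio=4, space_ratio=8, channels=16):
    """Return (channels, latent_frames, latent_height, latent_width).

    Assumes a causal 3D VAE: the first frame is kept on its own and the
    remaining frames are compressed by `time_ratio`.
    """
    if (frames - 1) % time_ratio != 0:
        raise ValueError("frames - 1 must be divisible by the temporal ratio")
    if height % space_ratio or width % space_ratio:
        raise ValueError("height and width must be divisible by the spatial ratio")
    latent_frames = (frames - 1) // time_ratio + 1
    return channels, latent_frames, height // space_ratio, width // space_ratio


if __name__ == "__main__":
    # 129 frames at 720x1280 (height x width)
    print(latent_shape(129, 720, 1280))  # (16, 33, 90, 160)
```

Under these assumptions, a 129-frame 720x1280 RGB clip shrinks from about 357 million values to about 7.6 million latent values, roughly 47 times fewer.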
### **Prompt Rewrite**
To handle the variability and inconsistency of user-provided text prompts, we fine-tuned the [Hunyuan-Large model](https://github.com/Tencent/Tencent-Hunyuan-Large) as our prompt rewrite model, which rewrites user prompts into a form the model prefers.
We provide two rewrite modes: Normal mode and Director mode. The prompts for both modes are available [here](hyvideo/prompt_rewrite.py). Normal mode is designed to improve the video generation model's understanding of user intent, so that the provided instructions are interpreted more accurately. Director mode enriches the description of aspects such as composition, lighting, and camera movement, and tends to produce videos with higher visual quality. Note that this enrichment can occasionally cause some semantic details to be lost.
The prompt rewrite model can be deployed and run directly with the [Hunyuan-Large](https://github.com/Tencent/Tencent-Hunyuan-Large) code. We have open-sourced the weights of the prompt rewrite model [here](https://huggingface.co/Tencent/HunyuanVideo-PromptRewrite).
## 📈 Evaluation
To evaluate HunyuanVideo, we selected four closed-source video generation models as baselines. We used 1,533 prompts in total and generated the same number of video samples for each prompt in a single inference run. For a fair comparison, we ran inference only once to avoid any cherry-picking. When comparing with the other methods, we kept the default settings of all selected models and ensured a consistent video resolution. Videos were assessed on three criteria: text alignment, motion quality, and visual quality. After evaluation by more than 60 professional evaluators, HunyuanVideo achieved the best overall score and stood out in particular in motion quality.
<p align="center">
<table>
<thead>
<tr>
<th rowspan="2">Model</th> <th rowspan="2">Open Source</th> <th>Duration</th> <th>Text Alignment</th> <th>Motion Quality</th> <th rowspan="2">Visual Quality</th> <th rowspan="2">Overall</th> <th rowspan="2">Ranking</th>
</tr>
</thead>
<tbody>
<tr>
<td>HunyuanVideo (Ours)</td> <td> ✔ </td> <td>5s</td> <td>61.8%</td> <td>66.5%</td> <td>95.7%</td> <td>41.3%</td> <td>1</td>
</tr>
<tr>
<td>Chinese model A (API)</td> <td> &#10008; </td> <td>5s</td> <td>62.6%</td> <td>61.7%</td> <td>95.6%</td> <td>37.7%</td> <td>2</td>
</tr>
<tr>
<td>Chinese model B (Web)</td> <td> &#10008;</td> <td>5s</td> <td>60.1%</td> <td>62.9%</td> <td>97.7%</td> <td>37.5%</td> <td>3</td>
</tr>
<tr>
<td>GEN-3 alpha (Web)</td> <td>&#10008;</td> <td>6s</td> <td>47.7%</td> <td>54.7%</td> <td>97.5%</td> <td>27.4%</td> <td>4</td>
</tr>
<tr>
<td>Luma1.6 (API)</td><td>&#10008;</td> <td>5s</td> <td>57.6%</td> <td>44.2%</td> <td>94.1%</td> <td>24.8%</td> <td>5</td>
</tr>
</tbody>
</table>
</p>
## 📜 Requirements
The following table lists the recommended configuration for running the HunyuanVideo model for text-to-video generation (batch size = 1):
| Model | Setting<br/>(height/width/frame) | GPU Peak Memory |
|:--------------:|:--------------------------------:|:----------------:|
| HunyuanVideo | 720px1280px129f | 60 GB |
| HunyuanVideo | 544px960px129f | 45 GB |
* An NVIDIA GPU with CUDA support is required.
* The model was tested on a single 80 GB GPU.
* The minimum GPU memory is 60 GB for 720px1280px129f and 45 GB for 544px960px129f.
* Tested operating system: Linux
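
A small helper can turn these figures into a quick pre-flight check. This is only a sketch: the peak-memory numbers are copied from the table above, and the function name is hypothetical.

```python
# Peak GPU memory in GB per (height, width, frames), batch size = 1.
PEAK_MEMORY_GB = {
    (720, 1280, 129): 60,
    (544, 960, 129): 45,
}


def settings_that_fit(gpu_memory_gb):
    """Return the listed settings whose peak memory fits, largest first."""
    return sorted(
        (s for s, need in PEAK_MEMORY_GB.items() if need <= gpu_memory_gb),
        key=lambda s: PEAK_MEMORY_GB[s],
        reverse=True,
    )


if __name__ == "__main__":
    print(settings_that_fit(80))  # [(720, 1280, 129), (544, 960, 129)]
    print(settings_that_fit(48))  # [(544, 960, 129)]
```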
## 🛠️ Installation and Dependencies
First, clone the git repository:
```shell
git clone https://github.com/tencent/HunyuanVideo
cd HunyuanVideo
```
### Installation Guide for Linux
We recommend CUDA 12.4 or 11.8.
Conda installation instructions are available [here](https://docs.anaconda.com/free/miniconda/index.html).
```shell
# 1. Create conda environment
conda create -n HunyuanVideo python==3.10.9
# 2. Activate the environment
conda activate HunyuanVideo
# 3. Install PyTorch and other dependencies using conda
# For CUDA 11.8
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=11.8 -c pytorch -c nvidia
# For CUDA 12.4
conda install pytorch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 pytorch-cuda=12.4 -c pytorch -c nvidia
# 4. Install pip dependencies
python -m pip install -r requirements.txt
# 5. Install flash attention v2 for acceleration (requires CUDA 11.8 or above)
python -m pip install ninja
python -m pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
# 6. Install xDiT for parallel inference (It is recommended to use torch 2.4.0 and flash-attn 2.6.3)
python -m pip install xfuser==0.4.0
```
If you run into a floating point exception (core dump) on specific GPU models, try one of the following fixes:
```shell
# Option 1: make sure CUDA 12.4, cuBLAS>=12.4.5.8, and cuDNN>=9.00 are installed correctly (or use our CUDA 12 Docker image directly)
pip install nvidia-cublas-cu12==12.4.5.8
export LD_LIBRARY_PATH=/opt/conda/lib/python3.8/site-packages/nvidia/cublas/lib/
# Option 2: explicitly force the CUDA 11.8 build of PyTorch and of all other packages
pip uninstall -r requirements.txt # make sure all dependencies are uninstalled
pip uninstall -y xfuser
pip install torch==2.4.0 --index-url https://download.pytorch.org/whl/cu118
pip install -r requirements.txt
pip install ninja
pip install git+https://github.com/Dao-AILab/flash-attention.git@v2.6.3
pip install xfuser==0.4.0
```
We also provide a pre-built Docker image. Use the following commands to pull and run it.
```shell
# For CUDA 12.4 (updated to avoid the floating point exception)
docker pull hunyuanvideo/hunyuanvideo:cuda_12
docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_12
# For CUDA 11.8
docker pull hunyuanvideo/hunyuanvideo:cuda_11
docker run -itd --gpus all --init --net=host --uts=host --ipc=host --name hunyuanvideo --security-opt=seccomp=unconfined --ulimit=stack=67108864 --ulimit=memlock=-1 --privileged hunyuanvideo/hunyuanvideo:cuda_11
```
## 🧱 Download Pretrained Models
Instructions for downloading the pretrained models are available [here](ckpts/README.md).
## 🔑 Single-GPU Inference
The table below lists the supported height/width/frame settings.
| Resolution | h/w=9:16 | h/w=16:9 | h/w=4:3 | h/w=3:4 | h/w=1:1 |
|:---------------------:|:----------------------------:|:---------------:|:---------------:|:---------------:|:---------------:|
| 540p | 544px960px129f | 960px544px129f | 832px624px129f | 624px832px129f | 720px720px129f |
| 720p (recommended) | 720px1280px129f | 1280px720px129f | 1104px832px129f | 832px1104px129f | 960px960px129f |
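
The table can be encoded as a simple allow-list to catch unsupported settings before a long run starts. This is a sketch with hypothetical names; it treats the table as exhaustive, which may be stricter than what the scripts actually accept.

```python
# (height, width) pairs copied from the table above; every entry uses 129 frames.
SUPPORTED_SIZES = {
    "540p": {(544, 960), (960, 544), (624, 832), (832, 624), (720, 720)},
    "720p": {(720, 1280), (1280, 720), (1104, 832), (832, 1104), (960, 960)},
}
SUPPORTED_FRAMES = 129


def check_video_setting(height, width, frames):
    """Return the resolution tier ("540p" or "720p") or raise ValueError."""
    if frames != SUPPORTED_FRAMES:
        raise ValueError(f"unsupported frame count: {frames}")
    for tier, sizes in SUPPORTED_SIZES.items():
        if (height, width) in sizes:
            return tier
    raise ValueError(f"unsupported size: {height}x{width}")


if __name__ == "__main__":
    print(check_video_setting(720, 1280, 129))  # 720p
```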
### Using Command Line
```bash
cd HunyuanVideo
python3 sample_video.py \
--video-size 720 1280 \
--video-length 129 \
--infer-steps 50 \
--prompt "A cat walks on the grass, realistic style." \
--flow-reverse \
--use-cpu-offload \
--save-path ./results
```
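
When the script is driven from Python, for example to loop over several prompts, the same command can be assembled programmatically. The sketch below only uses the flags documented in this README; the helper name and its defaults are illustrative.

```python
import shlex


def build_sample_command(prompt, height=720, width=1280, frames=129,
                         steps=50, save_path="./results",
                         flow_reverse=True, cpu_offload=True):
    """Assemble the argv list for sample_video.py from the documented flags."""
    cmd = [
        "python3", "sample_video.py",
        "--video-size", str(height), str(width),
        "--video-length", str(frames),
        "--infer-steps", str(steps),
        "--prompt", prompt,
        "--save-path", save_path,
    ]
    if flow_reverse:
        cmd.append("--flow-reverse")
    if cpu_offload:
        cmd.append("--use-cpu-offload")
    return cmd


if __name__ == "__main__":
    cmd = build_sample_command("A cat walks on the grass, realistic style.")
    print(shlex.join(cmd))
    # To actually run it from the repository root:
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell-quoting problems with prompts that contain quotes or commas.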
### Run a Gradio Server
```bash
python3 gradio_server.py --flow-reverse
# set SERVER_NAME and SERVER_PORT manually
# SERVER_NAME=0.0.0.0 SERVER_PORT=8081 python3 gradio_server.py --flow-reverse
```
### More Configurations
More key configuration options are listed below:
| Argument | Default | Description |
|:----------------------:|:---------:|:-----------------------------------------:|
| `--prompt` | None | The text prompt for video generation |
| `--video-size` | 720 1280 | The height and width of the generated video |
| `--video-length` | 129 | The number of frames of the generated video |
| `--infer-steps` | 50 | The number of sampling steps |
| `--embedded-cfg-scale` | 6.0 | The strength of the text guidance |
| `--flow-shift` | 7.0 | Shift factor for the timesteps at inference; the larger the value, the more sampling steps are spent in the high-noise region |
| `--flow-reverse` | False | If reverse, learning/sampling from t=1 -> t=0 |
| `--neg-prompt` | None | The negative prompt |
| `--seed` | 0 | The random seed |
| `--use-cpu-offload` | False | Enable CPU offload to save GPU memory |
| `--save-path` | ./results | The path where generated videos are saved |
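
The effect of `--flow-shift` can be illustrated with the time-shift formula commonly used by flow-matching samplers, `sigma' = s * sigma / (1 + (s - 1) * sigma)`. That HunyuanVideo's scheduler uses exactly this formula is an assumption of this sketch, and the function names are hypothetical.

```python
def shift_sigma(sigma, shift):
    """Time-shift a noise level in [0, 1]; shift=1 leaves it unchanged."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def shifted_schedule(num_steps, shift):
    """Noise levels from 1.0 (pure noise) towards 0.0, after shifting."""
    linear = [1.0 - i / num_steps for i in range(num_steps)]
    return [shift_sigma(s, shift) for s in linear]


def high_noise_steps(num_steps, shift, threshold=0.5):
    """Count the steps whose noise level is above `threshold`."""
    return sum(s > threshold for s in shifted_schedule(num_steps, shift))


if __name__ == "__main__":
    print(high_noise_steps(50, 1.0))  # 25
    print(high_noise_steps(50, 7.0))  # 44
```

With 50 steps, a shift of 7.0 keeps 44 steps above a noise level of 0.5, compared with 25 for an unshifted schedule, which matches the description of `--flow-shift` in the table.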
## 🚀 Parallel Inference on Multiple GPUs by xDiT
[xDiT](https://github.com/xdit-project/xDiT) is a scalable inference engine for Diffusion Transformers (DiTs) on multi-GPU clusters.
It has successfully provided low-latency parallel inference solutions for a variety of DiT models, including mochi-1, CogVideoX, Flux.1, and SD3. This repository uses the [Unified Sequence Parallelism (USP)](https://arxiv.org/abs/2405.07719) APIs for parallel inference of the HunyuanVideo model.
### Using Command Line
For example, use the following command to run inference on 8 GPUs:
```bash
cd HunyuanVideo
torchrun --nproc_per_node=8 sample_video_parallel.py \
--video-size 1280 720 \
--video-length 129 \
--infer-steps 50 \
--prompt "A cat walks on the grass, realistic style." \
--flow-reverse \
--seed 42 \
--ulysses-degree 8 \
--ring-degree 1 \
--save-path ./results
```
You can set `--ulysses-degree` and `--ring-degree` to control the parallel configuration. The supported combinations are listed below.
<details>
<summary>Supported Parallel Configurations (click to expand)</summary>

| --video-size | --video-length | --ulysses-degree x --ring-degree | --nproc_per_node |
|----------------------|----------------|----------------------------------|------------------|
| 1280 720 or 720 1280 | 129 | 8x1,4x2,2x4,1x8 | 8 |
| 1280 720 or 720 1280 | 129 | 1x5 | 5 |
| 1280 720 or 720 1280 | 129 | 4x1,2x2,1x4 | 4 |
| 1280 720 or 720 1280 | 129 | 3x1,1x3 | 3 |
| 1280 720 or 720 1280 | 129 | 2x1,1x2 | 2 |
| 1104 832 or 832 1104 | 129 | 4x1,2x2,1x4 | 4 |
| 1104 832 or 832 1104 | 129 | 3x1,1x3 | 3 |
| 1104 832 or 832 1104 | 129 | 2x1,1x2 | 2 |
| 960 960 | 129 | 6x1,3x2,2x3,1x6 | 6 |
| 960 960 | 129 | 4x1,2x2,1x4 | 4 |
| 960 960 | 129 | 3x1,1x3 | 3 |
| 960 960 | 129 | 1x2,2x1 | 2 |
| 960 544 or 544 960 | 129 | 6x1,3x2,2x3,1x6 | 6 |
| 960 544 or 544 960 | 129 | 4x1,2x2,1x4 | 4 |
| 960 544 or 544 960 | 129 | 3x1,1x3 | 3 |
| 960 544 or 544 960 | 129 | 1x2,2x1 | 2 |
| 832 624 or 624 832 | 129 | 4x1,2x2,1x4 | 4 |
| 832 624 or 624 832 | 129 | 3x1,1x3 | 3 |
| 832 624 or 624 832 | 129 | 2x1,1x2 | 2 |
| 720 720 | 129 | 1x5 | 5 |
| 720 720 | 129 | 3x1,1x3 | 3 |
</details>
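
Every row of the table satisfies `ulysses degree x ring degree = nproc_per_node`. The sketch below enumerates the factor pairs that meet this rule. The helper names are hypothetical, and the rule is necessary but not sufficient: for example, the table lists 1x5 for 5 GPUs but not 5x1, so always confirm a pair against the table.

```python
def is_consistent(nproc_per_node, ulysses_degree, ring_degree):
    """True when the two degrees multiply to the number of processes."""
    return ulysses_degree * ring_degree == nproc_per_node


def degree_pairs(nproc_per_node):
    """All (ulysses_degree, ring_degree) factor pairs of nproc_per_node."""
    return [
        (u, nproc_per_node // u)
        for u in range(1, nproc_per_node + 1)
        if nproc_per_node % u == 0
    ]


if __name__ == "__main__":
    print(degree_pairs(8))         # [(1, 8), (2, 4), (4, 2), (8, 1)]
    print(is_consistent(8, 8, 1))  # True
    print(is_consistent(8, 4, 1))  # False
```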
<p align="center">
<table align="center">
<thead>
<tr>
<th colspan="4">Latency (sec) for 1280x720 (129 frames, 50 steps) on 8xGPU</th>
</tr>
<tr>
<th>1</th>
<th>2</th>
<th>4</th>
<th>8</th>
</tr>
</thead>
<tbody>
<tr>
<th>1904.08</th>
<th>934.09 (2.04x)</th>
<th>514.08 (3.70x)</th>
<th>337.58 (5.64x)</th>
</tr>
</tbody>
</table>
</p>
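
The speedups in parentheses are the single-GPU latency divided by the multi-GPU latency. The short sketch below recomputes them from the table and also derives the parallel efficiency (speedup divided by GPU count), which the table does not report.

```python
# Latencies in seconds copied from the table above, keyed by GPU count.
LATENCY_SEC = {1: 1904.08, 2: 934.09, 4: 514.08, 8: 337.58}


def speedup(num_gpus):
    """Single-GPU latency divided by the latency on `num_gpus` GPUs."""
    return LATENCY_SEC[1] / LATENCY_SEC[num_gpus]


def efficiency(num_gpus):
    """Fraction of ideal linear scaling that is actually achieved."""
    return speedup(num_gpus) / num_gpus


if __name__ == "__main__":
    for n in sorted(LATENCY_SEC):
        print(f"{n} GPU(s): {speedup(n):.2f}x speedup, "
              f"{efficiency(n):.0%} efficiency")
```

Efficiency drops from about 93% on 4 GPUs to about 71% on 8 GPUs, so the per-GPU benefit shrinks as more GPUs are added.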
## 🚀 FP8 Inference
Using the FP8-quantized HunyuanVideo model saves about 10 GB of GPU memory. Before use, download the [FP8 weights](https://huggingface.co/tencent/HunyuanVideo/blob/main/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8.pt) and the per-layer [scale map](https://huggingface.co/tencent/HunyuanVideo/blob/main/hunyuan-video-t2v-720p/transformers/mp_rank_00_model_states_fp8_map.pt) for the quantized weights from Hugging Face.
### Using Command Line
Here you must explicitly specify the path to the FP8 weights. For example, use the following command to run inference with the FP8 model:
```bash
cd HunyuanVideo
DIT_CKPT_PATH={PATH_TO_FP8_WEIGHTS}/{WEIGHT_NAME}_fp8.pt
python3 sample_video.py \
--dit-weight ${DIT_CKPT_PATH} \
--video-size 1280 720 \
--video-length 129 \
--infer-steps 50 \
--prompt "A cat walks on the grass, realistic style." \
--seed 42 \
--embedded-cfg-scale 6.0 \
--flow-shift 7.0 \
--flow-reverse \
--use-cpu-offload \
--use-fp8 \
--save-path ./results
```
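
Since FP8 inference needs both the weights and the scale map, it can help to check for the two files before launching. The sketch below derives the map file name from the naming used by the two download links (`..._fp8.pt` and `..._fp8_map.pt`). That the map is expected next to the weights is an assumption, and the helper names are hypothetical.

```python
from pathlib import Path


def fp8_map_path(weight_path):
    """Derive the scale-map path from the FP8 weight path.

    Assumes the naming of the two download links:
    `<name>_fp8.pt` for the weights and `<name>_fp8_map.pt` for the scales.
    """
    weight_path = Path(weight_path)
    if not weight_path.name.endswith("_fp8.pt"):
        raise ValueError(f"not an FP8 weight file: {weight_path.name}")
    return weight_path.with_name(weight_path.stem + "_map.pt")


def missing_fp8_files(weight_path):
    """Return the FP8 files that are not present on disk."""
    candidates = [Path(weight_path), fp8_map_path(weight_path)]
    return [p for p in candidates if not p.is_file()]


if __name__ == "__main__":
    print(fp8_map_path("ckpts/mp_rank_00_model_states_fp8.pt").name)
    # mp_rank_00_model_states_fp8_map.pt
```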
## 🔗 BibTeX
If you find [HunyuanVideo](https://arxiv.org/abs/2412.03603) useful for your research and applications, please cite it using this BibTeX:
```BibTeX
@misc{kong2024hunyuanvideo,
title={HunyuanVideo: A Systematic Framework For Large Video Generative Models},
author={Weijie Kong and Qi Tian and Zijian Zhang and Rox Min and Zuozhuo Dai and Jin Zhou and Jiangfeng Xiong and Xin Li and Bo Wu and Jianwei Zhang and Kathrina Wu and Qin Lin and Aladdin Wang and Andong Wang and Changlin Li and Duojun Huang and Fang Yang and Hao Tan and Hongmei Wang and Jacob Song and Jiawang Bai and Jianbing Wu and Jinbao Xue and Joey Wang and Junkun Yuan and Kai Wang and Mengyang Liu and Pengyu Li and Shuai Li and Weiyan Wang and Wenqing Yu and Xinchi Deng and Yang Li and Yanxin Long and Yi Chen and Yutao Cui and Yuanbo Peng and Zhentao Yu and Zhiyu He and Zhiyong Xu and Zixiang Zhou and Zunnan Xu and Yangyu Tao and Qinglin Lu and Songtao Liu and Dax Zhou and Hongfa Wang and Yong Yang and Di Wang and Yuhong Liu and Jie Jiang and Caesar Zhong},
year={2024},
eprint={2412.03603},
archivePrefix={arXiv},
primaryClass={cs.CV}
}
```
## Acknowledgements
The open-source release of HunyuanVideo would not have been possible without many open-source projects. We especially thank [SD3](https://huggingface.co/stabilityai/stable-diffusion-3-medium), [FLUX](https://github.com/black-forest-labs/flux), [Llama](https://github.com/meta-llama/llama), [LLaVA](https://github.com/haotian-liu/LLaVA), [Xtuner](https://github.com/InternLM/xtuner), [diffusers](https://github.com/huggingface/diffusers) and [HuggingFace](https://huggingface.co) for their open-source work and exploration. We also thank the Tencent Hunyuan Multimodal team for their support in adapting HunyuanVideo to multiple text encoders.
## Star History
<a href="https://star-history.com/#Tencent/HunyuanVideo&Date">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="https://api.star-history.com/svg?repos=Tencent/HunyuanVideo&type=Date&theme=dark" />
<source media="(prefers-color-scheme: light)" srcset="https://api.star-history.com/svg?repos=Tencent/HunyuanVideo&type=Date" />
<img alt="Star History Chart" src="https://api.star-history.com/svg?repos=Tencent/HunyuanVideo&type=Date" />
</picture>
</a>