VideoX-Fun
😊 Welcome!
English | 简体中文 | 日本語
Table of Contents
- Table of Contents
- Introduction
- Quick Start
- Video Results
- How to Use
- Model Zoo
- References
- License
Introduction
VideoX-Fun is a video generation pipeline. It can be used to generate AI images and videos and to train baseline and LoRA models for Diffusion Transformers. It supports direct prediction from pre-trained baseline models, producing videos at different resolutions, durations, and frame rates (FPS). Users can also train their own baseline and LoRA models to perform specific style transformations.
Quick starts from several platforms are supported; see Quick Start for details.
What's New:
- Added support for the Wan 2.2 series models, the Wan-VACE control model, the Fantasy Talking digital human model, and the Qwen-Image and Flux image generation models, among others. [2025.10.16]
- Wan2.1-Fun-V1.1 update: the 14B and 1.3B models now support Control + reference image models and camera control. The Inpaint model has also been retrained for better performance. [2025.04.25]
- Wan2.1-Fun-V1.0 update: supports the 14B and 1.3B I2V (image-to-video) and Control models, with start-frame and end-frame prediction. [2025.03.26]
- CogVideoX-Fun-V1.5 update: uploaded the I2V model and the related training and prediction code. [2024.12.16]
- Reward LoRA support: LoRAs are trained with reward backpropagation to optimize generated videos and align them better with human preferences. More information is available. The new version of the control model supports different control conditions such as Canny, Depth, Pose, and MLSD. [2024.11.21]
- diffusers support: CogVideoX-Fun Control is now supported in diffusers. Thanks to a-r-r-o-w for contributing the support in this PR. See the docs for details. [2024.10.16]
- CogVideoX-Fun-V1.1 update: retrained the i2v model and added noise to widen the range of motion in generated videos. Uploaded the control model training code and the Control model. [2024.09.29]
- CogVideoX-Fun-V1.0 update: code created! Windows and Linux are now supported. The 2B and 5B models support video generation at any resolution from 256x256x49 up to 1024x1024x49. [2024.09.18]
Features:
Our UI interface is as follows:

Quick Start
1. Cloud usage: AliyunDSW/Docker
a. From AliyunDSW
DSW offers free GPU time. Each user can apply for it once, and it is valid for 3 months after the application.
Aliyun provides free GPU time through its free tier. Claim it and use it in Aliyun PAI-DSW to start CogVideoX-Fun within 5 minutes!
b. From ComfyUI
Our ComfyUI is as follows; see the ComfyUI README for details.

c. From Docker
If you use Docker, make sure the graphics card driver and the CUDA environment are installed correctly on your machine.
Then run the following commands:
# Pull the image
docker pull mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easycv/torch_cuda:cogvideox_fun
# Enter the container
docker run -it -p 7860:7860 --network host --gpus all --security-opt seccomp:unconfined --shm-size 200g mybigpai-public-registry.cn-beijing.cr.aliyuncs.com/easycv/torch_cuda:cogvideox_fun
# Clone the code
git clone https://github.com/aigc-apps/VideoX-Fun.git
# Enter the VideoX-Fun directory
cd VideoX-Fun
# Download the weights (-p creates parent folders and does not fail if they already exist)
mkdir -p models/Diffusion_Transformer
mkdir -p models/Personalized_Model
# Please use the Hugging Face link or the ModelScope link to download the model.
# CogVideoX-Fun
# https://huggingface.co/alibaba-pai/CogVideoX-Fun-V1.1-5b-InP
# https://modelscope.cn/models/PAI/CogVideoX-Fun-V1.1-5b-InP
# Wan
# https://huggingface.co/alibaba-pai/Wan2.1-Fun-V1.1-14B-InP
# https://modelscope.cn/models/PAI/Wan2.1-Fun-V1.1-14B-InP
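As a minimal sketch of the download step, assuming the `huggingface_hub` package is installed (this README does not list it), the snippet below maps a Hugging Face repository ID to the `models/Diffusion_Transformer` folder created above and fetches it with `snapshot_download`. The helper names `local_dir_for` and `download_weights` are illustrative and are not part of VideoX-Fun.

```python
from pathlib import Path

# Folder created by the mkdir commands above.
WEIGHTS_ROOT = Path("models/Diffusion_Transformer")


def local_dir_for(repo_id: str, root: Path = WEIGHTS_ROOT) -> Path:
    """Return the local folder for a repo ID such as 'alibaba-pai/<name>'."""
    return root / repo_id.split("/")[-1]


def download_weights(repo_id: str, root: Path = WEIGHTS_ROOT) -> Path:
    """Download every file of a Hugging Face repo into its local folder."""
    # Imported lazily so the path helper works without huggingface_hub.
    from huggingface_hub import snapshot_download

    target = local_dir_for(repo_id, root)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


# Example (a large download, so it is left commented out):
# download_weights("alibaba-pai/CogVideoX-Fun-V1.1-5b-InP")
```

Keeping the folder name equal to the last part of the repository ID matches the directory names used elsewhere in this README.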
2. Local install: environment check/download/installation
a. Environment check
We have verified that this library runs in the following environments:
Details for Windows:
- OS: Windows 10
- python: python3.10 & python3.11
- pytorch: torch2.2.0
- CUDA: 11.8 & 12.1
- CUDNN: 8+
- GPU: Nvidia-3060 12G & Nvidia-3090 24G
Details for Linux:
- OS: Ubuntu 20.04, CentOS
- python: python3.10 & python3.11
- pytorch: torch2.2.0
- CUDA: 11.8 & 12.1
- CUDNN: 8+
- GPU: Nvidia-V100 16G & Nvidia-A10 24G & Nvidia-A100 40G & Nvidia-A100 80G
About 60 GB of disk space is needed to store the weights, so please check the available space first!
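The disk-space check can be done with the Python standard library. The following is a small sketch; the function name `has_free_space` is illustrative, and the 60 GB default simply mirrors the figure stated above.

```python
import shutil


def has_free_space(path: str = ".", required_gb: float = 60.0) -> bool:
    """Return True if the filesystem holding `path` has enough free space."""
    free_gb = shutil.disk_usage(path).free / 1024**3
    return free_gb >= required_gb


if __name__ == "__main__":
    # Check the drive that holds the current working directory.
    print("enough space for the weights:", has_free_space("."))
```

Pass the path of the drive where the `models` folder will live if it differs from the working directory.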
b. Weights
We recommend placing the weights along the specified paths:
Via ComfyUI:
Put the models into the ComfyUI weights folder ComfyUI/models/Fun_Models/:
📦 ComfyUI/
├── 📂 models/
│   └── 📂 Fun_Models/
│       ├── 📂 CogVideoX-Fun-V1.1-2b-InP/
│       ├── 📂 CogVideoX-Fun-V1.1-5b-InP/
│       ├── 📂 Wan2.1-Fun-V1.1-14B-InP/
│       └── 📂 Wan2.1-Fun-V1.1-1.3B-InP/
Running your own Python file or the UI interface:
📦 models/
├── 📂 Diffusion_Transformer/
│   ├── 📂 CogVideoX-Fun-V1.1-2b-InP/
│   ├── 📂 CogVideoX-Fun-V1.1-5b-InP/
│   ├── 📂 Wan2.1-Fun-V1.1-14B-InP/
│   └── 📂 Wan2.1-Fun-V1.1-1.3B-InP/
├── 📂 Personalized_Model/
│   └── your trained transformer model / your trained LoRA model (for loading in the UI)
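Before running a prediction script, it can help to confirm that the folders are where the layout above expects them. This is a sketch only; `missing_weight_dirs` is a hypothetical helper, not something shipped with the repository.

```python
from pathlib import Path


def missing_weight_dirs(root, expected):
    """Return the expected sub-folders that do not exist under `root`."""
    return [name for name in expected if not (Path(root) / name).is_dir()]


if __name__ == "__main__":
    wanted = ["CogVideoX-Fun-V1.1-5b-InP", "Wan2.1-Fun-V1.1-1.3B-InP"]
    # An empty list means every listed model folder is in place.
    print(missing_weight_dirs("models/Diffusion_Transformer", wanted))
```

List only the models you actually plan to use; the layout above does not require all of them to be present.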
Video Results
Wan2.1-Fun-V1.1-14B-InP && Wan2.1-Fun-V1.1-1.3B-InP
Wan2.1-Fun-V1.1-14B-Control && Wan2.1-Fun-V1.1-1.3B-Control
Generic Control Video + Reference Image:
| Reference Image | Control Video | Wan2.1-Fun-V1.1-14B-Control | Wan2.1-Fun-V1.1-1.3B-Control |
Generic Control Video (Canny, Pose, Depth, etc.) and Trajectory Control:
Wan2.1-Fun-V1.1-14B-Control-Camera && Wan2.1-Fun-V1.1-1.3B-Control-Camera
| Pan Up | Pan Left | Pan Right |
| Pan Down | Pan Up + Pan Left | Pan Up + Pan Right |
CogVideoX-Fun-V1.1-5B
Resolution-1024
Resolution-768
Resolution-512
CogVideoX-Fun-V1.1-5B-Control
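| A young woman with beautiful clear eyes and blonde hair, wearing white clothes and twisting her body, with the camera focused on her face. High quality, masterpiece, best quality, high resolution, ultra-fine, dreamlike. | A young woman with beautiful clear eyes and blonde hair, wearing white clothes and twisting her body, with the camera focused on her face. High quality, masterpiece, best quality, high resolution, ultra-fine, dreamlike. | A young bear. |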
How to Use
1. Generation
a. GPU memory saving schemes
Because Wan2.1 has a very large number of parameters, GPU memory has to be saved so that the model fits on consumer-grade GPUs. Each prediction file provides a GPU_memory_mode setting, which can be model_cpu_offload, model_cpu_offload_and_qfloat8, or sequential_cpu_offload. The same schemes apply to CogVideoX-Fun generation.
- model_cpu_offload: the entire model is moved to the CPU after use, saving some GPU memory.
- model_cpu_offload_and_qfloat8: the entire model is moved to the CPU after use and the Transformer model is quantized to float8, saving more GPU memory.
- sequential_cpu_offload: each layer of the model is moved to the CPU after use. This is slower but saves a large amount of GPU memory.
qfloat8 may partially degrade model performance, but it saves more GPU memory. If you have enough GPU memory, model_cpu_offload is recommended.
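The recommendation above amounts to picking the least aggressive mode that still fits in memory. The sketch below encodes that ordering. The function `pick_gpu_memory_mode` is hypothetical, and the two memory figures are inputs you would measure yourself for your model and resolution; this README does not state them.

```python
def pick_gpu_memory_mode(free_gb: float, offload_gb: float, qfloat8_gb: float) -> str:
    """Pick the least aggressive GPU_memory_mode that fits in `free_gb`.

    offload_gb and qfloat8_gb are peak-memory figures measured by the user
    for model_cpu_offload and model_cpu_offload_and_qfloat8 respectively.
    """
    if free_gb >= offload_gb:
        return "model_cpu_offload"  # preferred: no quantization loss
    if free_gb >= qfloat8_gb:
        return "model_cpu_offload_and_qfloat8"  # saves more, may cost quality
    return "sequential_cpu_offload"  # slowest, saves the most memory


if __name__ == "__main__":
    # Illustrative numbers only, not measurements.
    print(pick_gpu_memory_mode(free_gb=16, offload_gb=20, qfloat8_gb=12))
```

The returned string is the value to assign to GPU_memory_mode in the prediction file.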
b. Using ComfyUI
See the ComfyUI README for details.
c. Running Python files
i. Single-GPU inference:
- Step 1: Download the corresponding weights and place them in the models folder.
- Step 2: Use different files for prediction depending on the weights and the prediction target. This library currently supports CogVideoX-Fun, Wan2.1, and Wan2.1-Fun, which are distinguished by folder name under the examples folder. Different models support different features, so choose according to your case. The following uses CogVideoX-Fun as an example.
  - Text-to-video:
    - Modify prompt, neg_prompt, guidance_scale, and seed in examples/cogvideox_fun/predict_t2v.py.
    - Then run examples/cogvideox_fun/predict_t2v.py and wait for the result, which is saved in the samples/cogvideox-fun-videos folder.
  - Image-to-video:
    - Modify validation_image_start, validation_image_end, prompt, neg_prompt, guidance_scale, and seed in examples/cogvideox_fun/predict_i2v.py. validation_image_start is the start image of the video and validation_image_end is the end image of the video.
    - Then run examples/cogvideox_fun/predict_i2v.py and wait for the result, which is saved in the samples/cogvideox-fun-videos_i2v folder.
  - Video-to-video:
    - Modify validation_video, validation_image_end, prompt, neg_prompt, guidance_scale, and seed in examples/cogvideox_fun/predict_v2v.py. validation_video is the reference video for video generation. You can try it with the demo video.
    - Then run examples/cogvideox_fun/predict_v2v.py and wait for the result, which is saved in the samples/cogvideox-fun-videos_v2v folder.
  - Controlled video generation (Canny, Pose, Depth, etc.):
    - Modify control_video, validation_image_end, prompt, neg_prompt, guidance_scale, and seed in examples/cogvideox_fun/predict_v2v_control.py. control_video is the control video extracted with operators such as Canny, Pose, or Depth. You can try it with the demo video.
    - Then run examples/cogvideox_fun/predict_v2v_control.py and wait for the result, which is saved in the samples/cogvideox-fun-videos_v2v_control folder.
- Step 3: To combine other backbones or LoRAs that you trained yourself, modify examples/{model_name}/predict_t2v.py, examples/{model_name}/predict_i2v.py, and lora_path as needed.
ii. Multi-GPU inference:
For multi-GPU inference, make sure the xfuser package is installed. Installing xfuser==0.4.2 and yunchang==0.6.2 is recommended.
pip install xfuser==0.4.2 --progress-bar off -i https://mirrors.aliyun.com/pypi/simple/
pip install yunchang==0.6.2 --progress-bar off -i https://mirrors.aliyun.com/pypi/simple/
Make sure the product of ulysses_degree and ring_degree equals the number of GPUs used. For example, with 8 GPUs you can set ulysses_degree=2 and ring_degree=4, or ulysses_degree=4 and ring_degree=2.
ulysses_degree parallelizes after splitting across attention heads; ring_degree parallelizes after splitting along the sequence.
ring_degree has a higher communication cost than ulysses_degree, so take the sequence length and the model's number of heads into account when setting these parameters.
Taking parallel inference on 8 GPUs as an example:
- Wan2.1-Fun-V1.1-14B-InP has 40 heads, so ulysses_degree must be a value that divides 40 evenly (e.g., 2, 4, 8). With 8 GPUs you can therefore set ulysses_degree=8 and ring_degree=1.
- Wan2.1-Fun-V1.1-1.3B-InP has 12 heads, so ulysses_degree must be a value that divides 12 evenly (e.g., 2, 4). With 8 GPUs you can therefore set ulysses_degree=4 and ring_degree=2.
Once the parameters are set, run parallel inference with the following command:
torchrun --nproc-per-node=8 examples/wan2.1_fun/predict_t2v.py
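The two constraints above (the degrees multiply to the GPU count, and ulysses_degree divides the head count) can be enumerated mechanically. The helper below is an illustrative sketch, not part of xfuser or VideoX-Fun; it sorts results so that larger ulysses_degree values come first, reflecting the note that ring_degree has the higher communication cost.

```python
def valid_parallel_configs(num_gpus: int, num_heads: int):
    """List (ulysses_degree, ring_degree) pairs that satisfy both rules."""
    configs = []
    for ulysses in range(1, num_gpus + 1):
        if num_gpus % ulysses != 0:
            continue  # ulysses_degree * ring_degree must equal num_gpus
        if num_heads % ulysses != 0:
            continue  # heads must split evenly across ulysses ranks
        configs.append((ulysses, num_gpus // ulysses))
    # Larger ulysses_degree first, since ring_degree communicates more.
    return sorted(configs, reverse=True)


if __name__ == "__main__":
    print(valid_parallel_configs(8, 40))  # 14B model: 40 heads
    print(valid_parallel_configs(8, 12))  # 1.3B model: 12 heads
```

For 8 GPUs the first entry is (8, 1) with 40 heads and (4, 2) with 12 heads, matching the settings suggested above.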
d. Using the UI interface
The WebUI supports text-to-video, image-to-video, video-to-video, and controlled video generation (Canny, Pose, Depth, etc.). This library currently supports CogVideoX-Fun, Wan2.1, and Wan2.1-Fun, which are distinguished by folder name under the examples folder. Different models support different features, so choose according to your case. The following uses CogVideoX-Fun as an example.
- Step 1: Download the corresponding weights and place them in the models folder.
- Step 2: Run examples/cogvideox_fun/app.py and open the Gradio page.
- Step 3: Select the generation model on the page, fill in prompt, neg_prompt, guidance_scale, seed, and so on, click "Generate", and wait for the result, which is saved in the sample folder.
2. Model Training
A complete training workflow consists of data preprocessing followed by Video DiT training. The training process is similar across models, and so is the data format:
a. Data preprocessing
We provide a simple demo that trains a LoRA model on image data; see the wiki for details.
For the complete data preprocessing pipeline covering long-video segmentation, cleaning, and captioning, see the README in the video captions section.
To train a text-to-image and text-to-video generation model, arrange the dataset in the following format.
📦 project/
├── 📂 datasets/
│   ├── 📂 internal_datasets/
│       ├── 📂 train/
│       │   ├── 📄 00000001.mp4
│       │   ├── 📄 00000002.jpg
│       │   └── 📄 .....
│       └── 📄 json_of_internal_datasets.json
json_of_internal_datasets.json is a standard JSON file. The file_path values in it can be relative paths, as shown below:
[
{
"file_path": "train/00000001.mp4",
"text": "A group of young men in suits and sunglasses are walking down a city street.",
"type": "video"
},
{
"file_path": "train/00000002.jpg",
"text": "A group of young men in suits and sunglasses are walking down a city street.",
"type": "image"
},
.....
]
The paths can also be absolute, as shown below:
[
{
"file_path": "/mnt/data/videos/00000001.mp4",
"text": "A group of young men in suits and sunglasses are walking down a city street.",
"type": "video"
},
{
"file_path": "/mnt/data/train/00000001.jpg",
"text": "A group of young men in suits and sunglasses are walking down a city street.",
"type": "image"
},
.....
]
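A metadata file in the format above can be generated from a mapping of file paths to captions. The sketch below infers the type field from the file extension. The extension sets and the helper names `build_metadata` and `write_metadata` are assumptions made for illustration; only .mp4 and .jpg appear in this README.

```python
import json
from pathlib import Path

# Assumed extension sets; extend them to match your data.
VIDEO_EXTS = {".mp4"}
IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def build_metadata(captions):
    """Turn {file_path: caption} into the list-of-dicts layout shown above."""
    entries = []
    for file_path, text in captions.items():
        ext = Path(file_path).suffix.lower()
        if ext in VIDEO_EXTS:
            kind = "video"
        elif ext in IMAGE_EXTS:
            kind = "image"
        else:
            raise ValueError(f"unsupported file type: {file_path}")
        entries.append({"file_path": file_path, "text": text, "type": kind})
    return entries


def write_metadata(captions, out_path):
    """Write the metadata list as UTF-8 JSON, keeping non-ASCII captions readable."""
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(build_metadata(captions), f, ensure_ascii=False, indent=2)
```

Both relative and absolute file_path values pass through unchanged, so the same helper covers either layout.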
b. Video DiT training
If the data was prepared with relative paths during preprocessing, configure scripts/{model_name}/train.sh as follows.
export DATASET_NAME="datasets/internal_datasets/"
export DATASET_META_NAME="datasets/internal_datasets/json_of_internal_datasets.json"
If the data uses absolute paths, configure scripts/train.sh as follows.
export DATASET_NAME=""
export DATASET_META_NAME="/mnt/data/json_of_internal_datasets.json"
Then run scripts/train.sh.
sh scripts/train.sh
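Before launching a long training run, it may be worth checking that every file_path in the metadata file resolves to a real file. The sketch below assumes the training code combines DATASET_NAME with each file_path in the way `os.path.join` does; that is an assumption for illustration, not a statement about the actual loader. `missing_files` is a hypothetical helper.

```python
import json
import os


def missing_files(dataset_name: str, meta_path: str):
    """Return file_path entries that do not resolve to an existing file."""
    with open(meta_path, encoding="utf-8") as f:
        entries = json.load(f)
    missing = []
    for entry in entries:
        # os.path.join leaves an absolute file_path unchanged, so both
        # DATASET_NAME="" with absolute paths and a dataset root with
        # relative paths are handled by the same call.
        full_path = os.path.join(dataset_name, entry["file_path"])
        if not os.path.isfile(full_path):
            missing.append(entry["file_path"])
    return missing


if __name__ == "__main__":
    root = os.environ.get("DATASET_NAME", "")
    meta = os.environ.get("DATASET_META_NAME", "")
    if meta:
        print("missing files:", missing_files(root, meta))
```

Run it in the same shell where DATASET_NAME and DATASET_META_NAME are exported so that it checks the values the training script will see.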
For details on some of the parameter settings:
- Wan2.1-Fun: see Readme Train and Readme Lora.
- Wan2.1: see Readme Train and Readme Lora.
- CogVideoX-Fun: see Readme Train and Readme Lora.
Model Zoo
1. Wan2.2-Fun
| Name | Storage Size | Hugging Face | Model Scope | Description |
|---|---|---|---|---|
| Wan2.2-Fun-A14B-InP | 64.0 GB | 🤗Link | 😄Link | Wan2.2-Fun-14B text/image-to-video weights, trained at multiple resolutions, with support for start and end frame prediction. |
| Wan2.2-Fun-A14B-Control | 64.0 GB | 🤗Link | 😄Link | Wan2.2-Fun-14B video control weights. Supports control conditions such as Canny, Depth, Pose, and MLSD, as well as trajectory control. Supports video prediction at multiple resolutions (512, 768, 1024); trained with 81 frames at 16 FPS. Supports multilingual prediction. |
| Wan2.2-Fun-A14B-Control-Camera | 64.0 GB | 🤗Link | 😄Link | Wan2.2-Fun-14B camera lens control weights. Supports video prediction at multiple resolutions (512, 768, 1024); trained with 81 frames at 16 FPS. Supports multilingual prediction. |
| Wan2.2-VACE-Fun-A14B | 64.0 GB | 🤗Link | 😄Link | Wan2.2 control weights trained with the VACE scheme (base model: Wan2.2-T2V-A14B). Supports control conditions such as Canny, Depth, Pose, MLSD, and trajectory control, and can generate videos from a specified subject. Supports video prediction at multiple resolutions (512, 768, 1024); trained with 81 frames at 16 FPS. Supports multilingual prediction. |
| Wan2.2-Fun-5B-InP | 23.0 GB | 🤗Link | 😄Link | Wan2.2-Fun-5B text-to-video weights, trained with 121 frames at 24 FPS, with support for start and end frame prediction. |
| Wan2.2-Fun-5B-Control | 23.0 GB | 🤗Link | 😄Link | Wan2.2-Fun-5B video control weights. Supports control conditions such as Canny, Depth, Pose, and MLSD, as well as trajectory control. Trained with 121 frames at 24 FPS. Supports multilingual prediction. |
| Wan2.2-Fun-5B-Control-Camera | 23.0 GB | 🤗Link | 😄Link | Wan2.2-Fun-5B camera lens control weights, trained with 121 frames at 24 FPS. Supports multilingual prediction. |
2. Wan2.2
| Name | Hugging Face | Model Scope | Description |
|---|---|---|---|
| Wan2.2-TI2V-5B | 🤗Link | 😄Link | Wan2.2-5B text-to-video weights |
| Wan2.2-T2V-A14B | 🤗Link | 😄Link | Wan2.2-14B text-to-video weights |
| Wan2.2-I2V-A14B | 🤗Link | 😄Link | Wan2.2-14B image-to-video weights |
3. Wan2.1-Fun
V1.1:
| Name | Storage Size | Hugging Face | Model Scope | Description |
|---|---|---|---|---|
| Wan2.1-Fun-V1.1-1.3B-InP | 19.0 GB | 🤗Link | 😄Link | Wan2.1-Fun-V1.1-1.3B text/image-to-video weights, trained at multiple resolutions, with support for start and end image prediction. |
| Wan2.1-Fun-V1.1-14B-InP | 47.0 GB | 🤗Link | 😄Link | Wan2.1-Fun-V1.1-14B text/image-to-video weights, trained at multiple resolutions, with support for start and end image prediction. |
| Wan2.1-Fun-V1.1-1.3B-Control | 19.0 GB | 🤗Link | 😄Link | Wan2.1-Fun-V1.1-1.3B video control weights. Supports control conditions such as Canny, Depth, Pose, and MLSD, control with a reference image plus control conditions, and trajectory control. Supports video prediction at multiple resolutions (512, 768, 1024); trained with 81 frames at 16 FPS. Supports multilingual prediction. |
| Wan2.1-Fun-V1.1-14B-Control | 47.0 GB | 🤗Link | 😄Link | Wan2.1-Fun-V1.1-14B video control weights. Supports control conditions such as Canny, Depth, Pose, and MLSD, control with a reference image plus control conditions, and trajectory control. Supports video prediction at multiple resolutions (512, 768, 1024); trained with 81 frames at 16 FPS. Supports multilingual prediction. |
| Wan2.1-Fun-V1.1-1.3B-Control-Camera | 19.0 GB | 🤗Link | 😄Link | Wan2.1-Fun-V1.1-1.3B camera lens control weights. Supports video prediction at multiple resolutions (512, 768, 1024); trained with 81 frames at 16 FPS. Supports multilingual prediction. |
| Wan2.1-Fun-V1.1-14B-Control-Camera | 47.0 GB | 🤗Link | 😄Link | Wan2.1-Fun-V1.1-14B camera lens control weights. Supports video prediction at multiple resolutions (512, 768, 1024); trained with 81 frames at 16 FPS. Supports multilingual prediction. |
V1.0:
| Name | Storage Size | Hugging Face | Model Scope | Description |
|---|---|---|---|---|
| Wan2.1-Fun-1.3B-InP | 19.0 GB | 🤗Link | 😄Link | Wan2.1-Fun-1.3B text/image-to-video weights, trained at multiple resolutions, with support for start and end image prediction. |
| Wan2.1-Fun-14B-InP | 47.0 GB | 🤗Link | 😄Link | Wan2.1-Fun-14B text/image-to-video weights, trained at multiple resolutions, with support for start and end image prediction. |
| Wan2.1-Fun-1.3B-Control | 19.0 GB | 🤗Link | 😄Link | Wan2.1-Fun-1.3B video control weights. Supports control conditions such as Canny, Depth, Pose, and MLSD, as well as trajectory control. Supports video prediction at multiple resolutions (512, 768, 1024); trained with 81 frames at 16 FPS. Supports multilingual prediction. |
| Wan2.1-Fun-14B-Control | 47.0 GB | 🤗Link | 😄Link | Wan2.1-Fun-14B video control weights. Supports control conditions such as Canny, Depth, Pose, and MLSD, as well as trajectory control. Supports video prediction at multiple resolutions (512, 768, 1024); trained with 81 frames at 16 FPS. Supports multilingual prediction. |
4. Wan2.1
| Name | Hugging Face | Model Scope | Description |
|---|---|---|---|
| Wan2.1-T2V-1.3B | 🤗Link | 😄Link | Wan2.1-1.3B text-to-video weights |
| Wan2.1-T2V-14B | 🤗Link | 😄Link | Wan2.1-14B text-to-video weights |
| Wan2.1-I2V-14B-480P | 🤗Link | 😄Link | Wan2.1-14B-480P image-to-video weights |
| Wan2.1-I2V-14B-720P | 🤗Link | 😄Link | Wan2.1-14B-720P image-to-video weights |
5. FantasyTalking
| Name | Storage Size | Hugging Face | Model Scope | Description |
|---|---|---|---|---|
| Wan2.1-I2V-14B-720P | - | 🤗Link | 😄Link | Wan2.1-14B-720P image-to-video weights |
| Wav2Vec | - | 🤗Link | 😄Link | Wav2Vec model. Place it inside the Wan2.1-I2V-14B-720P folder and rename it to audio_encoder. |
| FantasyTalking model | - | 🤗Link | 😄Link | Official audio condition weights |
6. Qwen-Image
| Name | Storage Size | Hugging Face | Model Scope | Description |
|---|---|---|---|---|
| Qwen-Image | - | 🤗Link | 😄Link | Official Qwen-Image weights |
| Qwen-Image-Edit | - | 🤗Link | 😄Link | Official Qwen-Image-Edit weights |
| Qwen-Image-Edit-2509 | - | 🤗Link | 😄Link | Official Qwen-Image-Edit-2509 weights |
7. Z-Image
| Name | Storage Size | Hugging Face | Model Scope | Description |
|---|---|---|---|---|
| Z-Image-Turbo | - | 🤗Link | 😄Link | Official Z-Image-Turbo weights |
8. Z-Image-Fun
| Name | Storage Size | Hugging Face | Model Scope | Description |
|---|---|---|---|---|
| Z-Image-Turbo-Fun-Controlnet-Union | - | 🤗Link | 😄Link | ControlNet weights for Z-Image-Turbo, supporting multiple control conditions such as Canny, Depth, Pose, and MLSD. |
9. Flux
| Name | Storage Size | Hugging Face | Model Scope | Description |
|---|---|---|---|---|
| FLUX.1-dev | - | 🤗Link | 😄Link | Official FLUX.1-dev weights |
| FLUX.2-dev | - | 🤗Link | 😄Link | Official FLUX.2-dev weights |
10. Flux-Fun
| Name | Storage Size | Hugging Face | Model Scope | Description |
|---|---|---|---|---|
| Flux.2-dev-Fun-Controlnet-Union | - | 🤗Link | 😄Link | ControlNet weights for Flux.2-dev, supporting multiple control conditions such as Canny, Depth, Pose, and MLSD. |
11. HunyuanVideo
| Name | Storage Size | Hugging Face | Model Scope | Description |
|---|---|---|---|---|
| HunyuanVideo | - | 🤗Link | - | Official HunyuanVideo-diffusers weights |
| HunyuanVideo-I2V | - | 🤗Link | - | Official HunyuanVideo-I2V-diffusers weights |
12. CogVideoX-Fun
V1.5:
| Name | Storage Size | Hugging Face | Model Scope | Description |
|---|---|---|---|---|
| CogVideoX-Fun-V1.5-5b-InP | 20.0 GB | 🤗Link | 😄Link | Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024); trained with 85 frames at 8 FPS. |
| CogVideoX-Fun-V1.5-Reward-LoRAs | - | 🤗Link | 😄Link | Official reward backpropagation weights, which optimize videos generated by CogVideoX-Fun-V1.5 to align better with human preferences. |
V1.1:
| Name | Storage Size | Hugging Face | Model Scope | Description |
|---|---|---|---|---|
| CogVideoX-Fun-V1.1-2b-InP | 13.0 GB | 🤗Link | 😄Link | Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280); trained with 49 frames at 8 FPS. Noise is added to the reference image, giving a wider range of motion than V1.0. |
| CogVideoX-Fun-V1.1-5b-InP | 20.0 GB | 🤗Link | 😄Link | Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280); trained with 49 frames at 8 FPS. Noise is added to the reference image, giving a wider range of motion than V1.0. |
| CogVideoX-Fun-V1.1-2b-Pose | 13.0 GB | 🤗Link | 😄Link | Official pose-controlled video weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280); trained with 49 frames at 8 FPS. |
| CogVideoX-Fun-V1.1-2b-Control | 13.0 GB | 🤗Link | 😄Link | Official controlled video weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280); trained with 49 frames at 8 FPS. Supports control conditions such as Canny, Depth, Pose, and MLSD. |
| CogVideoX-Fun-V1.1-5b-Pose | 20.0 GB | 🤗Link | 😄Link | Official pose-controlled video weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280); trained with 49 frames at 8 FPS. |
| CogVideoX-Fun-V1.1-5b-Control | 20.0 GB | 🤗Link | 😄Link | Official controlled video weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280); trained with 49 frames at 8 FPS. Supports control conditions such as Canny, Depth, Pose, and MLSD. |
| CogVideoX-Fun-V1.1-Reward-LoRAs | - | 🤗Link | 😄Link | Official reward backpropagation weights, which optimize videos generated by CogVideoX-Fun-V1.1 to align better with human preferences. |
(Obsolete) V1.0:
| Name | Storage Size | Hugging Face | Model Scope | Description |
|---|---|---|---|---|
| CogVideoX-Fun-2b-InP | 13.0 GB | 🤗Link | 😄Link | Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280); trained with 49 frames at 8 FPS. |
| CogVideoX-Fun-5b-InP | 20.0 GB | 🤗Link | 😄Link | Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024, 1280); trained with 49 frames at 8 FPS. |
References
- CogVideo: https://github.com/THUDM/CogVideo/
- EasyAnimate: https://github.com/aigc-apps/EasyAnimate
- Wan2.1: https://github.com/Wan-Video/Wan2.1/
- Wan2.2: https://github.com/Wan-Video/Wan2.2/
- Diffusers: https://github.com/huggingface/diffusers
- Qwen-Image: https://github.com/QwenLM/Qwen-Image
- Self-Forcing: https://github.com/guandeh17/Self-Forcing
- Flux: https://github.com/black-forest-labs/flux
- Flux2: https://github.com/black-forest-labs/flux2
- HunyuanVideo: https://github.com/Tencent-Hunyuan/HunyuanVideo
- ComfyUI-KJNodes: https://github.com/kijai/ComfyUI-KJNodes
- ComfyUI-EasyAnimateWrapper: https://github.com/kijai/ComfyUI-EasyAnimateWrapper
- ComfyUI-CameraCtrl-Wrapper: https://github.com/chaojie/ComfyUI-CameraCtrl-Wrapper
- CameraCtrl: https://github.com/hehao13/CameraCtrl
License
This project is licensed under the Apache License (Version 2.0).
The CogVideoX-2B model (including its corresponding Transformers module and VAE module) is released under the Apache 2.0 License.
The CogVideoX-5B model (Transformers module) is released under the CogVideoX License.
