rangm committed (verified) · Commit 65dc654 · 1 parent: b1b527a

Update README.md

Files changed (1): README.md (+150 −3)
README.md CHANGED
@@ -1,3 +1,150 @@
---
license: apache-2.0
---
<p align="center">
<img src="asset/EchoMimicV3_logo.png.jpg" height=60>
</p>

<h1 align='center'>EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation</h1>

<div align='center'>
<a href='https://github.com/mengrang' target='_blank'>Rang Meng</a><sup>1</sup>&emsp;
<a href='https://github.com/' target='_blank'>Yan Wang</a>&emsp;
<a href='https://github.com/' target='_blank'>Weipeng Wu</a>&emsp;
<a href='https://github.com/' target='_blank'>Ruobing Zheng</a>&emsp;
<a href='https://lymhust.github.io/' target='_blank'>Yuming Li</a><sup>2</sup>&emsp;
<a href='https://openreview.net/profile?id=~Chenguang_Ma3' target='_blank'>Chenguang Ma</a><sup>2</sup>
</div>
<div align='center'>
Terminal Technology Department, Alipay, Ant Group.
</div>
<p align='center'>
<sup>1</sup>Core Contributor&emsp;
<sup>2</sup>Corresponding Authors
</p>
<div align='center'>
<a href='https://antgroup.github.io/ai/echomimic_v3/'><img src='https://img.shields.io/badge/Project-Page-blue'></a>
<!-- <a href='https://huggingface.co/BadToBest/EchoMimicV3'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Model-yellow'></a> -->
<!--<a href='https://antgroup.github.io/ai/echomimic_v2/'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20HuggingFace-Demo-yellow'></a>-->
<!-- <a href='https://modelscope.cn/models/BadToBest/EchoMimicV3'><img src='https://img.shields.io/badge/ModelScope-Model-purple'></a> -->
<!--<a href='https://antgroup.github.io/ai/echomimic_v2/'><img src='https://img.shields.io/badge/ModelScope-Demo-purple'></a>-->
<a href='https://arxiv.org/abs/2507.03905'><img src='https://img.shields.io/badge/Paper-Arxiv-red'></a>
<!-- <a href='https://openaccess.thecvf.com/content/CVPR2025/papers/Meng_EchoMimicV2_Towards_Striking_Simplified_and_Semi-Body_Human_Animation_CVPR_2025_paper.pdf'><img src='https://img.shields.io/badge/Paper-CVPR2025-blue'></a> -->
<!-- <a href='https://github.com/antgroup/echomimic_v2/blob/main/assets/halfbody_demo/wechat_group.png'><img src='https://badges.aleen42.com/src/wechat.svg'></a> -->
</div>
<!-- <div align='center'>
<a href='https://github.com/antgroup/echomimic_v3/discussions/0'><img src='https://img.shields.io/badge/English-Common Problems-orange'></a>
<a href='https://github.com/antgroup/echomimic_v3/discussions/1'><img src='https://img.shields.io/badge/中文版-常见问题汇总-orange'></a>
</div> -->

## &#x1F680; EchoMimic Series
* EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation. [GitHub](https://github.com/antgroup/echomimic_v3)
* EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation. [GitHub](https://github.com/antgroup/echomimic_v2)
* EchoMimicV1: Lifelike Audio-Driven Portrait Animations through Editable Landmark Conditioning. [GitHub](https://github.com/antgroup/echomimic)


## &#x1F4E3; Updates
<!-- * [2025.02.27] 🔥 EchoMimicV2 is accepted by CVPR 2025.
* [2025.01.16] 🔥 Please check out the [discussions](https://github.com/antgroup/echomimic_v2/discussions) to learn how to get started with EchoMimicV2.
* [2025.01.16] 🚀🔥 [GradioUI for Accelerated EchoMimicV2](https://github.com/antgroup/echomimic_v2/blob/main/app_acc.py) is now available.
* [2025.01.03] 🚀🔥 **One Minute is All You Need to Generate Video**. [Accelerated EchoMimicV2](https://github.com/antgroup/echomimic_v2/blob/main/infer_acc.py) is released. Inference speed is improved by 9x (from ~7 min/120 frames to ~50 s/120 frames on an A100 GPU).
* [2024.12.16] 🔥 [RefImg-Pose Alignment Demo](https://github.com/antgroup/echomimic_v2/blob/main/demo.ipynb) is now available, which covers aligning the reference image, extracting pose from the driving video, and generating video.
* [2024.11.27] 🔥 [Installation tutorial](https://www.youtube.com/watch?v=2ab6U1-nVTQ) is now available. Thanks [AiMotionStudio](https://www.youtube.com/@AiMotionStudio) for the contribution.
* [2024.11.22] 🔥 [GradioUI](https://github.com/antgroup/echomimic_v2/blob/main/app.py) is now available. Thanks @gluttony-10 for the contribution.
* [2024.11.22] 🔥 [ComfyUI](https://github.com/smthemex/ComfyUI_EchoMimic) is now available. Thanks @smthemex for the contribution.
* [2024.11.21] 🔥 We release the EMTD dataset list and processing scripts.
* [2024.11.21] 🔥 We release our [EchoMimicV2](https://github.com/antgroup/echomimic_v2) codes and models. -->
<!-- * [2025.08.08] 🔥 We release our [codes](https://arxiv.org/abs/2507.03905). -->
* [2025.07.08] 🔥 Our [paper](https://arxiv.org/abs/2507.03905) is now publicly available on arXiv.

## &#x1F305; Gallery
<p align="center">
<img src="asset/echomimicv3.jpg" height=700>
</p>
<table class="center">
<tr>
<td width=100% style="border: none">
<video controls loop src="https://github.com/user-attachments/assets/f33edb30-66b1-484b-8be0-a5df20a44f3b" muted="false"></video>
</td>
</tr>
</table>
For more demo videos, please refer to the project page.

## Quick Start
### Environment Setup
- Tested system environments: CentOS 7.2 / Ubuntu 22.04, CUDA >= 12.1
- Tested GPUs: A100 (80G) / RTX 4090D (24G) / V100 (16G)
- Tested Python versions: 3.10 / 3.11

### 🛠️ Installation
#### 1. Create a conda environment and install PyTorch and xformers
```
conda create -n echomimic_v3 python=3.10
conda activate echomimic_v3
# Assumption: PyTorch and xformers are pinned in requirements.txt; if not,
# install builds matching your CUDA version here, e.g.: pip install torch xformers
```

#### 2. Install the other dependencies
```
pip install -r requirements.txt
```
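
After installation, a quick sanity check helps confirm that the GPU stack is usable. This is a minimal sketch, assuming PyTorch was pulled in by the steps above:

```python
# Sanity check: verify the PyTorch install and CUDA visibility.
import torch

print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```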
### 🧱 Model Preparation

| Models          | Download Link          | Notes         |
| --------------- | ---------------------- | ------------- |
| Wan2.1-FUN-1.3B | 🤗 [HuggingFace](TBD) | Base model    |
| wav2vec2-base   | 🤗 [HuggingFace](TBD) | Audio encoder |
| EchoMimicV3     | 🤗 [HuggingFace](TBD) | Our weights   |

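Once the TBD links above are published, the checkpoints can be fetched programmatically with `huggingface_hub`. In this sketch every repo ID is a placeholder, not a confirmed location:

```python
# Sketch: download the three checkpoints with huggingface_hub.
# All repo IDs below are placeholders -- substitute the official ones
# once the TBD links in the table above are filled in.
from huggingface_hub import snapshot_download

checkpoints = {
    "models/Wan2.1-FUN-1.3B": "org/Wan2.1-FUN-1.3B",    # placeholder repo ID
    "models/wav2vec2-base": "facebook/wav2vec2-base",   # placeholder repo ID
    "models/EchoMimicV3": "antgroup/EchoMimicV3",       # placeholder repo ID
}
for local_dir, repo_id in checkpoints.items():
    snapshot_download(repo_id=repo_id, local_dir=local_dir)
```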
<!-- The **pretrained_weights** directory is organized as follows.

```
./models/
├── denoising_unet.pth
├── reference_unet.pth
├── motion_module.pth
├── pose_encoder.pth
├── sd-vae-ft-mse
│   └── ...
└── audio_processor
    └── tiny.pt
``` -->
### 🔑 Quick Inference
```
python infer.py
```
> Tips
> - Audio CFG: Audio CFG works best in the 2~3 range. Increase the audio CFG value for better lip synchronization; decrease it to improve visual quality. (See the guidance sketch after these tips.)
> - Text CFG: Text CFG works best in the 4~6 range. Increase the text CFG value for better prompt following; decrease it to improve visual quality.
> - TeaCache: The optimal range for `--teacache_thresh` is 0~0.1.
> - Sampling steps: 5 steps for a talking head, 15~25 steps for a talking body.
> - Long video generation: to generate a video longer than 138 frames, use Long Video CFG.

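To make the two CFG knobs concrete, here is a minimal sketch of how audio and text classifier-free guidance are commonly combined in diffusion samplers. The function and the dummy denoiser are illustrative assumptions, not EchoMimicV3's actual code:

```python
# Illustrative multi-modal classifier-free guidance (CFG) step; NOT the
# actual EchoMimicV3 implementation. The denoiser is queried three times,
# and each guidance scale amplifies its own conditioning direction.
import torch

def cfg_noise_pred(model, x, t, audio, text, audio_cfg=2.5, text_cfg=5.0):
    eps_uncond = model(x, t, audio=None, text=None)    # unconditional
    eps_audio = model(x, t, audio=audio, text=None)    # audio-conditioned
    eps_full = model(x, t, audio=audio, text=text)     # audio + text
    return (eps_uncond
            + audio_cfg * (eps_audio - eps_uncond)     # lip-sync direction
            + text_cfg * (eps_full - eps_audio))       # prompt-following direction

# Dummy denoiser just to show the call shape; a real one predicts noise.
def dummy_model(x, t, audio=None, text=None):
    return torch.zeros_like(x)

latents = torch.randn(1, 4, 16, 32, 32)  # (batch, channels, frames, h, w)
eps = cfg_noise_pred(dummy_model, latents, t=500, audio="...", text="...")
print(eps.shape)  # torch.Size([1, 4, 16, 32, 32])
```

Under this formulation, raising `audio_cfg` strengthens the lip-sync term while raising `text_cfg` strengthens prompt following, which is why the tips above trade each against visual quality.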

## 📝 TODO List
| Status | Milestone |
|:--------:|:-------------------------------------------------------------------------|
| 2025.08.08 | Inference code of EchoMimicV3 released on GitHub |
| 🚀 | Preview pretrained models (English and Chinese) on HuggingFace |
| 🚀 | Preview pretrained models (English and Chinese) on ModelScope |
| 🚀 | 720P pretrained models (English and Chinese) on HuggingFace |
| 🚀 | 720P pretrained models (English and Chinese) on ModelScope |
| 🚀 | Training code of EchoMimicV3 released on GitHub |


## &#x1F4D2; Citation

If you find our work useful for your research, please consider citing our paper:

```
@misc{meng2025echomimicv3,
      title={EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation},
      author={Rang Meng and Yan Wang and Weipeng Wu and Ruobing Zheng and Yuming Li and Chenguang Ma},
      year={2025},
      eprint={2507.03905},
      archivePrefix={arXiv}
}
```

## &#x1F31F; Star History
[![Star History Chart](https://api.star-history.com/svg?repos=antgroup/echomimic_v3&type=Date)](https://star-history.com/#antgroup/echomimic_v3&Date)