---
license: mit
---
<h1 align="center">
ARTalk: Speech-Driven 3D Head Animation via Autoregressive Model
</h1>

<h5 align="center">
<a href='https://arxiv.org/abs/2502.20323'>Paper</a>&emsp;
<a href='https://xg-chu.site/project_artalk/'>Project Website</a>&emsp;
<a href='https://github.com/xg-chu/ARTalk/'>Code</a>&emsp;
</h5>

<h5 align="center">
<a href="https://xg-chu.site">Xuangeng Chu</a><sup>1</sup>&emsp;
<a href="https://naba89.github.io">Nabarun Goswami</a><sup>1</sup>&emsp;
<a href="https://cuiziteng.github.io">Ziteng Cui</a><sup>1</sup>&emsp;
<a href="https://openreview.net/profile?id=~Hanqin_Wang1">Hanqin Wang</a><sup>1</sup>&emsp;
<a href="https://www.mi.t.u-tokyo.ac.jp/harada/">Tatsuya Harada</a><sup>1,2</sup>
<br>
<sup>1</sup>The University of Tokyo,
<sup>2</sup>RIKEN AIP
</h5>


<div align="center">
<!-- <div align="center">
<b><img src="./demos/teaser.gif" alt="drawing" width="960"/></b>
</div> -->
<b>
ARTalk generates realistic 3D head motions (lip sync, blinking, expressions, head poses) from audio.
</b>
<br>
🔥 More results can be found on our <a href="https://xg-chu.site/project_artalk/">Project Page</a>. 🔥
</div>

<!-- ## TO DO
We are now preparing the <b>pre-trained model and quick start materials</b> and will release them within a week. -->

## Installation
### Clone the project
```shell
git clone --recurse-submodules git@github.com:xg-chu/ARTalk.git
cd ARTalk
```

### Build environment
I will prepare a new environment guide as soon as possible.

For now, please use GAGAvatar's `environment.yml` and install Gradio and the other required dependencies.
```shell
conda env create -f environment.yml
conda activate ARTalk
```

<details>
<summary><span>Install the GAGAvatar module (if you want to use realistic avatars)</span></summary>

```shell
git clone --recurse-submodules git@github.com:xg-chu/diff-gaussian-rasterization.git
pip install ./diff-gaussian-rasterization
rm -rf ./diff-gaussian-rasterization
```

</details>

### Prepare resources
Prepare the required resources with:
```shell
bash ./build_resources.sh
```

## Quick Start Guide
### Using the <a href="https://github.com/gradio-app/gradio">Gradio</a> Interface

We provide a simple Gradio demo to demonstrate ARTalk's capabilities:
```shell
python inference.py --run_app
```

### Command Line Usage

ARTalk can also be used from the command line:
```shell
python inference.py -a your_audio_path --shape_id your_appearance --style_id your_style_motion --clip_length 750
```
`--shape_id` can be set to `mesh` or to one of the tracked real avatars stored in `tracked.pt`.

`--style_id` can be set to the name of a `*.pt` file stored in `assets/style_motion`.

`--clip_length` sets the maximum duration of the rendered video and can be adjusted as needed; longer videos take more time to render.

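As a rough rule of thumb for choosing `--clip_length`: the style-motion clips described below span 2 seconds in 50 frames, which suggests a rate of 25 fps. Treat that rate as an assumption (verify it against your own renders); under it, a target duration converts to a frame count like this:

```python
# Hypothetical helper for picking a --clip_length value.
# ASSUMED_FPS = 25 is inferred from the style clips (50 consecutive
# frames spanning 2 seconds), not stated explicitly by the project.
ASSUMED_FPS = 25

def clip_length_for(seconds: int) -> int:
    """Frames needed to cover `seconds` of video at the assumed frame rate."""
    return seconds * ASSUMED_FPS

print(clip_length_for(30))  # 750, matching the example command above
```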
<details>
<summary><span>Track a new real head avatar and new style motion</span></summary>

The file `tracked.pt` is generated with <a href="https://github.com/xg-chu/GAGAvatar/blob/main/inference.py">`GAGAvatar/inference.py`</a>. I've included several examples of tracked avatars for quick testing.

The style motion is tracked with the EMICA module in <a href="https://github.com/xg-chu/GAGAvatar_track">`GAGAvatar_track`</a>. Each file contains a `50*106`-dimensional tensor: `50` is 2 seconds of consecutive frames, and `106` comprises `100` expression codes and `6` pose codes (base + jaw). I've included several examples of tracked style motion.
</details>
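The `50*106` layout above can be sketched in plain Python. This is only an illustration of the described dimensions (a real clip would be loaded with `torch.load` from `assets/style_motion`; the helper name and the 100-then-6 ordering within a frame are assumptions):

```python
# Sketch of the style-motion layout described above (assumed ordering:
# the first 100 values of a frame are expression codes, the last 6 are
# pose codes). Not the project's actual loading code.
FRAMES, EXPR_DIM, POSE_DIM = 50, 100, 6

def split_frame(frame):
    """Split one 106-dim frame vector into (expression, pose) codes."""
    assert len(frame) == EXPR_DIM + POSE_DIM
    return frame[:EXPR_DIM], frame[EXPR_DIM:]

# Dummy clip standing in for a loaded style-motion tensor.
clip = [[0.0] * (EXPR_DIM + POSE_DIM) for _ in range(FRAMES)]
expr, pose = split_frame(clip[0])
print(len(expr), len(pose))  # 100 6
```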

## Training

This version modifies the VQVAE part compared to the paper version.

The training code and the paper-version code are still in preparation and are expected to be released later.


## Acknowledgements

We thank <a href="https://www.linkedin.com/in/lars-traaholt-vågnes-432725130/">Lars Traaholt Vågnes</a> and <a href="https://emmanueliarussi.github.io">Emmanuel Iarussi</a> from <a href="https://www.simli.com">Simli</a> for the insightful discussions! 🤗

The ARTalk logo was designed by Caihong Ning.

Part of our work builds on FLAME.
We also thank the following projects for sharing their great work:
- **GAGAvatar**: https://github.com/xg-chu/GAGAvatar
- **GPAvatar**: https://github.com/xg-chu/GPAvatar
- **FLAME**: https://flame.is.tue.mpg.de
- **EMICA**: https://github.com/radekd91/inferno


## Citation
If you find our work useful in your research, please consider citing:
```bibtex
@misc{chu2025artalk,
  title={ARTalk: Speech-Driven 3D Head Animation via Autoregressive Model},
  author={Xuangeng Chu and Nabarun Goswami and Ziteng Cui and Hanqin Wang and Tatsuya Harada},
  year={2025},
  eprint={2502.20323},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2502.20323},
}
```