---
license: mit
---
<h1 align="center">
ARTalk: Speech-Driven 3D Head Animation via Autoregressive Model
</h1>

<h5 align="center">
<a href='https://arxiv.org/abs/2502.20323'>Paper</a> 
<a href='https://xg-chu.site/project_artalk/'>Project Website</a> 
<a href='https://github.com/xg-chu/ARTalk/'>Code</a> 
</h5>

<h5 align="center">
<a href="https://xg-chu.site">Xuangeng Chu</a><sup>1</sup>, 
<a href="https://naba89.github.io">Nabarun Goswami</a><sup>1</sup>, 
<a href="https://cuiziteng.github.io">Ziteng Cui</a><sup>1</sup>, 
<a href="https://openreview.net/profile?id=~Hanqin_Wang1">Hanqin Wang</a><sup>1</sup>, 
<a href="https://www.mi.t.u-tokyo.ac.jp/harada/">Tatsuya Harada</a><sup>1,2</sup>
<br>
<sup>1</sup>The University of Tokyo,
<sup>2</sup>RIKEN AIP
</h5>


<div align="center">
<!-- <div align="center">
<b><img src="./demos/teaser.gif" alt="drawing" width="960"/></b>
</div> -->
<b>
ARTalk generates realistic 3D head motions (lip sync, blinking, expressions, and head poses) from audio.
</b>
<br>
🔥 More results can be found on our <a href="https://xg-chu.site/project_artalk/">project page</a>. 🔥
</div>

<!-- ## TO DO
We are now preparing the <b>pre-trained model and quick start materials</b> and will release them within a week. -->

## Installation
### Clone the project
```
git clone --recurse-submodules git@github.com:xg-chu/ARTalk.git
cd ARTalk
```

### Build environment
A dedicated environment guide is in preparation and will be added as soon as possible.

For now, please use GAGAvatar's `environment.yml` and install Gradio along with the other required libraries.
```
conda env create -f environment.yml
conda activate ARTalk
```

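Gradio is not included in that environment file, so install it (and any other missing libraries reported at runtime) into the activated environment; a typical first step is:
```
pip install gradio
```
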
<details>
<summary><span>Install the GAGAvatar module (if you want to use realistic avatars)</span></summary>

```
git clone --recurse-submodules git@github.com:xg-chu/diff-gaussian-rasterization.git
pip install ./diff-gaussian-rasterization
rm -rf ./diff-gaussian-rasterization
```

</details>

### Prepare resources
Prepare the required resources with:
```
bash ./build_resources.sh
```

## Quick Start Guide
### Using the <a href="https://github.com/gradio-app/gradio">Gradio</a> Interface

We provide a simple Gradio demo to showcase ARTalk's capabilities:
```
python inference.py --run_app
```

### Command Line Usage

ARTalk can also be run from the command line:
```
python inference.py -a your_audio_path --shape_id your_appearance --style_id your_style_motion --clip_length 750
```
`--shape_id` accepts either `mesh` or the ID of a tracked real avatar stored in `tracked.pt`.

`--style_id` accepts the name of a `*.pt` file stored in `assets/style_motion`.

`--clip_length` caps the length of the rendered video and can be adjusted as needed; longer videos take more time to render. If clips use the 25 fps rate implied by the style-motion format below (50 frames per 2 seconds), `--clip_length 750` corresponds to roughly 30 seconds.

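For example, full invocations might look like the sketch below. The audio path, style name, and avatar ID are placeholders rather than files shipped under these exact names; substitute an audio file of your own, a `*.pt` name that exists in `assets/style_motion`, and an avatar ID present in your `tracked.pt`:
```
# render on the plain 3D mesh
python inference.py -a ./my_speech.wav --shape_id mesh --style_id your_style --clip_length 750
# render with a tracked real avatar from tracked.pt (ID is hypothetical)
python inference.py -a ./my_speech.wav --shape_id some_tracked_id --style_id your_style --clip_length 750
```
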
<details>
<summary><span>Track a new real head avatar and new style motions</span></summary>

The file `tracked.pt` is generated with <a href="https://github.com/xg-chu/GAGAvatar/blob/main/inference.py">`GAGAvatar/inference.py`</a>; several examples of tracked avatars are included here for quick testing.

Style motions are tracked with the EMICA module in <a href="https://github.com/xg-chu/GAGAvatar_track">`GAGAvatar_track`</a>. Each style motion contains `50*106`-dimensional data: `50` is 2 seconds of consecutive frames, and `106` is `100` expression codes plus `6` pose codes (base + jaw). Several examples of tracked style motions are included here as well.
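
As a quick sanity check on this format, you can load a style-motion file and print its shape. This is a minimal sketch that assumes each file stores a single tensor readable with `torch.load`; replace `your_style.pt` with a file that actually exists in `assets/style_motion`:
```
python -c "import torch; print(torch.load('assets/style_motion/your_style.pt').shape)"
# expected: torch.Size([50, 106]) -- 50 frames x (100 expression + 6 pose codes)
```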
</details>

## Training

This released version modifies the VQVAE component relative to the paper version.

The training code and the paper-version code are still in preparation and are expected to be released later.


## Acknowledgements

We thank <a href="https://www.linkedin.com/in/lars-traaholt-vågnes-432725130/">Lars Traaholt Vågnes</a> and <a href="https://emmanueliarussi.github.io">Emmanuel Iarussi</a> from <a href="https://www.simli.com">Simli</a> for the insightful discussions! 🤗

The ARTalk logo was designed by Caihong Ning.

Part of our work builds on FLAME.
We also thank the following projects for sharing their great work:
- **GAGAvatar**: https://github.com/xg-chu/GAGAvatar
- **GPAvatar**: https://github.com/xg-chu/GPAvatar
- **FLAME**: https://flame.is.tue.mpg.de
- **EMICA**: https://github.com/radekd91/inferno


## Citation
If you find our work useful in your research, please consider citing:
```bibtex
@misc{chu2025artalk,
  title={ARTalk: Speech-Driven 3D Head Animation via Autoregressive Model},
  author={Xuangeng Chu and Nabarun Goswami and Ziteng Cui and Hanqin Wang and Tatsuya Harada},
  year={2025},
  eprint={2502.20323},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2502.20323},
}
```