<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.5.1/css/all.min.css" integrity="sha512-DTOQO9RWCH3ppGqcWaEA1BIZOC6xxalwEsw9c2QQeAIftl+Vegovlnee1c9QX4TctnWMn13TZye+giMm8e2LwA==" crossorigin="anonymous" referrerpolicy="no-referrer" />

<h1 align="center">CoMoVi: Co-Generation of 3D Human Motions<br>and Realistic Videos</h1>

<p align="center">
  <a href="https://afterjourney00.github.io/" target="_blank">Chengfeng Zhao</a><sup>1</sup>,
  <a href="https://github.com/Samir1110" target="_blank">Jiazhi Shu</a><sup>2</sup>,
  <a href="https://knoxzhao.github.io/" target="_blank">Yubo Zhao</a><sup>1</sup>,
  <a href="https://scholar.google.com/citations?hl=en&user=nhbSplwAAAAJ" target="_blank">Tianyu Huang</a><sup>3</sup>,
  <a href="https://scholar.google.com/citations?hl=en&user=nhbSplwAAAAJ" target="_blank">Jiahao Lu</a><sup>1</sup>,
  <br>
  <a href="https://scholar.google.com/citations?hl=en&user=nhbSplwAAAAJ" target="_blank">Zekai Gu</a><sup>1</sup>,
  <a href="https://scholar.google.com/citations?hl=en&user=nhbSplwAAAAJ" target="_blank">Chengwei Ren</a><sup>1</sup>,
  <a href="https://frank-zy-dou.github.io/" target="_blank">Zhiyang Dou</a><sup>4</sup>,
  <a href="https://chingswy.github.io/" target="_blank">Qing Shuai</a><sup>5</sup>,
  <a href="https://liuyuan-pal.github.io/" target="_blank">Yuan Liu</a><sup>1 <i class="far fa-envelope"></i></sup>
</p>
<p align="center">
  <sup>1</sup>HKUST &nbsp;&nbsp;
  <sup>2</sup>SCUT &nbsp;&nbsp;
  <sup>3</sup>CUHK &nbsp;&nbsp;
  <sup>4</sup>MIT &nbsp;&nbsp;
  <sup>5</sup>ZJU &nbsp;&nbsp;
  <br>
  <i><sup><i class="far fa-envelope"></i></sup> Corresponding author</i>
</p>
<p align="center">
  <a href="https://arxiv.org/abs/2601.10632"><img src='https://img.shields.io/badge/arXiv-Paper-red?logo=arxiv&logoColor=white' alt='arXiv'></a>
  <a href='https://igl-hkust.github.io/CoMoVi/'><img src='https://img.shields.io/badge/Project_Page-Website-green?logo=googlechrome&logoColor=white' alt='Project Page'></a>
  <a href='https://huggingface.co/datasets/AfterJourney/CoMoVi-50K'><img src='https://img.shields.io/badge/Hugging%20Face-Dataset-yellow?logo=huggingface' alt='Dataset'></a>
</p>

<div align="center">
  <img width="900px" src="./assets/teaser.png"/>
</div>

## <i class="fa-brands fa-github"></i> [GitHub](https://github.com/IGL-HKUST/CoMoVi)

## Acknowledgments

We thank the following projects, which we referred to and benefited from:
- [VideoX-Fun](https://github.com/aigc-apps/VideoX-Fun): the video generation model training framework;
- [CameraHMR](https://github.com/pixelite1201/CameraHMR/): the SMPL estimation used for pseudo labels;
- [Champ](https://github.com/fudan-generative-vision/champ): the data processing pipeline.

## Citation

```bibtex
@article{zhao2026comovi,
  title={CoMoVi: Co-Generation of 3D Human Motions and Realistic Videos},
  author={Zhao, Chengfeng and Shu, Jiazhi and Zhao, Yubo and Huang, Tianyu and Lu, Jiahao and Gu, Zekai and Ren, Chengwei and Dou, Zhiyang and Shuai, Qing and Liu, Yuan},
  journal={arXiv preprint arXiv:2601.10632},
  year={2026}
}
```