GEM: A Generalist Model for Human Motion
Jiefeng Li · Jinkun Cao · Haotian Zhang · Davis Rempe · Jan Kautz · Umar Iqbal · Ye Yuan
ICCV 2025 (Highlight)
**GEM** is a generalist model for human motion that handles multiple tasks with a single model, supporting diverse conditioning signals including video, keypoints, text, audio, and 3D keyframes.
---
## 📰 News
- **[December 2025]** 📢 GENMO has been renamed to **GEM**.
- **[October 2025]** 📢 The **GEM** codebase is **released!**
Stay tuned for the pretrained models and evaluation scripts.
Follow the [project page](https://research.nvidia.com/labs/dair/gem/) for updates and announcements.
---
## 🚀 Highlights
GEM introduces a **unified generative framework** that connects motion estimation and generation through shared objectives.
- **Unified framework:** Reframes motion estimation as *constrained generation*, allowing a single model to perform both tasks.
- **Regression × Diffusion synergy:** Combines the accuracy of regression models with the diversity of diffusion-based generation.
- **Estimation-guided training:** Trains effectively on in-the-wild datasets using only 2D or textual supervision.
- **Multimodal conditioning:** Supports video, text, audio, 2D/3D keyframes, or even time-varying mixed inputs (e.g., video → text → video).
- **Arbitrary-length motion:** Generates continuous, coherent sequences of any duration in one diffusion pass.
- **State-of-the-art performance:** Achieves leading results on diverse motion estimation and generation benchmarks.
For more details, visit the **[GEM project page →](https://research.nvidia.com/labs/dair/gem/)**
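To make the "time-varying mixed inputs" highlight more concrete, here is a minimal sketch of how a video → text → video schedule could be represented as per-frame masks, one per modality. Everything in it (`Segment`, `build_frame_masks`, the modality names, the 90-frame length) is a hypothetical illustration and not the GEM API; the released code may represent conditioning quite differently.

```python
from dataclasses import dataclass

# Hypothetical names throughout: this is NOT the GEM API, only an illustration
# of what "time-varying mixed inputs" could look like as a data structure.
MODALITIES = ("video", "text", "audio", "keypoints_2d", "keyframes_3d")


@dataclass(frozen=True)
class Segment:
    modality: str  # which conditioning signal is active
    start: int     # first frame (inclusive)
    end: int       # last frame (exclusive)


def build_frame_masks(segments, num_frames):
    """Return {modality: [0/1 per frame]} marking where each signal applies."""
    masks = {m: [0] * num_frames for m in MODALITIES}
    for seg in segments:
        if seg.modality not in masks:
            raise ValueError(f"unknown modality: {seg.modality}")
        if not (0 <= seg.start < seg.end <= num_frames):
            raise ValueError(f"segment out of range: {seg}")
        for t in range(seg.start, seg.end):
            masks[seg.modality][t] = 1
    return masks


# video -> text -> video over an assumed 90-frame sequence
schedule = [
    Segment("video", 0, 30),
    Segment("text", 30, 60),
    Segment("video", 60, 90),
]
masks = build_frame_masks(schedule, num_frames=90)
print(sum(masks["video"]), sum(masks["text"]), sum(masks["audio"]))  # prints: 60 30 0
```

A per-frame mask keeps the sequence length independent of the schedule, so segments of different modalities can be concatenated or overlapped without changing the shape of the conditioning input.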
---
## Pretrained Models
You can download pretrained models from [Google Drive](https://drive.google.com/file/d/1b1E84G7S0h2n5o0RmrcmKOhRKukOjgsJ/view?usp=sharing).
## 📖 Paper & Citation
**Paper:**
[GENMO: A GENeralist Model for Human MOtion](https://arxiv.org/abs/2505.01425)
*Jiefeng Li, Jinkun Cao, Haotian Zhang, Davis Rempe, Jan Kautz, Umar Iqbal, Ye Yuan*
ICCV, 2025
**BibTeX:**
```bibtex
@inproceedings{genmo2025,
title = {GENMO: A GENeralist Model for Human MOtion},
author = {Li, Jiefeng and Cao, Jinkun and Zhang, Haotian and Rempe, Davis and Kautz, Jan and Iqbal, Umar and Yuan, Ye},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
year = {2025}
}
```