GEM: A Generalist Model for Human Motion

Jiefeng Li · Jinkun Cao · Haotian Zhang · Davis Rempe · Jan Kautz · Umar Iqbal · Ye Yuan

ICCV 2025 (Highlight)


**GEM** is a generalist model for human motion that handles multiple tasks with a single model, supporting diverse conditioning signals including video, keypoints, text, audio, and 3D keyframes.

---

## 📰 News

- **[December 2025]** 📢 GENMO has been renamed to **GEM**.
- **[October 2025]** 📢 The **GEM** codebase is **released!** Stay tuned for the pretrained models and evaluation scripts.

Follow the [project page](https://research.nvidia.com/labs/dair/gem/) for updates and announcements.

---

## 🚀 Highlights

GEM introduces a **unified generative framework** that connects motion estimation and generation through shared objectives.

- **Unified framework:** Reframes motion estimation as *constrained generation*, allowing a single model to perform both tasks.
- **Regression × Diffusion synergy:** Combines the accuracy of regression models with the diversity of diffusion-based generation.
- **Estimation-guided training:** Trains effectively on in-the-wild datasets using only 2D or textual supervision.
- **Multimodal conditioning:** Supports video, text, audio, 2D/3D keyframes, or even time-varying mixed inputs (e.g., video → text → video).
- **Arbitrary-length motion:** Generates continuous, coherent sequences of any duration in one diffusion pass.
- **State-of-the-art performance:** Achieves leading results on diverse motion estimation and generation benchmarks.

For more details, visit the **[GEM project page →](https://research.nvidia.com/labs/dair/gem/)**

---

### Pretrained Models

You can download pretrained models from [Google Drive](https://drive.google.com/file/d/1b1E84G7S0h2n5o0RmrcmKOhRKukOjgsJ/view?usp=sharing).

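If you prefer to fetch the file from a script rather than a browser, the following is a minimal sketch, not part of the GEM codebase. It assumes the third-party `gdown` package is installed (`pip install gdown`); the helper names `drive_file_id` and `download_weights` are hypothetical and introduced only for illustration. The file name and format of the download are not stated here, so the sketch lets `gdown` keep whatever name the server reports.

```python
import re
from typing import Optional

# Share link copied verbatim from the "Pretrained Models" section.
GEM_WEIGHTS_URL = (
    "https://drive.google.com/file/d/"
    "1b1E84G7S0h2n5o0RmrcmKOhRKukOjgsJ/view?usp=sharing"
)


def drive_file_id(share_url: str) -> str:
    """Extract the file ID from a Google Drive '/file/d/<id>/...' share link."""
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", share_url)
    if match is None:
        raise ValueError(f"not a Google Drive file link: {share_url}")
    return match.group(1)


def download_weights(output_path: Optional[str] = None) -> str:
    """Download the shared file (hypothetical helper, not a GEM API).

    With output_path=None, gdown saves the file under the name reported
    by Google Drive in the current working directory.
    """
    # Imported lazily so drive_file_id() works even without gdown installed.
    import gdown

    return gdown.download(
        id=drive_file_id(GEM_WEIGHTS_URL), output=output_path, quiet=False
    )
```

Calling `download_weights()` should return the local path of the saved file. Where the GEM code expects that file to live is not covered here, so check the repository's setup instructions before moving it.
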
## 📖 Paper & Citation

**Paper:** [GENMO: A GENeralist Model for Human MOtion](https://arxiv.org/abs/2505.01425)
*Jiefeng Li, Jinkun Cao, Haotian Zhang, Davis Rempe, Jan Kautz, Umar Iqbal, Ye Yuan*
ICCV, 2025

**BibTeX:**

```bibtex
@inproceedings{genmo2025,
  title     = {GENMO: A GENeralist Model for Human MOtion},
  author    = {Li, Jiefeng and Cao, Jinkun and Zhang, Haotian and Rempe, Davis and Kautz, Jan and Iqbal, Umar and Yuan, Ye},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year      = {2025}
}
```