---
title: HY-Motion-1.0
emoji: 🏃
colorFrom: purple
colorTo: red
sdk: gradio
sdk_version: 4.44.0
app_file: gradio_app.py
pinned: false
short_description: Text-to-3D Human Motion Generation
---

<p align="center">
  <img src="./assets/banner.png" alt="Banner" width="100%">
</p>

<div align="center">
  <a href="https://hunyuan.tencent.com/motion" target="_blank">
    <img src="https://img.shields.io/badge/Official%20Site-333399.svg?logo=homepage" height="22px" alt="Official Site">
  </a>
  <a href="https://huggingface.co/spaces/tencent/HY-Motion-1.0" target="_blank">
    <img src="https://img.shields.io/badge/%F0%9F%A4%97%20Demo-276cb4.svg" height="22px" alt="HuggingFace Space">
  </a>
  <a href="https://huggingface.co/tencent/HY-Motion-1.0" target="_blank">
    <img src="https://img.shields.io/badge/%F0%9F%A4%97%20Models-d96902.svg" height="22px" alt="HuggingFace Models">
  </a>
  <a href="https://arxiv.org/pdf/2512.23464" target="_blank">
    <img src="https://img.shields.io/badge/Report-b5212f.svg?logo=arxiv" height="22px" alt="ArXiv Report">
  </a>
  <a href="https://x.com/TencentHunyuan" target="_blank">
    <img src="https://img.shields.io/badge/Hunyuan-black.svg?logo=x" height="22px" alt="X (Twitter)">
  </a>
</div>

# HY-Motion 1.0: Scaling Flow Matching Models for 3D Motion Generation

<p align="center">
  <img src="./assets/teaser.png" alt="Teaser" width="90%">
</p>

## 🔥 News

- **Dec 30, 2025**: 🤗 We released the inference code and pretrained models of [HY-Motion 1.0](https://huggingface.co/tencent/HY-Motion-1.0). Please give it a try via our [HuggingFace Space](https://huggingface.co/spaces/tencent/HY-Motion-1.0) and our [Official Site](https://hunyuan.tencent.com/motion)!

## Introduction

**HY-Motion 1.0** is a series of text-to-3D human motion generation models based on the Diffusion Transformer (DiT) architecture and Flow Matching. It lets developers generate skeleton-based 3D character animations from simple text prompts, and the resulting animations can be integrated directly into various 3D animation pipelines. This model series is the first to scale DiT-based text-to-motion models to the billion-parameter level, achieving significant improvements in instruction following and motion quality over existing open-source models.
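
For orientation, the sketch below shows what driving the released checkpoints from Python could look like. Only `snapshot_download` is a real `huggingface_hub` API; the `HYMotionPipeline` class, its loader, and its call signature are hypothetical placeholders, not the released interface (see `gradio_app.py` in this repo for the actual entry points).

```python
# Hypothetical usage sketch -- see gradio_app.py for the real entry points.
from huggingface_hub import snapshot_download

# Fetch the pretrained weights from the Hub (this call is a real, stable API).
ckpt_dir = snapshot_download(repo_id="tencent/HY-Motion-1.0")

# The pipeline class and call signature below are illustrative assumptions,
# not the released API:
#
#   pipeline = HYMotionPipeline.from_pretrained(ckpt_dir)
#   motion = pipeline("a person walks forward, then waves with the right hand")
#   motion.export("wave.fbx")  # skeleton animation for a 3D pipeline
```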

### Key Features

- **State-of-the-Art Performance**: Achieves state-of-the-art results in both instruction-following capability and generated motion quality.

- **Billion-Scale Models**: We are the first to successfully scale DiT-based models to the billion-parameter level for text-to-motion generation, yielding superior instruction understanding and following compared with comparable open-source models.

- **Advanced Three-Stage Training**: Our models are trained with a three-stage process (a minimal sketch of the underlying flow-matching objective follows this list):

  - *Large-Scale Pre-training*: Trained on over 3,000 hours of diverse motion data to learn a broad motion prior.

  - *High-Quality Fine-tuning*: Fine-tuned on 400 hours of curated, high-quality 3D motion data to enhance motion detail and smoothness.

  - *Reinforcement Learning*: Uses reinforcement learning from human feedback and reward models to further refine instruction following and motion naturalness.
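
The report's exact training formulation is not reproduced in this README, so the following is a minimal, generic sketch of the conditional flow-matching objective that models of this kind optimize, assuming the common linear (rectified-flow) interpolation path; the tensor shapes and model signature are illustrative, not the repo's code.

```python
# Generic conditional flow-matching loss (a sketch, not this repo's code).
# Assumes a linear path x_t = (1 - t) * x0 + t * x1 from noise x0 to data x1,
# so the target velocity along the path is the constant x1 - x0.
import torch
import torch.nn.functional as F

def flow_matching_loss(model, x1, text_emb):
    """x1: clean motion sequence, shape (B, T, D); text_emb: condition, (B, C)."""
    x0 = torch.randn_like(x1)                      # noise endpoint of the path
    t = torch.rand(x1.shape[0], device=x1.device)  # per-sample time in [0, 1]
    t_ = t.view(-1, 1, 1)                          # broadcast over (T, D)
    xt = (1.0 - t_) * x0 + t_ * x1                 # point on the straight path
    target_v = x1 - x0                             # velocity the model must match
    pred_v = model(xt, t, text_emb)                # DiT predicts v(x_t, t, c)
    return F.mse_loss(pred_v, target_v)
```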

<p align="center">
  <img src="./assets/pipeline.png" alt="System Overview" width="100%">
</p>

<p align="center">
  <img src="./assets/arch.png" alt="Architecture" width="100%">
</p>

<p align="center">
  <img src="./assets/sotacomp.png" alt="Comparison with SoTA" width="100%">
</p>

## 📚 BibTeX

If you find this repository helpful, please cite our report:

```bibtex
@article{hymotion2025,
  title={HY-Motion 1.0: Scaling Flow Matching Models for Text-To-Motion Generation},
  author={Tencent Hunyuan 3D Digital Human Team},
  journal={arXiv preprint arXiv:2512.23464},
  year={2025}
}
```

## Acknowledgements

We would like to thank the contributors to the [FLUX](https://github.com/black-forest-labs/flux), [diffusers](https://github.com/huggingface/diffusers), [HuggingFace](https://huggingface.co), [SMPL](https://smpl.is.tue.mpg.de/)/[SMPLH](https://mano.is.tue.mpg.de/), [CLIP](https://github.com/openai/CLIP), [Qwen3](https://github.com/QwenLM/Qwen3), [PyTorch3D](https://github.com/facebookresearch/pytorch3d), [kornia](https://github.com/kornia/kornia), [transforms3d](https://github.com/matthew-brett/transforms3d), [FBX-SDK](https://www.autodesk.com/developer-network/platform-technologies/fbx-sdk-2020-0), [GVHMR](https://zju3dv.github.io/gvhmr/), and [HunyuanVideo](https://github.com/Tencent-Hunyuan/HunyuanVideo) repositories and tools for their open research and exploration.