---
license: cc-by-nc-sa-4.0
pipeline_tag: text-to-3d
---

# <img src="https://huggingface.co/AIGeeksGroup/ReMoMask/resolve/main/assets/remomask_logo.png" alt="logo" width="30"/> ReMoMask: Retrieval-Augmented Masked Motion Generation

This is the official repository for the paper [ReMoMask: Retrieval-Augmented Masked Motion Generation](https://huggingface.co/papers/2508.02605).

- [Paper](https://huggingface.co/papers/2508.02605)
- [Project Page](https://aigeeksgroup.github.io/ReMoMask/)
- [Code](https://github.com/AIGeeksGroup/ReMoMask)

https://github.com/user-attachments/assets/3f29c0c5-abb8-4fd1-893c-48ac82b79532

## Abstract

Text-to-Motion (T2M) generation aims to synthesize realistic and semantically aligned human motion sequences from natural language descriptions. However, current approaches face dual challenges: generative models (e.g., diffusion models) suffer from limited diversity, error accumulation, and physical implausibility, while Retrieval-Augmented Generation (RAG) methods exhibit diffusion inertia, partial-mode collapse, and asynchronous artifacts. To address these limitations, we propose ReMoMask, a unified framework integrating three key innovations: 1) a Bidirectional Momentum Text-Motion Model decouples the negative-sample scale from the batch size via momentum queues, substantially improving cross-modal retrieval precision; 2) a Semantic Spatio-temporal Attention mechanism enforces biomechanical constraints during part-level fusion to eliminate asynchronous artifacts; 3) RAG-Classifier-Free Guidance incorporates a minor unconditional generation term to enhance generalization. Built upon MoMask's RVQ-VAE, ReMoMask efficiently generates temporally coherent motions in minimal steps. Extensive experiments on standard benchmarks demonstrate the state-of-the-art performance of ReMoMask, achieving 3.88% and 10.97% improvements in FID on HumanML3D and KIT-ML, respectively, over the previous SOTA method RAG-T2M.
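
The momentum-queue idea can be pictured with a short sketch: a slowly updated key encoder fills a fixed-size queue of embeddings that serve as negatives, so the number of contrastive negatives no longer depends on the batch size. The PyTorch snippet below illustrates that mechanism only; it is not the official ReMoMask implementation, and all function names, shapes, and default values are assumptions. In a bidirectional setup, the same loss would be computed for both the text-to-motion and motion-to-text retrieval directions.

```python
# Illustrative sketch of momentum-queue contrastive training (not the official code).
import torch
import torch.nn.functional as F

@torch.no_grad()
def momentum_update(encoder_q, encoder_k, m=0.999):
    # EMA update of the key (momentum) encoder from the query encoder.
    for p_q, p_k in zip(encoder_q.parameters(), encoder_k.parameters()):
        p_k.data.mul_(m).add_(p_q.data, alpha=1.0 - m)

def contrastive_loss_with_queue(q, k, queue, temperature=0.07):
    # q:     (B, D) L2-normalized query embeddings (e.g., text features)
    # k:     (B, D) L2-normalized key embeddings from the momentum encoder (e.g., motion features)
    # queue: (K, D) keys accumulated from previous batches; K >> B, so the
    #        number of negatives is independent of the batch size.
    l_pos = (q * k).sum(dim=-1, keepdim=True)   # (B, 1) positive logits
    l_neg = q @ queue.t()                        # (B, K) negative logits
    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)       # the positive is always index 0

@torch.no_grad()
def enqueue(queue, k, max_size=65536):
    # Push the newest keys and drop the oldest so the queue keeps a fixed size.
    return torch.cat([k, queue], dim=0)[:max_size]
```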

## Framework

An overview of the ReMoMask framework:

![framework](https://huggingface.co/AIGeeksGroup/ReMoMask/resolve/main/assets/remomask_arch.png)

## Sample Usage

To run a local demo for motion generation, use the provided `demo.py` script from the GitHub repository.

First, make sure your environment is set up as described in the [GitHub repository's Prerequisite section](https://github.com/AIGeeksGroup/ReMoMask#prerequisite).

Then, run the demo with a text prompt:

```bash
python demo.py --gpu_id 0 --ext exp1 --text_prompt "A person is walking on a circle." --checkpoints_dir logs --dataset_name humanml3d --mtrans_name pretrain_mtrans --rtrans_name pretrain_rtrans
# After training your own models, replace pretrain_mtrans and pretrain_rtrans with the names of your mtrans and rtrans checkpoints.
```

- `--repeat_times`: number of times to repeat generation for the prompt (default `1`).
- `--motion_length`: number of poses to generate.
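
For example, appending `--repeat_times 3 --motion_length 196` to the command above would generate three candidates of 196 poses for the same prompt (the values here are only illustrative).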

The output will be saved in `./outputs/exp1/`.

## Citation

If you find our work helpful or inspiring, please feel free to cite it.

```bibtex
@article{li2025remomask,
  title={ReMoMask: Retrieval-Augmented Masked Motion Generation},
  author={Li, Zhengdao and Wang, Siheng and Zhang, Zeyu and Tang, Hao},
  journal={arXiv preprint arXiv:2508.02605},
  year={2025}
}
```