---
title: MoMask
emoji: 🎭
colorFrom: pink
colorTo: purple
sdk: gradio
sdk_version: "6.1.0"
app_file: app_new.py
pinned: false
python_version: "3.10"
short_description: Text-to-3D motion generation using ONNX models
---

# MoMask: Text-to-Motion Generation

Generate 3D human skeleton animations from text descriptions using [MoMask](https://github.com/EricGuo5513/momask-codes).

## Features
- Text-to-motion generation with classifier-free guidance
- Download results as MP4 video or as BVH files for Blender import
- ~7 seconds of motion per generation
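
Classifier-free guidance is usually implemented by running the model twice, once with the text condition and once without, and then pushing the prediction away from the unconditional one. The snippet below is a minimal sketch of that blending step in plain Python. The function name `apply_cfg`, the example logits, and the scale value are illustrative assumptions, not code or settings taken from this Space.

```python
def apply_cfg(cond_logits, uncond_logits, scale):
    """Blend conditional and unconditional logits element by element.

    scale = 0 returns the unconditional logits, scale = 1 returns the
    conditional logits, and scale > 1 exaggerates the effect of the text.
    """
    if len(cond_logits) != len(uncond_logits):
        raise ValueError("logit vectors must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond_logits, uncond_logits)]


# Hypothetical logits over three motion tokens (made-up numbers).
cond = [2.0, 0.5, -1.0]    # predicted with the text prompt
uncond = [1.0, 0.5, 0.0]   # predicted with the prompt dropped

guided = apply_cfg(cond, uncond, scale=4.0)
print(guided)  # [5.0, 0.5, -4.0]
```

A larger scale generally makes the output follow the prompt more closely at the cost of diversity.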

## Model Architecture (ONNX FP32, ~416MB total)
| Model | Size | Purpose |
|-------|------|---------|
| CLIP Text Encoder | 254MB | Text embedding |
| Mask Transformer | 56MB | Initial motion tokens |
| Residual Transformer | 55MB | Refine motion details |
| VQ-VAE Decoder | 46MB | Decode to motion |
| Length Estimator | 0.5MB | Predict motion length |
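
As a rough sketch of how five separate ONNX files like these could be loaded, the snippet below maps each component to a file and opens it with ONNX Runtime. The file names, the `models` directory, and the helper names `model_paths` and `load_sessions` are hypothetical; the actual loading code lives in `app_new.py` and may differ. Components are listed in table order, which is not necessarily the order the app runs them in.

```python
from pathlib import Path

# Component names follow the table above; the file names are assumptions.
COMPONENTS = [
    ("clip_text_encoder", "clip_text_encoder.onnx"),
    ("mask_transformer", "mask_transformer.onnx"),
    ("residual_transformer", "residual_transformer.onnx"),
    ("vqvae_decoder", "vqvae_decoder.onnx"),
    ("length_estimator", "length_estimator.onnx"),
]


def model_paths(model_dir):
    """Return a {component name: file path} mapping for a model directory."""
    root = Path(model_dir)
    return {name: root / filename for name, filename in COMPONENTS}


def load_sessions(model_dir):
    """Open one CPU inference session per component.

    onnxruntime is imported here so that model_paths() stays usable
    on machines where the runtime is not installed.
    """
    import onnxruntime as ort

    sessions = {}
    for name, path in model_paths(model_dir).items():
        if not path.is_file():
            raise FileNotFoundError(f"missing model file: {path}")
        sessions[name] = ort.InferenceSession(
            str(path), providers=["CPUExecutionProvider"]
        )
    return sessions


paths = model_paths("models")
print(list(paths))
```

Calling `load_sessions("models")` would then return a dictionary of `InferenceSession` objects, provided the files exist under the assumed names.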

## Usage
1. Enter a text description (e.g., "A person walks forward")
2. Optionally set duration and seed
3. Click Generate
4. Download the MP4 video, or the BVH file for import into Blender

## Credits
Based on [MoMask](https://github.com/EricGuo5513/momask-codes) by Chuan Guo et al.