---
license: apache-2.0
tags:
- text-to-motion
- motion-generation
- diffusion-forcing
- humanml3d
- computer-animation
library_name: transformers
pipeline_tag: other
---

# FloodDiffusion: Tailored Diffusion Forcing for Streaming Motion Generation

<div align="center">

**A TINY version of the original FloodDiffusion**

[Paper](https://arxiv.org/abs/2512.03520) | [GitHub](https://github.com/ShandaAI/FloodDiffusion) | [Project Page](https://shandaai.github.io/FloodDiffusion/)

</div>

## Installation

### Prerequisites

- Python 3.8+
- CUDA-capable GPU with 16GB+ VRAM (recommended)
- 16GB+ system RAM

### Dependencies

**Step 1: Install the basic dependencies**

```bash
pip install torch transformers huggingface_hub
pip install lightning diffusers omegaconf ftfy numpy
```

**Step 2: Install Flash Attention (required)**

Flash Attention requires CUDA and may need to compile from source:

```bash
pip install flash-attn --no-build-isolation
```

**Note:** Flash Attention is **required** for this model. If installation fails, refer to the [official flash-attention installation guide](https://github.com/Dao-AILab/flash-attention#installation-and-features).
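
After installing, a quick sanity check can confirm that CUDA and Flash Attention are importable before loading the model (a minimal sketch using the standard `torch` and `flash_attn` import paths):

```python
# Sanity check: CUDA and Flash Attention must both be available for this model
import torch

assert torch.cuda.is_available(), "A CUDA-capable GPU is required"
print(f"CUDA device: {torch.cuda.get_device_name(0)}")

try:
    import flash_attn
    print(f"flash-attn version: {flash_attn.__version__}")
except ImportError:
    print("flash-attn is not installed; see the guide linked above")
```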

## Quick Start

### Basic Usage

```python
from transformers import AutoModel

# Load the model (trust_remote_code is needed because the model ships custom code)
model = AutoModel.from_pretrained(
    "ShandaAI/FloodDiffusionTiny",
    trust_remote_code=True
)

# Generate motion from text (263-dim HumanML3D features)
motion = model("a person walking forward", length=60)
print(f"Generated motion: {motion.shape}")  # (~240, 263)

# Generate motion as joint coordinates (22 joints × 3 coords) with EMA smoothing (smoothing_alpha: 0.0-1.0)
motion_joints = model("a person walking forward", length=60, output_joints=True, smoothing_alpha=0.5)
print(f"Generated joints: {motion_joints.shape}")  # (~240, 22, 3)
```
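
The outputs are NumPy arrays, so they can be written straight to disk for downstream tooling; a minimal sketch (file names are illustrative):

```python
import numpy as np

# Persist the generated motions from the example above
np.save("walk_features.npy", motion)       # (frames, 263) HumanML3D features
np.save("walk_joints.npy", motion_joints)  # (frames, 22, 3) joint coordinates
```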

### Batch Generation

```python
# Generate multiple motions in a single call
texts = [
    "a person walking forward",
    "a person running quickly",
    "a person jumping up and down"
]
lengths = [60, 50, 40]  # A separate length for each motion

motions = model(texts, length=lengths)

for i, motion in enumerate(motions):
    print(f"Motion {i}: {motion.shape}")
```

### Multi-Text Motion Transitions

```python
# Generate one motion sequence with smooth transitions between actions
motion = model(
    text=[["walk forward", "turn around", "run back"]],
    length=[120],
    text_end=[[40, 80, 120]]  # Transition points, in latent tokens
)

# Output: ~480 frames showing all three actions smoothly connected
print(f"Transition motion: {motion[0].shape}")
```
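
When the prompts should simply share the sequence evenly, the transition points can be computed instead of hand-tuned; a small sketch (`even_text_end` is a hypothetical helper, not part of the model API):

```python
def even_text_end(num_texts: int, total_length: int) -> list[int]:
    """Evenly spaced, ascending transition points ending at total_length."""
    return [round(total_length * (i + 1) / num_texts) for i in range(num_texts)]

prompts = ["walk forward", "turn around", "run back"]
print(even_text_end(len(prompts), 120))  # [40, 80, 120]

motion = model(text=[prompts], length=[120], text_end=[even_text_end(len(prompts), 120)])
```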

## API Reference

### `model(text, length=60, text_end=None, num_denoise_steps=None, output_joints=False, smoothing_alpha=1.0)`

Generate motion sequences from text descriptions.

**Parameters:**

- **text** (`str`, `List[str]`, or `List[List[str]]`): Text description(s)
  - Single string: Generate one motion
  - List of strings: Batch generation
  - Nested list: Multiple text prompts per motion (for transitions)

- **length** (`int` or `List[int]`, default=60): Number of latent tokens to generate
  - Output frames ≈ `length × 4` (due to VAE upsampling)
  - Example: `length=60` → ~240 frames (~12 seconds at 20 FPS); see the conversion sketch after this list

- **text_end** (`List[int]` or `List[List[int]]`, optional): Latent token positions for text transitions
  - Only used when `text` is a nested list
  - Specifies when to switch between the text descriptions
  - **IMPORTANT**: Must have the same length as the corresponding text list
  - Example: `text=[["walk", "turn", "sit"]]` requires `text_end=[[20, 40, 60]]` (3 endpoints for 3 texts)
  - Must be in ascending order

- **num_denoise_steps** (`int`, optional): Number of denoising iterations
  - Higher values improve quality at the cost of slower generation
  - Recommended range: 10-50

- **output_joints** (`bool`, default=False): Output format selector
  - `False`: Returns 263-dimensional HumanML3D features
  - `True`: Returns 22×3 joint coordinates for direct visualization

- **smoothing_alpha** (`float`, default=1.0): EMA smoothing factor for joint positions (only used when `output_joints=True`); see the sketch after this list
  - `1.0`: No smoothing (default)
  - `0.5`: Medium smoothing (recommended for smoother animations)
  - `0.0`: Maximum smoothing
  - Range: 0.0 to 1.0
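
Two of these parameters benefit from a worked sketch. `seconds_to_length` is a hypothetical helper derived from the numbers above (4 output frames per latent token at 20 FPS), and `ema_smooth` shows one common EMA formulation consistent with how `smoothing_alpha` is described (`1.0` leaves frames unchanged, lower values smooth more); the model applies its smoothing internally, so this loop is for intuition only:

```python
import numpy as np

def seconds_to_length(seconds: float, fps: int = 20, frames_per_token: int = 4) -> int:
    """Convert a target duration in seconds into a number of latent tokens."""
    return round(seconds * fps / frames_per_token)

print(seconds_to_length(12.0))  # 60 tokens → ~240 frames

def ema_smooth(joints: np.ndarray, alpha: float) -> np.ndarray:
    """EMA over frames: out[t] = alpha * joints[t] + (1 - alpha) * out[t - 1]."""
    out = joints.copy()
    for t in range(1, len(out)):
        out[t] = alpha * out[t] + (1.0 - alpha) * out[t - 1]
    return out
```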

**Returns:**

- Single motion:
  - `output_joints=False`: `numpy.ndarray` of shape `(frames, 263)`
  - `output_joints=True`: `numpy.ndarray` of shape `(frames, 22, 3)`
- Batch: `List[numpy.ndarray]` with shapes as above
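
Because the return type depends on whether the call was single or batched, downstream code may want to normalize it first; a small sketch (`as_motion_list` is a hypothetical helper):

```python
import numpy as np

def as_motion_list(result) -> list[np.ndarray]:
    """Wrap a single-motion result so single and batch calls are handled uniformly."""
    return [result] if isinstance(result, np.ndarray) else list(result)
```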

**Example:**

```python
# Single generation (263-dim features)
motion = model("walk forward", length=60)  # Returns (~240, 263)

# Single generation (joint coordinates)
joints = model("walk forward", length=60, output_joints=True)  # Returns (~240, 22, 3)

# Batch generation
motions = model(["walk", "run"], length=[60, 50])  # Returns a list of 2 arrays

# Multi-text transitions
motion = model(
    [["walk", "turn"]],
    length=[60],
    text_end=[[30, 60]]
)  # Returns a list with 1 array of shape (~240, 263)
```

## Citation

If you use this model in your research, please cite:

```bibtex
@article{cai2025flooddiffusion,
  title={FloodDiffusion: Tailored Diffusion Forcing for Streaming Motion Generation},
  author={Cai, Yiyi and Wu, Yuhan and Li, Kunhang and Zhou, You and Zheng, Bo and Liu, Haiyang},
  journal={arXiv preprint arXiv:2512.03520},
  year={2025}
}
```