# TunaDance

Music-to-dance generation with a Gradio web UI and single-command CLI.
TunaDance builds on FineDance (ICCV 2023), a diffusion-based model that generates full-body 3D dance from music. This fork finetunes the original model on additional data and for more epochs beyond the original 2000, and adds a user-friendly interface layer and macOS support so you can go from an audio file to a rendered dance video without touching the inference internals.
[Original Paper] | [Original Project Page] | [Original Repo]
## What's New (vs. upstream FineDance)

- Gradio Web UI (`app.py`): upload music in the browser, get a dance video back. No CLI knowledge required.
- Single-command CLI (`generate_dance.py`): one command handles the full pipeline: audio feature extraction, diffusion sampling, SMPLX rendering, and audio-video muxing.
- macOS / MPS support: updated `render.py`, `vis.py`, and the inference code to run on Apple Silicon via MPS, with a dedicated `environment_macos.yaml`.
- Accepts any audio format: automatically converts input to WAV via ffmpeg (`.mp3`, `.wav`, `.m4a`, `.flac`, `.ogg`, etc.).
- Finetuned checkpoint: the original FineDance model finetuned on additional data for further epochs beyond the original 2000, improving dance quality and diversity.
- Cleaned-up repo: removed wandb logs, debug scripts, and hardcoded paths.
## Model Details

| Component | Details |
|---|---|
| Architecture | Transformer decoder with Gaussian diffusion |
| Input | 35-dim audio features (1 onset strength, 20 MFCC, 12 chroma, 2 peak/beat one-hot) per 4 s window |
| Output | SMPLX body motion, 319-dim (4 contact + 3 translation + 52 joints x 6D rotation) |
| Checkpoint | `assets/checkpoints/train-2000.pt` (finetuned beyond 2000 epochs on additional data) |
| Body model | SMPLX (full body with hands) |
| Training data | FineDance dataset (7.7 hours of music-dance pairs) |
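The 319-dim output layout adds up as 4 + 3 + 52 × 6 = 319, which a small helper can make concrete. A sketch only: the sizes come from the table above, but the contact-translation-rotation ordering is an assumption, and `split_motion_frame` is a hypothetical name, not a function in this repo.

```python
import numpy as np

def split_motion_frame(frame: np.ndarray):
    """Split one 319-dim output frame into the components from the table:
    4 foot-contact values + 3 root translation + 52 joints x 6D rotation.
    The ordering is assumed; only the component sizes are given upstream."""
    assert frame.shape == (319,), frame.shape
    contact = frame[:4]
    trans = frame[4:7]
    rot6d = frame[7:].reshape(52, 6)  # 52 * 6 = 312 remaining dims
    return contact, trans, rot6d

contact, trans, rot6d = split_motion_frame(np.zeros(319))
print(contact.shape, trans.shape, rot6d.shape)  # (4,) (3,) (52, 6)
```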
## Quick Start

### Prerequisites

```bash
# Install conda environment
conda env create -f environment.yaml        # Linux / CUDA
conda env create -f environment_macos.yaml  # macOS (Apple Silicon)
conda activate FineNet
```

Download the pretrained checkpoint and SMPLX model from Google Drive and place them under `assets/`.
### Web UI (Recommended)

```bash
python app.py
```

Open http://127.0.0.1:7861 in your browser. Upload a music file and click Generate Dance.
### Command Line

```bash
python generate_dance.py /path/to/music.mp3
```

Output is saved to `output/<songname>_dance.mp4`. Use `--output` for a custom path:

```bash
python generate_dance.py /path/to/music.mp3 --output my_dance.mp4
```
## Output Specs
| Property | Value |
|---|---|
| Resolution | 1200 x 1200 |
| Frame rate | 30 fps |
| Duration | ~30 seconds |
| Body model | SMPLX (full body with hands) |
## How It Works

1. Audio conversion: converts input to WAV if needed via ffmpeg
2. Feature extraction: slices audio into 4 s windows (2 s stride), extracts 35-dim features using librosa
3. Dance generation: the diffusion model generates an SMPLX motion sequence from the audio features
4. Rendering: converts motion to SMPLX meshes, renders 900 frames at 30 fps with pyrender
5. Muxing: merges the rendered video with the original audio via ffmpeg
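The windowing in the feature-extraction step (4 s windows, 2 s stride) can be sketched with plain NumPy. The sample rate and function name here are assumptions for illustration; the repo uses librosa for the actual 35-dim feature computation.

```python
import numpy as np

SR = 22050  # assumed sample rate (librosa's default when loading audio)

def slice_windows(audio: np.ndarray, sr: int = SR,
                  window_s: int = 4, stride_s: int = 2) -> list[np.ndarray]:
    """Return overlapping windows of `window_s` seconds, one every `stride_s` seconds."""
    win, hop = window_s * sr, stride_s * sr
    return [audio[s:s + win] for s in range(0, len(audio) - win + 1, hop)]

windows = slice_windows(np.zeros(30 * SR))  # a 30 s clip
print(len(windows))  # 14 windows, starting at 0 s, 2 s, ..., 26 s
```

With a 2 s stride, consecutive windows overlap by half, so each moment of audio (away from the edges) is seen by two windows.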
## Training

Only needed if you want to train from scratch; the pretrained checkpoint is included.

```bash
python data/code/pre_motion.py                               # preprocess
accelerate launch train_seq.py --batch_size 32 --epochs 200  # train
```
Key flags:

- `--batch_size`: default 400; reduce to 32 or lower for Mac MPS
- `--epochs`: default 2000
- `--checkpoint`: resume from a saved checkpoint
## FineDance Dataset

The dataset (7.7 hours) is available from Google Drive or Baidu Cloud. Place it under `./data`.

```python
import numpy as np

data = np.load("motion/001.npy")
smpl_poses = data[:, 3:]  # joint rotations
smpl_trans = data[:, :3]  # root translation
```
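If the flattened pose columns follow the model's 52-joint, 6D-rotation layout (an assumption here: 3 + 52 × 6 = 315 dims per frame; the repo's dataset code is authoritative), the rotations can be viewed per joint:

```python
import numpy as np

# Dummy stand-in for np.load("motion/001.npy"); the real frame width may differ.
data = np.zeros((120, 315))  # 120 frames: 3 translation + 52 joints x 6D rotation
smpl_trans = data[:, :3]     # (T, 3) root translation
smpl_poses = data[:, 3:]     # (T, 312) flattened joint rotations
per_joint = smpl_poses.reshape(len(data), 52, 6)
print(per_joint.shape)  # (120, 52, 6)
```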
Two dataset splits are provided:

- FineDance@Genre (recommended): broader genre coverage in the test set
- FineDance@Dancer: splits by dancer identity
## Project Structure

```
TunaDance/
├── app.py                   # Gradio web UI [NEW]
├── generate_dance.py        # End-to-end CLI [NEW]
├── environment_macos.yaml   # macOS conda env [NEW]
├── train_seq.py             # Training script
├── test.py                  # Original inference script
├── render.py                # SMPLX mesh rendering (updated for MPS)
├── vis.py                   # Skeleton/FK utilities (updated for MPS)
├── args.py                  # CLI argument definitions
├── assets/
│   ├── checkpoints/
│   │   └── train-2000.pt    # Pretrained model (2000 epochs)
│   └── smpl_model/
│       └── smplx/
│           └── SMPLX_NEUTRAL.npz
├── model/
│   ├── model.py             # SeqModel (transformer decoder)
│   └── diffusion.py         # Gaussian diffusion
├── dataset/
│   └── FineDance_dataset.py
└── data/
    └── finedance/           # Training data (music + motion pairs)
```
## Acknowledgments

This project is built on FineDance by Li et al. We thank the original authors for their work.

Upstream acknowledgments: EDGE, MDM, Adan, Diffusion, SMPLX.
## Citation

```bibtex
@inproceedings{li2023finedance,
  title={FineDance: A Fine-grained Choreography Dataset for 3D Full Body Dance Generation},
  author={Li, Ronghui and Zhao, Junfan and Zhang, Yachao and Su, Mingyang and Ren, Zeping and Zhang, Han and Tang, Yansong and Li, Xiu},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  pages={10234--10243},
  year={2023}
}
```