---
license: mit
---
<div align="center">
## 🎹 [ACMMM '25] Separate to Collaborate: Dual-Stream Diffusion Model for Coordinated Piano Hand Motion Synthesis
[Zihao Liu](https://github.com/monkek123King)<sup>\*</sup>, [Mingwen Ou](https://github.com/OMTHSJUHW)<sup>\*</sup>, [Zunnan Xu](https://kkakkkka.github.io/)<sup>\*</sup>, [Jiaqi Huang](https://github.com/jiaqihuang01), [Haonan Han](https://vincenthancoder.github.io/), [Ronghui Li](https://li-ronghui.github.io/), [Xiu Li](https://scholar.google.com/citations?hl=zh-CN&user=Xrh1OIUAAAAJ&view_op=list_works&sortby=pubdate)<sup>†</sup>
Tsinghua University
<sup>\*</sup> Equal contribution.
<sup>†</sup> Corresponding author.
🏠 [Homepage](https://monkek123King.github.io/S2C_page) &nbsp;&nbsp;&nbsp;&nbsp; 📄 [Paper](https://arxiv.org/abs/2504.09885) &nbsp;&nbsp;&nbsp;&nbsp; 💾 Dataset [[Google Drive](https://drive.google.com/drive/folders/1JY0zOE0s7v9ZYLlIP1kCZUdNrih5nYEt?usp=sharing)]/[[Hyper.ai](https://hyper.ai/datasets/32494)]/[[Zenodo](https://zenodo.org/records/13297386)] &nbsp;&nbsp;&nbsp;&nbsp; 🤗 Model [[HuggingFace](https://huggingface.co/thuteam/S2C/tree/main)]
</div>
-----
### 📢 News
* **`Sept 2025`:** Experiment checkpoints are released [here](https://huggingface.co/thuteam/S2C)! 🎉
* **`July 2025`:** Our paper has been accepted to ACMMM 2025! 🥳
* **`April 2025`:** The paper is now available on [arXiv](https://arxiv.org/abs/2504.09885). ☕️
-----
## 🚀 Getting Started
### 🔧 Installation
**a. Create a conda virtual environment and activate it.**
```shell
conda create -n S2C python=3.10 -y
conda activate S2C
```
**b. Install PyTorch and torchvision following the [official instructions](https://pytorch.org/).**
```shell
pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
```
**c. Clone S2C.**
```shell
git clone https://github.com/monkek123King/S2C.git
```
**d. Install other requirements.**
```shell
cd S2C
pip install -r requirement.txt
```
**e. Prepare MANO models.**
Download the MANO model from the [MANO website](https://mano.is.tue.mpg.de/); you will need to register to access the downloads section. Only the right-hand model is required: place `MANO_RIGHT.pkl` under the `./mano` folder.
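A misplaced MANO file tends to surface only as an error deep into a run, so a quick stdlib check can save time (`check_mano` is a helper written here for illustration, not part of the S2C codebase; the expected path follows the instructions above):

```python
from pathlib import Path

def check_mano(repo_root="."):
    """Return True if MANO_RIGHT.pkl sits where S2C expects it."""
    mano_file = Path(repo_root) / "mano" / "MANO_RIGHT.pkl"
    if not mano_file.is_file():
        print(f"Missing {mano_file} - download it from the MANO website.")
        return False
    return True
```

Run it from the repository root after placing the file.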
**f. Prepare pretrained models. (Used in training.)**
Download the pretrained HuBERT ([Large](https://huggingface.co/facebook/hubert-large-ls960-ft)) model to `S2C/checkpoints`.
**g. Prepare Gesture Autoencoder model. (Used in evaluation.)**
Download the pretrained [Gesture Autoencoder model](https://drive.google.com/file/d/1G2Fe_zlJn8I_U_VGldH4SsIa_KauvG3p/view?usp=sharing) to `S2C/checkpoints`.
```
checkpoints
├── gesture_autoencoder_checkpoint_best.bin
└── hubert-large-ls960-ft/
```
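To catch a missing checkpoint before launching a job, a minimal sketch that mirrors the tree above (`check_checkpoints` is an illustrative helper, not part of the S2C codebase):

```python
from pathlib import Path

def check_checkpoints(ckpt_dir="checkpoints"):
    """Return a list of expected checkpoint entries that are missing."""
    ckpt = Path(ckpt_dir)
    missing = []
    if not (ckpt / "gesture_autoencoder_checkpoint_best.bin").is_file():
        missing.append("gesture_autoencoder_checkpoint_best.bin")
    if not (ckpt / "hubert-large-ls960-ft").is_dir():
        missing.append("hubert-large-ls960-ft/")
    return missing
```

An empty return value means both entries from the tree above are in place.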
### 📦 Prepare Dataset
**PianoMotion10M**
Download the full PianoMotion10M V1.0 dataset [here](https://drive.google.com/drive/folders/1JY0zOE0s7v9ZYLlIP1kCZUdNrih5nYEt?usp=sharing).
```shell
cd /path/to/PianoMotion10M_Dataset
unzip annotation.zip
unzip audio.zip
unzip midi.zip
```
**Folder structure**
```
/path/to/PianoMotion10M_Dataset
├── annotation/
│   ├── 1033685137/
│   │   ├── BV1f34y1i7U1/
│   │   │   ├── BV1f34y1i7U1_seq_0000.json
│   │   │   ├── BV1f34y1i7U1_seq_0001.json
│   │   │   └── ...
│   │   └── BV1X44y1J7CR/
│   ├── 2084102325/
│   └── ...
├── audio/
│   ├── 1033685137/
│   │   ├── BV1f34y1i7U1/
│   │   │   ├── BV1f34y1i7U1_seq_0000.mp3
│   │   │   ├── BV1f34y1i7U1_seq_0001.mp3
│   │   │   └── ...
│   │   └── BV1X44y1J7CR/
│   ├── 2084102325/
│   └── ...
├── midi/
│   ├── 1033685137/
│   │   ├── BV1f34y1i7U1.mid
│   │   ├── BV1X44y1J7CR.mid
│   │   └── ...
│   ├── 2084102325/
│   └── ...
├── train.txt
├── test.txt
└── valid.txt
```
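After unzipping, the split files can be used for a quick sanity check. A minimal sketch, assuming each split file lists one sequence per line (adjust if the files use a different format):

```python
from pathlib import Path

def split_sizes(dataset_root):
    """Count non-empty lines in each of the dataset's split files."""
    sizes = {}
    for split in ("train", "valid", "test"):
        split_file = Path(dataset_root) / f"{split}.txt"
        sizes[split] = sum(1 for line in split_file.open() if line.strip())
    return sizes
```

For example, `split_sizes("/path/to/PianoMotion10M_Dataset")` returns a dict of per-split sequence counts.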
**Usage**
`draw.py` demonstrates how to use the dataset and saves visualizations of sample hand motions to `./draw_sample`.
```shell
python draw.py
```
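To peek at a single annotation file before running the full visualization, a small stdlib sketch (it assumes nothing about the schema beyond the file being valid JSON; `inspect_annotation` is an illustrative helper):

```python
import json

def inspect_annotation(path):
    """Load one annotation JSON and summarize its top-level structure."""
    with open(path) as f:
        data = json.load(f)
    if isinstance(data, dict):
        return sorted(data.keys())  # top-level field names
    return [type(data).__name__, len(data)]  # e.g. a list of frames
```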
### πŸ‹οΈ Train and Evaluate
**Please ensure you have prepared the environment and the PianoMotion10M dataset.**
**Train and Test**
Train the S2C Position Predictor with HuBERT and a Transformer encoder. The audio feature extractor can be changed via `--wav2vec_path`. Results will be stored in `./logs/`.
```shell
python train.py --experiment_name piano2posi_LR --bs_dim 6 --adjust --is_random --up_list 1467634 66685747 \
--data_root ./ --iterations 200000 --batch_size 8 --train_sec 8 --feature_dim 512 \
--wav2vec_path ./checkpoints/hubert-large-ls960-ft --check_val_every_n_iteration 1000 --save_every_n_iteration 1000 \
--latest_layer tanh --encoder_type transformer --num_layer 4
```
Train the S2C Gesture Generator with HuBERT and a Transformer encoder. Results will be stored in `./logs/`.
```shell
python train_diffusion.py --experiment_name piano2mot --is_random --unet_dim 256 --iterations 800000 \
--bs_dim 96 --batch_size 16 --train_sec 8 --data_root ./ \
--xyz_guide --check_val_every_n_iteration 1000 --save_every_n_iteration 1000 \
--adjust --piano2posi_path logs/piano2posi_LR --encoder_type transformer --num_layer 4 \
--lr 1e-5 --fusion 4 --obj pred_v
```
Evaluate S2C on the validation set after training the Gesture Generator (e.g. `--exp_path ./logs/piano2mot`).
```shell
python eval.py --exp_path /path/to/logs --data_root /path/to/PianoMotion10M_Dataset --valid_batch_size 64 --mode valid
```
**Visualization**
Visualize the results, which will be stored in `./results`.
```shell
python infer.py --exp_path /path/to/logs --data_root /path/to/PianoMotion10M_Dataset --valid_batch_size 64 --mode valid
```
-----
## ✍️ Citation
If you find our work useful for your research, please consider citing our paper and giving this repository a star 🌟.
```bibtex
@article{liu2025s2c,
  title={Separate to Collaborate: Dual-Stream Diffusion Model for Coordinated Piano Hand Motion Synthesis},
  author={Liu, Zihao and Ou, Mingwen and Xu, Zunnan and Huang, Jiaqi and Han, Haonan and Li, Ronghui and Li, Xiu},
  journal={arXiv preprint arXiv:2504.09885},
  year={2025}
}
```