	---
	license: mit
	---
	<div align="center">

	## 🎹 [ACMMM '25] Separate to Collaborate: Dual-Stream Diffusion Model for Coordinated Piano Hand Motion Synthesis

	[Zihao Liu](https://github.com/monkek123King)<sup>\*</sup>, [Mingwen Ou](https://github.com/OMTHSJUHW)<sup>\*</sup>, [Zunnan Xu](https://kkakkkka.github.io/)<sup>\*</sup>, [Jiaqi Huang](https://github.com/jiaqihuang01), [Haonan Han](https://vincenthancoder.github.io/), [Ronghui Li](https://li-ronghui.github.io/), [Xiu Li](https://scholar.google.com/citations?hl=zh-CN&user=Xrh1OIUAAAAJ&view_op=list_works&sortby=pubdate)<sup>†</sup>

	Tsinghua University

	<sup>\*</sup> Equal contribution.
	<sup>†</sup> Corresponding author.

	🏠 [Homepage](https://monkek123King.github.io/S2C_page) 📄 [Paper](https://arxiv.org/abs/2504.09885) 💾 Dataset [[Google Drive](https://drive.google.com/drive/folders/1JY0zOE0s7v9ZYLlIP1kCZUdNrih5nYEt?usp=sharing)]/[[Hyper.ai](https://hyper.ai/datasets/32494)]/[[Zenodo](https://zenodo.org/records/13297386)] 🤗 Model [[HuggingFace](https://huggingface.co/thuteam/S2C/tree/main)]

	</div>

	-----

	### 📢 News

	* `Sept 2025`: Experiment checkpoints are released [here](https://huggingface.co/thuteam/S2C)! 🎉
	* `July 2025`: Our paper has been accepted to ACMMM 2025! 🥳
	* `April 2025`: The paper is now available on [arXiv](https://arxiv.org/abs/2504.09885). ☕️

	-----

	## 🚀 Getting Started

	### 🔧 Installation

	a. Create a conda virtual environment and activate it.
	```shell
	conda create -n S2C python=3.10 -y
	conda activate S2C
	```

	b. Install PyTorch, torchvision, and torchaudio following the [official instructions](https://pytorch.org/). The command below installs the CUDA 11.8 builds.
	```shell
	pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
	```
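
After installing, it can help to confirm that the pinned versions are the ones actually present in the environment. The following is a minimal sketch, not part of the S2C repository; the helper name `check_versions` is hypothetical, and the expected versions are simply copied from the pip command above.

```python
from importlib import metadata

# Versions pinned by the pip command above.
EXPECTED = {"torch": "2.1.2", "torchvision": "0.16.2", "torchaudio": "2.1.2"}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}."""
    report = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)
        except metadata.PackageNotFoundError:
            have = None
        # CUDA wheels report versions such as "2.1.2+cu118"; compare the part before "+".
        ok = have is not None and have.split("+")[0] == want
        report[name] = (have, ok)
    return report


if __name__ == "__main__":
    for name, (have, ok) in check_versions().items():
        print(f"{name}: installed={have} expected={EXPECTED[name]} ok={ok}")
```

Run it inside the activated `S2C` environment; any line with `ok=False` points to a package that is missing or at a different version.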

	c. Clone S2C.
	```shell
	git clone https://github.com/monkek123King/S2C.git
	```

	d. Install other requirements.
	```shell
	cd S2C
	pip install -r requirement.txt
	```

	e. Prepare MANO models.

	Download the MANO model from the [MANO website](https://mano.is.tue.mpg.de/); registration is required to access the downloads section. Only the right-hand model is needed. Place `MANO_RIGHT.pkl` under the `./mano` folder.

	f. Prepare pretrained models. (Used in training.)

	Download the pretrained HuBERT ([Large](https://huggingface.co/facebook/hubert-large-ls960-ft)) model to `S2C/checkpoints`.

	g. Prepare the Gesture Autoencoder model. (Used in evaluation.)

	Download the pretrained [Gesture Autoencoder model](https://drive.google.com/file/d/1G2Fe_zlJn8I_U_VGldH4SsIa_KauvG3p/view?usp=sharing) to `S2C/checkpoints`. The folder should then look like this:

	```
	checkpoints
	├── gesture_autoencoder_checkpoint_best.bin
	├── hubert-large-ls960-ft/
	```
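
Before training or evaluating, the downloaded assets from steps e to g can be checked in one pass. This is a minimal sketch under the assumption that it is run from the S2C repository root; `missing_assets` is a hypothetical helper, and the paths are the ones named in the steps above.

```python
from pathlib import Path

# Paths taken from the installation steps above, relative to the S2C repository root.
REQUIRED = [
    "mano/MANO_RIGHT.pkl",
    "checkpoints/gesture_autoencoder_checkpoint_best.bin",
    "checkpoints/hubert-large-ls960-ft",
]


def missing_assets(repo_root="."):
    """Return the required paths that do not exist under repo_root."""
    root = Path(repo_root)
    return [rel for rel in REQUIRED if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_assets(".")
    print("All assets found." if not missing else f"Missing: {missing}")
```

The check only tests for existence; it does not validate file contents or checksums.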

	### 📦 Prepare Dataset

	PianoMotion10M

	Download the full PianoMotion10M V1.0 dataset [here](https://drive.google.com/drive/folders/1JY0zOE0s7v9ZYLlIP1kCZUdNrih5nYEt?usp=sharing), then extract the archives:

	```shell
	cd /path/to/PianoMotion10M_Dataset
	unzip annotation.zip
	unzip audio.zip
	unzip midi.zip
	```


	Folder structure
	```
	/path/to/PianoMotion10M_Dataset
	├── annotation/
	│ ├── 1033685137/
	│ │ ├── BV1f34y1i7U1/
	│ │ │ ├── BV1f34y1i7U1_seq_0000.json
	│ │ │ ├── BV1f34y1i7U1_seq_0001.json
	│ │ │ ├── ...
	│ │ ├── BV1X44y1J7CR/
	│ ├── 2084102325/
	│ ├── ...
	├── audio/
	│ ├── 1033685137/
	│ │ ├── BV1f34y1i7U1/
	│ │ │ ├── BV1f34y1i7U1_seq_0000.mp3
	│ │ │ ├── BV1f34y1i7U1_seq_0001.mp3
	│ │ │ ├── ...
	│ │ ├── BV1X44y1J7CR/
	│ ├── 2084102325/
	│ ├── ...
	├── midi/
	│ ├── 1033685137/
	│ │ ├── BV1f34y1i7U1.mid
	│ │ ├── BV1X44y1J7CR.mid
	│ │ ├── ...
	│ ├── 2084102325/
	│ ├── ...
	├── train.txt
	├── test.txt
	├── valid.txt
	```
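
The layout above implies a simple mapping between the three modalities: each annotation clip has an audio clip at the same relative path, and each video folder shares one MIDI file. The sketch below illustrates that mapping only; it is not the repository's data loader, `pair_samples` is a hypothetical helper, and the names `group_id` and `video_id` for the two directory levels are assumptions.

```python
from pathlib import Path


def pair_samples(data_root):
    """Yield (annotation_json, audio_mp3, midi_file) paths for each annotated clip.

    Assumes the layout shown above:
      annotation/<group_id>/<video_id>/<clip>.json
      audio/<group_id>/<video_id>/<clip>.mp3
      midi/<group_id>/<video_id>.mid
    """
    root = Path(data_root)
    for ann in sorted((root / "annotation").glob("*/*/*.json")):
        group_id, video_id = ann.parts[-3], ann.parts[-2]
        audio = root / "audio" / group_id / video_id / f"{ann.stem}.mp3"
        midi = root / "midi" / group_id / f"{video_id}.mid"
        yield ann, audio, midi


if __name__ == "__main__":
    # Report clips whose audio or MIDI counterpart is missing on disk.
    for ann, audio, midi in pair_samples("/path/to/PianoMotion10M_Dataset"):
        if not (audio.exists() and midi.exists()):
            print(f"Incomplete sample: {ann}")
```

The function only derives paths; it does not open the files or consult `train.txt`, `test.txt`, or `valid.txt`, whose format is not described here.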

	Usage

	`draw.py` demonstrates how to use the dataset and saves visualizations of sample hand motions under `./draw_sample`.

	```shell
	python draw.py
	```

	### 🏋️ Train and Evaluate

	Please ensure you have prepared the environment and the PianoMotion10M dataset.

	Train and Test

	Train the S2C Position Predictor with HuBERT and a transformer encoder. To use a different audio feature extractor, change `--wav2vec_path`. Results are stored in `./logs/`.
	```shell
	python train.py --experiment_name piano2posi_LR --bs_dim 6 --adjust --is_random --up_list 1467634 66685747 \
	--data_root ./ --iterations 200000 --batch_size 8 --train_sec 8 --feature_dim 512 \
	--wav2vec_path ./checkpoints/hubert-large-ls960-ft --check_val_every_n_iteration 1000 --save_every_n_iteration 1000 \
	--latest_layer tanh --encoder_type transformer --num_layer 4
	```

	Train the S2C Gesture Generator with HuBERT and a transformer encoder. Results are stored in `./logs/`.
	```shell
	python train_diffusion.py --experiment_name piano2mot --is_random --unet_dim 256 --iterations 800000 \
	--bs_dim 96 --batch_size 16 --train_sec 8 --data_root ./ \
	--xyz_guide --check_val_every_n_iteration 1000 --save_every_n_iteration 1000 \
	--adjust --piano2posi_path logs/piano2posi_LR --encoder_type transformer --num_layer 4 \
	--lr 1e-5 --fusion 4 --obj pred_v
	```
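
The two training commands are linked: the Gesture Generator's `--piano2posi_path logs/piano2posi_LR` refers to the Position Predictor run named by `--experiment_name piano2posi_LR`. The sketch below makes that dependency explicit. It assumes each run writes to `./logs/<experiment_name>`, which the commands suggest but which is not stated outright; `stage1_cmd` and `stage2_cmd` are hypothetical helpers, and every flag is copied from the commands above.

```python
import shlex

STAGE1_NAME = "piano2posi_LR"  # --experiment_name used for the position predictor


def stage1_cmd(name=STAGE1_NAME):
    """Position predictor command, with the flags from the README."""
    return shlex.split(
        f"python train.py --experiment_name {name} --bs_dim 6 --adjust --is_random "
        "--up_list 1467634 66685747 --data_root ./ --iterations 200000 --batch_size 8 "
        "--train_sec 8 --feature_dim 512 --wav2vec_path ./checkpoints/hubert-large-ls960-ft "
        "--check_val_every_n_iteration 1000 --save_every_n_iteration 1000 "
        "--latest_layer tanh --encoder_type transformer --num_layer 4"
    )


def stage2_cmd(stage1_name=STAGE1_NAME, name="piano2mot"):
    """Gesture generator command; --piano2posi_path points at the stage-1 log directory."""
    return shlex.split(
        f"python train_diffusion.py --experiment_name {name} --is_random --unet_dim 256 "
        "--iterations 800000 --bs_dim 96 --batch_size 16 --train_sec 8 --data_root ./ "
        "--xyz_guide --check_val_every_n_iteration 1000 --save_every_n_iteration 1000 "
        f"--adjust --piano2posi_path logs/{stage1_name} --encoder_type transformer "
        "--num_layer 4 --lr 1e-5 --fusion 4 --obj pred_v"
    )


if __name__ == "__main__":
    # Print the commands; run them in order, e.g. with subprocess.run(cmd, check=True).
    for cmd in (stage1_cmd(), stage2_cmd()):
        print(shlex.join(cmd))
```

If the Position Predictor is trained under a different experiment name, pass the same name to `stage2_cmd` so both stages stay consistent.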

	Evaluate S2C on the validation set after training the S2C Gesture Generator. Set `--exp_path` to the experiment's log directory, e.g. `./logs/piano2mot`.
	```shell
	python eval.py --exp_path /path/to/logs --data_root /path/to/PianoMotion10M_Dataset --valid_batch_size 64 --mode valid
	```

	Visualization

	Visualize the results, which will be stored in `./results`.

	```shell
	python infer.py --exp_path /path/to/logs --data_root /path/to/PianoMotion10M_Dataset --valid_batch_size 64 --mode valid
	```

	-----

	## ✍️ Citation

	If you find our work useful for your research, please consider citing our paper and giving this repository a star 🌟.

	```bibtex
	@article{liu2025s2c,
	title={Separate to Collaborate: Dual-Stream Diffusion Model for Coordinated Piano Hand Motion Synthesis},
	author={Liu, Zihao and Ou, Mingwen and Xu, Zunnan and Huang, Jiaqi and Han, Haonan and Li, Ronghui and Li, Xiu},
	journal={arXiv preprint arXiv:2504.09885},
	year={2025}
	}
	```