Generative Disentanglement
This repository extends control-transfer-diffusion with a new architecture and training strategy for disentangled music representation learning.
Our work introduces targeted modifications to improve the separation of timbral, structural, and music-theoretic information, enabling finer control over music generation tasks.
This project is currently under review. We will update this repository with the final citation once available.
What's New
- Theory Encoder: Captures global musical attributes like key and tempo.
- Pitch Conditioning Module: Guides the structure encoder to better capture note-level content.
- Timbre Pretraining: An improved warm-up stage that restricts the timbre embeddings to timbral features only.
- Updated Adversarial Objective: Stronger disentanglement between structure, timbre, and theory.
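To make the adversarial objective concrete: one common formulation trains an adversary to predict, say, timbre class from the structure embedding, while the encoder is trained to confuse it. The repo's exact loss is not specified here, so the sketch below is purely illustrative; it shows a "confusion" term (negative entropy of the adversary's predictions) that is minimized when the embedding leaks no class information.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def confusion_loss(adv_logits):
    """Negative entropy of the adversary's class predictions.

    Minimized when the adversary outputs a uniform distribution, i.e.
    when the embedding carries no usable information about the attribute
    being disentangled away (illustrative, not the repo's actual loss).
    """
    p = softmax(adv_logits)
    return float(np.mean(np.sum(p * np.log(p + 1e-9), axis=-1)))
```

A confident adversary (peaked logits) yields a higher loss than a confused one (flat logits), so the encoder is pushed toward embeddings the adversary cannot exploit.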
Installation
Install the required dependencies:
pip install -r requirements.txt
Dataset Preparation
We use Slakh2100 for training and evaluation.
First, preprocess the dataset into LMDB format. You can speed up training by precomputing the encoded embeddings during LMDB creation. To do so, provide the path to the autoencoder checkpoint:
python dataset/split_to_lmdb.py --input_path /path/to/slakh --output_path /path/to/slakh_lmdb --slakh True --midi True --emb_model_path /path/to/autoencoder
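Conceptually, the conversion writes one serialized record per example, and when --emb_model_path is given, the precomputed autoencoder embedding is stored alongside the audio so it need not be recomputed every epoch. A minimal sketch of what such a record might look like; the field names and pickle encoding are assumptions, not the script's actual schema:

```python
import pickle
import numpy as np

def make_record(waveform, midi_notes, embedding=None):
    # One database value: raw audio, aligned MIDI events, and (optionally)
    # the autoencoder embedding precomputed at conversion time.
    # Field names are illustrative, not the script's actual schema.
    return pickle.dumps({
        "waveform": np.asarray(waveform, dtype=np.float32),
        "midi": midi_notes,  # e.g. (pitch, onset_sec, duration_sec) tuples
        "embedding": None if embedding is None
                     else np.asarray(embedding, dtype=np.float32),
    })

# Fixed-width keys keep lexicographic order equal to insertion order.
key = f"{0:08d}".encode()
value = make_record(np.zeros(44100), [(60, 0.0, 0.5)])
```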
Diffusion Model Training
Train the disentangled diffusion model:
python train_diffusion.py --name generative_disentanglement --db_path /path/to/slakh_lmdb --emb_model_path /path/to/autoencoder --config generative_disentanglement --dataset_type waveform --gpu 0
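Inside the training loop, a diffusion model is typically fit by noising the clean signal (or its embedding) to a random timestep and training the network to predict the added noise, conditioned here on the disentangled embeddings. A self-contained sketch of the standard DDPM forward step, using a linear beta schedule as an assumption (this repo's actual schedule and parameterization may differ):

```python
import numpy as np

def q_sample(x0, t, betas, rng):
    # Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps,
    # where abar_t is the cumulative product of (1 - beta) up to step t.
    abar = np.cumprod(1.0 - betas)[t]
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps, eps

betas = np.linspace(1e-4, 2e-2, 1000)  # linear schedule (assumption)
rng = np.random.default_rng(0)
x0 = np.ones(16)                       # stand-in for a clean example
xt, eps = q_sample(x0, t=999, betas=betas, rng=rng)
```

The training target is then `eps`, with the loss a mean-squared error between `eps` and the network's prediction given `xt`, `t`, and the conditioning embeddings.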
Model Checkpoints
Pretrained weights are available in our Hugging Face repository.
Notes
- This repository modifies the control-transfer-diffusion pipeline for improved disentanglement.
- If you use this work, please cite both the original control-transfer-diffusion paper and this extension once its citation becomes available.