Generative Disentanglement

This repository extends control-transfer-diffusion with a new architecture and training strategy for disentangled music representation learning.

Our work introduces targeted modifications to improve the separation of timbral, structural, and music-theoretic information, enabling finer control over music generation tasks.

This project is currently under review. We will update this repository with the final citation once available.

What's New

  • Theory Encoder: Captures global musical attributes like key and tempo.
  • Pitch Conditioning Module: Guides the structure encoder to better capture note-level content.
  • Timbre Pretraining: Improved warm-up stage to focus timbre embeddings on timbral features only.
  • Updated Adversarial Objective: Stronger disentanglement between structure, timbre, and theory.
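To make the last point concrete, here is a minimal NumPy sketch of one common way such an adversarial objective is set up (gradient-reversal style); the function names, the probe classifier, and the weighting factor `lam` are illustrative assumptions, not the repository's actual implementation — a probe tries to predict timbre from the structure embedding, and its cross-entropy enters the total loss with a flipped sign so the structure encoder is pushed to discard timbre information:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    # Mean negative log-likelihood of the correct class.
    probs = softmax(logits)
    return -np.log(probs[np.arange(len(labels)), labels]).mean()

def disentanglement_loss(diffusion_loss, probe_logits, timbre_labels, lam=0.1):
    # Hypothetical combined objective: the probe's cross-entropy is
    # *subtracted*, so minimizing the total rewards structure embeddings
    # from which timbre cannot be predicted.
    adv = cross_entropy(probe_logits, timbre_labels)
    return diffusion_loss - lam * adv

# Uninformative probe logits (uniform over 4 timbre classes):
logits = np.zeros((2, 4))
labels = np.array([0, 1])
total = disentanglement_loss(1.0, logits, labels, lam=0.1)
print(total)
```

In practice the sign flip is usually realized with a gradient reversal layer rather than an explicit subtraction, but the effect on the structure encoder is the same.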

Installation

Install the required dependencies:

pip install -r requirements.txt

Dataset Preparation

We use Slakh2100 for training and evaluation.

First, preprocess the dataset into LMDB format. You can speed up training by precomputing the encoded embeddings during LMDB creation. To do so, provide the path to the autoencoder checkpoint:

python dataset/split_to_lmdb.py --input_path /path/to/slakh --output_path /path/to/slakh_lmdb --slakh True --midi True --emb_model_path /path/to/autoencoder

Diffusion Model Training

Train the disentangled diffusion model:

python train_diffusion.py --name generative_disentanglement --db_path /path/to/slakh_lmdb --emb_model_path /path/to/autoencoder --config generative_disentanglement --dataset_type waveform --gpu 0

Model Checkpoints

Pretrained weights are available on this Hugging Face repo.

Notes

  • This repository modifies the control-transfer-diffusion pipeline for improved disentanglement.
  • If you use this work, please cite both the original control-transfer-diffusion paper and this extension once its citation becomes available.