Commit cf836a3 · Parent: 7bcc411
Add installation and training instructions

README.md CHANGED
```

## Installation

```bash
git clone https://github.com/AMAAI-Lab/text2midi
cd text2midi
pip install -r requirements.txt
```

## Datasets

The MidiCaps dataset is a large-scale dataset of 168k MIDI files paired with rich text captions. These captions contain musical attributes such as key, tempo, style, and mood, making it ideal for text-to-MIDI generation tasks.
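Since MidiCaps pairs each MIDI file with an attribute-rich caption, a text-to-MIDI training example reduces to a (caption, MIDI path) pair. A minimal sketch of that shape — the class and field names here are illustrative, not the dataset's actual schema:

```python
from dataclasses import dataclass

# Hypothetical shape of one MidiCaps-style example: a MIDI file path plus a
# free-text caption and some of the musical attributes the caption mentions.
@dataclass
class CaptionedMidi:
    midi_path: str
    caption: str
    key: str
    tempo_bpm: int
    mood: str

examples = [
    CaptionedMidi("a.mid", "An upbeat pop tune in C major at 120 BPM.",
                  "C major", 120, "upbeat"),
    CaptionedMidi("b.mid", "A melancholic piano piece in A minor at 70 BPM.",
                  "A minor", 70, "melancholic"),
]

# For text-to-MIDI generation, the model only needs (caption, midi_path) pairs;
# the structured attributes are useful for filtering or evaluation.
pairs = [(ex.caption, ex.midi_path) for ex in examples]
```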

## Subjective Evaluation by Expert Listeners

| **Model** | **Overall Match** ↑ | **Chord Match** ↑ | **Tempo Match** ↑ | **Symbolic Quality** ↑ | **Musicality** ↑ | **Text Alignment** ↑ |
|-----------|---------------------|-------------------|-------------------|------------------------|------------------|----------------------|
| MuseCoco  | 4.12 | 3.02 | 3.85 | 3.50 | 3.20 | 3.90 |
| text2midi | 4.85 | 4.10 | 4.62 | 4.25 | 4.45 | 4.78 |

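Averaging the six criteria gives a single headline number per model; a quick sanity check recomputed from the published ratings:

```python
# Listener ratings per model, in table order: overall match, chord match,
# tempo match, symbolic quality, musicality, text alignment.
scores = {
    "MuseCoco":  [4.12, 3.02, 3.85, 3.50, 3.20, 3.90],
    "text2midi": [4.85, 4.10, 4.62, 4.25, 4.45, 4.78],
}

# Mean rating per model, rounded to two decimals.
means = {model: round(sum(vals) / len(vals), 2) for model, vals in scores.items()}

# text2midi scores higher on every individual criterion as well.
wins_every_criterion = all(
    t > m for t, m in zip(scores["text2midi"], scores["MuseCoco"])
)
```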
## Training

To train text2midi, we recommend using accelerate for multi-GPU support. First, configure accelerate by running:

```bash
accelerate config
```

Then, use the following command to start training:

```bash
accelerate launch train.py \
  --encoder_model="google/flan-t5-large" \
  --decoder_model="configs/transformer_decoder_config.json" \
  ...
  --batch_size=16 \
  --learning_rate=1e-4 \
  --epochs=40
```
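Presumably `train.py` parses these flags with a standard argument parser. A minimal argparse sketch that mirrors only the flags shown above (the repository's actual parser and defaults may differ):

```python
import argparse

# Illustrative parser for the training flags shown in the command above.
parser = argparse.ArgumentParser(description="text2midi training (illustrative)")
parser.add_argument("--encoder_model", type=str,
                    default="google/flan-t5-large",
                    help="Hugging Face ID of the text encoder")
parser.add_argument("--decoder_model", type=str,
                    default="configs/transformer_decoder_config.json",
                    help="Path to the decoder config JSON")
parser.add_argument("--batch_size", type=int, default=16)
parser.add_argument("--learning_rate", type=float, default=1e-4)
parser.add_argument("--epochs", type=int, default=40)

# Parsing an empty argv yields the defaults, i.e. the command above.
args = parser.parse_args([])
```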

## Citation

If you use text2midi in your research, please cite:

```
@misc{bhandari2025text2midi,
      title={text2midi: Generating Symbolic Music from Captions},
      author={Keshav Bhandari and Abhinaba Roy and Kyra Wang and Geeta Puri and Simon Colton and Dorien Herremans},
      ...
      eprint={2311.08355},
      archivePrefix={arXiv},
}
```