Commit cf836a3 · Parent: 7bcc411
Add installation and training instructions

README.md CHANGED
```

## Installation

```bash
git clone https://github.com/AMAAI-Lab/text2midi
cd text2midi
pip install -r requirements.txt
```

## Datasets

The MidiCaps dataset is a large-scale dataset of 168k MIDI files paired with rich text captions. These captions contain musical attributes such as key, tempo, style, and mood, making it ideal for text-to-MIDI generation tasks.
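Since MidiCaps pairs each MIDI file with an attribute-rich caption, a text-to-MIDI training example reduces to a (caption, MIDI path) pair. A minimal sketch of that shape — the class and field names here are illustrative, not the dataset's actual schema:

```python
from dataclasses import dataclass

# Hypothetical shape of one MidiCaps-style example: a MIDI file path plus a
# free-text caption and some of the musical attributes the caption mentions.
@dataclass
class CaptionedMidi:
    midi_path: str
    caption: str
    key: str
    tempo_bpm: int
    mood: str

examples = [
    CaptionedMidi("a.mid", "An upbeat pop tune in C major at 120 BPM.",
                  "C major", 120, "upbeat"),
    CaptionedMidi("b.mid", "A melancholic piano piece in A minor at 70 BPM.",
                  "A minor", 70, "melancholic"),
]

# For text-to-MIDI generation, the model only needs (caption, midi_path) pairs;
# the structured attributes are useful for filtering or evaluation.
pairs = [(ex.caption, ex.midi_path) for ex in examples]
```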

## Subjective Evaluation by Expert Listeners

| **Model** | **Overall Match** ↑ | **Chord Match** ↑ | **Tempo Match** ↑ | **Symbolic Quality** ↑ | **Musicality** ↑ | **Text Alignment** ↑ |
|-----------|---------------------|-------------------|-------------------|------------------------|------------------|----------------------|
| MuseCoco  | 4.12 | 3.02 | 3.85 | 3.50 | 3.20 | 3.90 |
| text2midi | 4.85 | 4.10 | 4.62 | 4.25 | 4.45 | 4.78 |

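Averaging the six criteria gives a single headline number per model; a quick sanity check recomputed from the published ratings:

```python
# Listener ratings per model, in table order: overall match, chord match,
# tempo match, symbolic quality, musicality, text alignment.
scores = {
    "MuseCoco":  [4.12, 3.02, 3.85, 3.50, 3.20, 3.90],
    "text2midi": [4.85, 4.10, 4.62, 4.25, 4.45, 4.78],
}

# Mean rating per model, rounded to two decimals.
means = {model: round(sum(vals) / len(vals), 2) for model, vals in scores.items()}

# text2midi scores higher on every individual criterion as well.
wins_every_criterion = all(
    t > m for t, m in zip(scores["text2midi"], scores["MuseCoco"])
)
```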
## Training

To train text2midi, we recommend using accelerate for multi-GPU support. First, configure accelerate by running:

```bash
accelerate config
```

Then, use the following command to start training:

```bash
accelerate launch train.py \
  --encoder_model="google/flan-t5-large" \
  --decoder_model="configs/transformer_decoder_config.json" \
  ...
  --batch_size=16 \
  --learning_rate=1e-4 \
  --epochs=40
```
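Presumably `train.py` parses these flags with a standard argument parser. A minimal argparse sketch that mirrors only the flags shown above (the repository's actual parser and defaults may differ):

```python
import argparse

# Illustrative parser for the training flags shown in the command above.
parser = argparse.ArgumentParser(description="text2midi training (illustrative)")
parser.add_argument("--encoder_model", type=str,
                    default="google/flan-t5-large",
                    help="Hugging Face ID of the text encoder")
parser.add_argument("--decoder_model", type=str,
                    default="configs/transformer_decoder_config.json",
                    help="Path to the decoder config JSON")
parser.add_argument("--batch_size", type=int, default=16)
parser.add_argument("--learning_rate", type=float, default=1e-4)
parser.add_argument("--epochs", type=int, default=40)

# Parsing an empty argv yields the defaults, i.e. the command above.
args = parser.parse_args([])
```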

## Citation

If you use text2midi in your research, please cite:

```
@misc{bhandari2025text2midi,
      title={text2midi: Generating Symbolic Music from Captions},
      author={Keshav Bhandari and Abhinaba Roy and Kyra Wang and Geeta Puri and Simon Colton and Dorien Herremans},
      ...
      eprint={2311.08355},
      archivePrefix={arXiv},
}
```