PyTorch
music
text-to-music
symbolic-music
keshavbhandari committed · Commit cf836a3 · 1 Parent(s): 7bcc411

Add installation and training instructions

Files changed (1)
  1. README.md +12 -3
README.md CHANGED
@@ -65,23 +65,29 @@ post_processing("output.mid", "output.mid")
  ```
 
  ## Installation
  git clone https://github.com/AMAAI-Lab/text2midi
  cd text2midi
  pip install -r requirements.txt
 
  ## Datasets
  The MidiCaps dataset is a large-scale dataset of 168k MIDI files paired with rich text captions. These captions contain musical attributes such as key, tempo, style, and mood, making it ideal for text-to-MIDI generation tasks.
 
  ## Subjective Evaluation by Expert Listeners
- Model Dataset Pre-trained Overall Match ↑ Chord Match ↑ Tempo Match ↑ Symbolic Quality ↑ Musicality ↑ Text Alignment ↑
- MuseCoco MidiCaps ✓ 4.12 3.02 3.85 3.50 3.20 3.90
- text2midi MidiCaps ✓ 4.85 4.10 4.62 4.25 4.45 4.78
 
  ## Training
  To train text2midi, we recommend using accelerate for multi-GPU support. First, configure accelerate by running:
  accelerate config
 
  Then, use the following command to start training:
  accelerate launch train.py \
  --encoder_model="google/flan-t5-large" \
  --decoder_model="configs/transformer_decoder_config.json" \
@@ -90,9 +96,11 @@ accelerate launch train.py \
  --batch_size=16 \
  --learning_rate=1e-4 \
  --epochs=40 \
 
  ## Citation
  If you use text2midi in your research, please cite:
  @misc{bhandari2025text2midi,
  title={text2midi: Generating Symbolic Music from Captions},
  author={Keshav Bhandari and Abhinaba Roy and Kyra Wang and Geeta Puri and Simon Colton and Dorien Herremans},
@@ -100,3 +108,4 @@ If you use text2midi in your research, please cite:
  eprint={2311.08355},
  archivePrefix={arXiv},
  }
  ```
 
  ## Installation
+ ```bash
  git clone https://github.com/AMAAI-Lab/text2midi
  cd text2midi
  pip install -r requirements.txt
+ ```
 
  ## Datasets
  The MidiCaps dataset is a large-scale dataset of 168k MIDI files paired with rich text captions. These captions contain musical attributes such as key, tempo, style, and mood, making it ideal for text-to-MIDI generation tasks.
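
The captions spell out attributes such as key, tempo, and mood in plain text. A minimal sketch of pulling such attributes out of a caption string with stdlib regexes (the example caption and the patterns are illustrative assumptions, not the dataset's exact format):

```python
import re

def parse_caption(caption: str) -> dict:
    """Extract a few musical attributes from a MidiCaps-style caption.

    The regexes below are illustrative guesses at common phrasings,
    not an official parser for the dataset.
    """
    attrs = {}
    # Tempo, e.g. "... at 120 BPM ..."
    if m := re.search(r"(\d+)\s*BPM", caption, re.IGNORECASE):
        attrs["tempo_bpm"] = int(m.group(1))
    # Key, e.g. "... in C major ..." or "... in F# minor ..."
    if m := re.search(r"in\s+([A-G][#b]?)\s+(major|minor)", caption):
        attrs["key"] = f"{m.group(1)} {m.group(2)}"
    return attrs

print(parse_caption("A cheerful pop piece in C major at 120 BPM."))
# → {'tempo_bpm': 120, 'key': 'C major'}
```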
 
  ## Subjective Evaluation by Expert Listeners
+
+ | **Model** | **Overall Match** ↑ | **Chord Match** ↑ | **Tempo Match** ↑ | **Symbolic Quality** ↑ | **Musicality** ↑ | **Text Alignment** ↑ |
+ |-----------|---------------------|-------------------|-------------------|------------------------|------------------|----------------------|
+ | MuseCoco  | 4.12 | 3.02 | 3.85 | 3.50 | 3.20 | 3.90 |
+ | text2midi | 4.85 | 4.10 | 4.62 | 4.25 | 4.45 | 4.78 |
+
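
As a rough single-number summary, the six listener ratings can be averaged per model (a quick stdlib sketch using the scores from the table above; this aggregate is our illustration, not a metric reported by the authors):

```python
from statistics import mean

# Subjective scores in table order: Overall Match, Chord Match,
# Tempo Match, Symbolic Quality, Musicality, Text Alignment.
scores = {
    "MuseCoco":  [4.12, 3.02, 3.85, 3.50, 3.20, 3.90],
    "text2midi": [4.85, 4.10, 4.62, 4.25, 4.45, 4.78],
}

for model, vals in scores.items():
    print(f"{model}: mean rating {mean(vals):.2f}")
# text2midi averages 4.51 vs 3.60 for MuseCoco
```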
 
  ## Training
  To train text2midi, we recommend using accelerate for multi-GPU support. First, configure accelerate by running:
  accelerate config
 
  Then, use the following command to start training:
+ ```bash
  accelerate launch train.py \
  --encoder_model="google/flan-t5-large" \
  --decoder_model="configs/transformer_decoder_config.json" \
 
  --batch_size=16 \
  --learning_rate=1e-4 \
  --epochs=40
+ ```
 
  ## Citation
  If you use text2midi in your research, please cite:
+ ```
  @misc{bhandari2025text2midi,
  title={text2midi: Generating Symbolic Music from Captions},
  author={Keshav Bhandari and Abhinaba Roy and Kyra Wang and Geeta Puri and Simon Colton and Dorien Herremans},
 
  eprint={2311.08355},
  archivePrefix={arXiv},
  }
+ ```