amaai-lab/text2midi

keshavbhandari committed on Dec 5, 2024

Commit 0e0e904 (1 parent: 3533ea8)

Update README with listening study and objective evaluation results


Files changed (1)

README.md +33 -7


@@ -75,13 +75,39 @@ pip install -r requirements.txt
 ## Datasets
 The MidiCaps dataset is a large-scale dataset of 168k MIDI files paired with rich text captions. These captions contain musical attributes such as key, tempo, style, and mood, making it ideal for text-to-MIDI generation tasks.
-## Subjective Evaluation by Expert Listeners
-| **Model**     | **Overall Match** ↑ | **Chord Match** ↑ | **Tempo Match** ↑ | **Symbolic Quality** ↑ | **Musicality** ↑ | **Text Alignment** ↑ |
-|---------------|----------------------|-------------------|-------------------|------------------------|------------------|---------------------|
-| MuseCoco      | 4.12                | 3.02              | 3.85              | 3.50                   | 3.20             | 3.90                 |
-| text2midi     | 4.85                | 4.10              | 4.62              | 4.25                   | 4.45             | 4.78                 |
+## Results of the Listening Study
+Each question is rated on a Likert scale from 1 (very bad) to 7 (very good). The table shows the average ratings per question for each group of participants.
+| **Question**        | **General Audience (MidiCaps)** | **General Audience (text2midi)** | **Music Experts (MidiCaps)** | **Music Experts (text2midi)** |
+|---------------------|---------------------------------|-----------------------------------|------------------------------|--------------------------------|
+| Overall matching    | 5.17                           | 4.12                             | 5.29                        | 4.05                          |
+| Genre matching      | 5.22                           | 4.29                             | 5.31                        | 4.29                          |
+| Mood matching       | 5.24                           | 4.10                             | 5.44                        | 4.26                          |
+| Key matching        | 4.72                           | 4.24                             | 4.63                        | 4.05                          |
+| Chord matching      | 4.65                           | 4.23                             | 4.05                        | 4.06                          |
+| Tempo matching      | 4.72                           | 4.48                             | 5.15                        | 4.90                          |
+## Objective Evaluations
+| Metric              | text2midi | MidiCaps | MuseCoco |
+|---------------------|-----------|----------|----------|
+| CR ↑               | 2.156     | 3.4326   | 2.1288   |
+| CLAP ↑             | 0.2204    | 0.2593   | 0.2158   |
+| TB (%) ↑           | 34.03     | -        | 21.71    |
+| TBT (%) ↑          | 66.9      | -        | 54.63    |
+| CK (%) ↑           | 15.36     | -        | 13.70    |
+| CKD (%) ↑          | 15.80     | -        | 14.59    |
+**Note**:
+CR = Compression ratio
+CLAP = CLAP score
+TB = Tempo Bin
+TBT = Tempo Bin with Tolerance
+CK = Correct Key
+CKD = Correct Key with Duplicates
+↑ = Higher score is better.
 ## Training
 To train text2midi, we recommend using accelerate for multi-GPU support. First, configure accelerate by running:
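
As a reading aid for the "Objective Evaluations" table added in this commit, the sketch below computes how far text2midi is ahead of MuseCoco on each metric. The numbers are copied from the table in the diff. The dictionary layout and the helper names `absolute_gain` and `relative_gain_percent` are hypothetical and are not part of the text2midi repository. The MidiCaps column is left out because the table reports no value for it on four of the six metrics.

```python
# Values copied from the "Objective Evaluations" table in the diff above.
# The structure and function names are illustrative assumptions,
# not code from the text2midi repository.
OBJECTIVE = {
    "CR": {"text2midi": 2.156, "MuseCoco": 2.1288},
    "CLAP": {"text2midi": 0.2204, "MuseCoco": 0.2158},
    "TB (%)": {"text2midi": 34.03, "MuseCoco": 21.71},
    "TBT (%)": {"text2midi": 66.9, "MuseCoco": 54.63},
    "CK (%)": {"text2midi": 15.36, "MuseCoco": 13.70},
    "CKD (%)": {"text2midi": 15.80, "MuseCoco": 14.59},
}


def absolute_gain(metric: str) -> float:
    """Return text2midi minus MuseCoco for one metric (higher is better)."""
    row = OBJECTIVE[metric]
    return round(row["text2midi"] - row["MuseCoco"], 4)


def relative_gain_percent(metric: str) -> float:
    """Return the gain as a percentage of the MuseCoco score."""
    row = OBJECTIVE[metric]
    return round(100.0 * (row["text2midi"] - row["MuseCoco"]) / row["MuseCoco"], 2)


if __name__ == "__main__":
    for metric in OBJECTIVE:
        print(f"{metric}: +{absolute_gain(metric)} absolute, "
              f"+{relative_gain_percent(metric)}% relative")
```

For the percentage metrics (TB, TBT, CK, CKD) the absolute gain is in percentage points. For example, the TB gap is 34.03 - 21.71 = 12.32 points.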