Commit
·
0e0e904
1
Parent(s):
3533ea8
Update README with listening study and objective evaluation results
README.md
CHANGED
|
@@ -75,13 +75,39 @@ pip install -r requirements.txt
|
|
| 75 |
## Datasets
|
| 76 |
The MidiCaps dataset is a large-scale dataset of 168k MIDI files paired with rich text captions. These captions contain musical attributes such as key, tempo, style, and mood, making it ideal for text-to-MIDI generation tasks.
|
| 77 |
|
| 78 |
-
##
|
| 79 |
-
|
| 80 |
-
|
| 81 |
-
|
| 82 |
-
|
|
| 83 |
-
|
| 84 |
-
|
| 85 |
|
| 86 |
## Training
|
| 87 |
To train text2midi, we recommend using accelerate for multi-GPU support. First, configure accelerate by running:
|
|
|
|
| 75 |
## Datasets
|
| 76 |
The MidiCaps dataset is a large-scale dataset of 168k MIDI files paired with rich text captions. These captions contain musical attributes such as key, tempo, style, and mood, making it ideal for text-to-MIDI generation tasks.
|
| 77 |
|
| 78 |
+
## Results of the Listening Study
|
| 79 |
+
|
| 80 |
+
Each question is rated on a Likert scale from 1 (very bad) to 7 (very good). The table shows the average ratings per question for each group of participants.
|
| 81 |
+
|
| 82 |
+
| **Question** | **General Audience (MidiCaps)** | **General Audience (text2midi)** | **Music Experts (MidiCaps)** | **Music Experts (text2midi)** |
|
| 83 |
+
|---------------------|---------------------------------|-----------------------------------|------------------------------|--------------------------------|
|
| 84 |
+
| Overall matching | 5.17 | 4.12 | 5.29 | 4.05 |
|
| 85 |
+
| Genre matching | 5.22 | 4.29 | 5.31 | 4.29 |
|
| 86 |
+
| Mood matching | 5.24 | 4.10 | 5.44 | 4.26 |
|
| 87 |
+
| Key matching | 4.72 | 4.24 | 4.63 | 4.05 |
|
| 88 |
+
| Chord matching | 4.65 | 4.23 | 4.05 | 4.06 |
|
| 89 |
+
| Tempo matching | 4.72 | 4.48 | 5.15 | 4.90 |
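The per-question averages in the table above can be reproduced from raw responses with a simple group-and-mean step. The following is a minimal sketch, assuming each response is stored as a `(group, system, question, rating)` tuple; that layout, the `average_ratings` helper, and the sample ratings are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw responses: (participant group, system, question, rating 1-7).
# These values are illustrative only and are NOT data from the listening study.
responses = [
    ("general", "text2midi", "Overall matching", 4),
    ("general", "text2midi", "Overall matching", 5),
    ("expert", "text2midi", "Overall matching", 3),
    ("expert", "text2midi", "Overall matching", 5),
    ("general", "MidiCaps", "Overall matching", 6),
    ("general", "MidiCaps", "Overall matching", 5),
]


def average_ratings(rows):
    """Return the mean rating for each (question, group, system) cell."""
    cells = defaultdict(list)
    for group, system, question, rating in rows:
        # Reject ratings that fall outside the 7-point Likert scale.
        if not 1 <= rating <= 7:
            raise ValueError(f"rating {rating} is outside the 1-7 Likert scale")
        cells[(question, group, system)].append(rating)
    return {key: round(mean(values), 2) for key, values in cells.items()}


averages = average_ratings(responses)
for (question, group, system), value in sorted(averages.items()):
    print(f"{question} | {group} ({system}): {value:.2f}")
```

Each key in `averages` corresponds to one cell of the table: a question, a participant group, and the system being rated.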
|
| 90 |
+
|
| 91 |
+
|
| 92 |
+
## Objective Evaluations
|
| 93 |
+
|
| 94 |
+
| Metric | text2midi | MidiCaps | MuseCoco |
|
| 95 |
+
|---------------------|-----------|----------|----------|
|
| 96 |
+
| CR ↑ | 2.156 | 3.4326 | 2.1288 |
|
| 97 |
+
| CLAP ↑ | 0.2204 | 0.2593 | 0.2158 |
|
| 98 |
+
| TB (%) ↑ | 34.03 | - | 21.71 |
|
| 99 |
+
| TBT (%) ↑ | 66.9 | - | 54.63 |
|
| 100 |
+
| CK (%) ↑ | 15.36 | - | 13.70 |
|
| 101 |
+
| CKD (%) ↑ | 15.80 | - | 14.59 |
|
| 102 |
+
|
| 103 |
+
**Note**:
|
| 104 |
+
CR = Compression ratio
|
| 105 |
+
CLAP = CLAP score
|
| 106 |
+
TB = Tempo Bin
|
| 107 |
+
TBT = Tempo Bin with Tolerance
|
| 108 |
+
CK = Correct Key
|
| 109 |
+
CKD = Correct Key with Duplicates
|
| 110 |
+
↑ = Higher score is better.
|
| 111 |
|
| 112 |
## Training
|
| 113 |
To train text2midi, we recommend using accelerate for multi-GPU support. First, configure accelerate by running:
|