Commit 0e0e904 by keshavbhandari · 1 parent: 3533ea8

Update README with listening study and objective evaluation results

Files changed (1): README.md (+33 -7)
@@ -75,13 +75,39 @@ pip install -r requirements.txt
  ## Datasets
  The MidiCaps dataset is a large-scale dataset of 168k MIDI files paired with rich text captions. These captions contain musical attributes such as key, tempo, style, and mood, making it ideal for text-to-MIDI generation tasks.
 
- ## Subjective Evaluation by Expert Listeners
-
- | **Model** | **Overall Match** ↑ | **Chord Match** ↑ | **Tempo Match** ↑ | **Symbolic Quality** ↑ | **Musicality** ↑ | **Text Alignment** ↑ |
- |---------------|----------------------|-------------------|-------------------|------------------------|------------------|---------------------|
- | MuseCoco | 4.12 | 3.02 | 3.85 | 3.50 | 3.20 | 3.90 |
- | text2midi | 4.85 | 4.10 | 4.62 | 4.25 | 4.45 | 4.78 |
-
+ ## Results of the Listening Study
+
+ Each question is rated on a Likert scale from 1 (very bad) to 7 (very good). The table shows the average ratings per question for each group of participants.
+
+ | **Question** | **General Audience (MidiCaps)** | **General Audience (text2midi)** | **Music Experts (MidiCaps)** | **Music Experts (text2midi)** |
+ |---------------------|---------------------------------|-----------------------------------|------------------------------|--------------------------------|
+ | Overall matching | 5.17 | 4.12 | 5.29 | 4.05 |
+ | Genre matching | 5.22 | 4.29 | 5.31 | 4.29 |
+ | Mood matching | 5.24 | 4.10 | 5.44 | 4.26 |
+ | Key matching | 4.72 | 4.24 | 4.63 | 4.05 |
+ | Chord matching | 4.65 | 4.23 | 4.05 | 4.06 |
+ | Tempo matching | 4.72 | 4.48 | 5.15 | 4.90 |
+
+
+ ## Objective Evaluations
+
+ | Metric | text2midi | MidiCaps | MuseCoco |
+ |---------------------|-----------|----------|----------|
+ | CR ↑ | 2.156 | 3.4326 | 2.1288 |
+ | CLAP ↑ | 0.2204 | 0.2593 | 0.2158 |
+ | TB (%) ↑ | 34.03 | - | 21.71 |
+ | TBT (%) ↑ | 66.9 | - | 54.63 |
+ | CK (%) ↑ | 15.36 | - | 13.70 |
+ | CKD (%) ↑ | 15.80 | - | 14.59 |
+
+ **Note**:
+ CR = Compression ratio
+ CLAP = CLAP score
+ TB = Tempo Bin
+ TBT = Tempo Bin with Tolerance
+ CK = Correct Key
+ CKD = Correct Key with Duplicates
+ ↑ = Higher score is better.
 
  ## Training
  To train text2midi, we recommend using accelerate for multi-GPU support. First, configure accelerate by running:
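
The listening-study table added in this commit reports mean ratings on a 1 to 7 Likert scale. As a minimal sketch that is not part of the repository, the Python snippet below copies those means from the table and computes, for each question, how far the MidiCaps column sits above the text2midi column in each participant group. The names `ratings` and `rating_gaps` are hypothetical and exist only for this illustration.

```python
# Mean Likert ratings (1-7) copied from the listening-study table in this commit.
# Tuple order: (general MidiCaps, general text2midi, expert MidiCaps, expert text2midi).
ratings = {
    "Overall matching": (5.17, 4.12, 5.29, 4.05),
    "Genre matching": (5.22, 4.29, 5.31, 4.29),
    "Mood matching": (5.24, 4.10, 5.44, 4.26),
    "Key matching": (4.72, 4.24, 4.63, 4.05),
    "Chord matching": (4.65, 4.23, 4.05, 4.06),
    "Tempo matching": (4.72, 4.48, 5.15, 4.90),
}


def rating_gaps(table):
    """Return {question: (general_gap, expert_gap)} with gap = MidiCaps - text2midi."""
    gaps = {}
    for question, (gen_ref, gen_model, exp_ref, exp_model) in table.items():
        gaps[question] = (
            round(gen_ref - gen_model, 2),
            round(exp_ref - exp_model, 2),
        )
    return gaps


gaps = rating_gaps(ratings)
for question, (general_gap, expert_gap) in gaps.items():
    print(f"{question:<18} general: {general_gap:+.2f}  experts: {expert_gap:+.2f}")
```

A positive gap means the MidiCaps column was rated higher. On these numbers the gaps are largest for overall, genre, and mood matching, and smallest for chord and tempo matching; music experts rated chord matching almost identically for both (4.05 vs 4.06).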