TiMauzi committed
Commit 658905f · verified · 1 Parent(s): 3882ff1

Update README.md

Files changed (1): README.md +101 -11

README.md CHANGED
@@ -2,38 +2,126 @@
  library_name: transformers
  tags:
  - generated_from_trainer
  datasets:
- - generator
  metrics:
  - accuracy
  - f1
  model-index:
- - name: EraClassifierBiLSTM
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # EraClassifierBiLSTM
-
- This model is a fine-tuned version of [](https://huggingface.co/) on the generator dataset.
- It achieves the following results on the evaluation set:
  - Loss: 1.0162
  - Accuracy: 0.6572
  - F1: 0.5121

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure

@@ -83,6 +171,8 @@ The following hyperparameters were used during training:
  | 0.5607 | 2.8879 | 56000 | 1.0263 | 0.6539 | 0.5060 |
  | 0.5582 | 2.9911 | 58000 | 1.0269 | 0.6593 | 0.5103 |

  ### Framework versions
 
  library_name: transformers
  tags:
  - generated_from_trainer
+ - midi
+ - music
+ - era-classification
+ - bilstm
+ - audio-analysis
  datasets:
+ - TiMauzi/imslp-midi-by-sa
  metrics:
  - accuracy
  - f1
  model-index:
+ - name: EraClassifierBiLSTM-134M
  results: []
  ---
 
+ # EraClassifierBiLSTM-134M

+ This model is a bidirectional LSTM neural network designed for musical era classification from MIDI data. It achieves the following results on the evaluation set:
  - Loss: 1.0162
  - Accuracy: 0.6572
  - F1: 0.5121
 
  ## Model description

+ The EraClassifierBiLSTM-134M is a bidirectional LSTM neural network for classifying musical compositions into historical eras based on MIDI data analysis. This large variant (~134M parameters) offers higher accuracy than the compact 4.76M-parameter version, at the cost of greater compute and memory requirements.
+
+ ### Architecture
+ - **Model Type**: Custom Bidirectional LSTM (BiLSTM)
+ - **Input**: Sequences of 8-dimensional feature vectors extracted from MIDI messages
+ - **Window Size**: 24 MIDI messages per sequence with stride=20 (overlapping windows)
+ - **Hidden Layers**: 2 bidirectional LSTM layers with 2048 hidden units each
+ - **Output**: 6-class classification (musical eras)
+ - **Activation**: LeakyReLU with dropout for regularization
+ - **Loss Function**: CrossEntropyLoss
+
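A minimal PyTorch sketch consistent with the bullets above (2 bidirectional LSTM layers, 2048 hidden units, 8 input features, 6 classes); the class and attribute names are illustrative, not the released implementation, and the dropout value 0.3 is an assumption (the card only states that dropout is used). Notably, this configuration comes out at roughly 134M parameters, matching the model name.

```python
import torch
import torch.nn as nn

class EraClassifierBiLSTM(nn.Module):
    """Illustrative BiLSTM era classifier; names are assumptions, not the released code."""

    def __init__(self, n_features=8, hidden=2048, n_layers=2, n_classes=6, dropout=0.3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, num_layers=n_layers,
                            batch_first=True, bidirectional=True, dropout=dropout)
        self.act = nn.LeakyReLU()
        self.drop = nn.Dropout(dropout)
        self.head = nn.Linear(2 * hidden, n_classes)  # forward + backward states

    def forward(self, x):                 # x: (batch, window=24, features=8)
        _, (h_n, _) = self.lstm(x)        # h_n: (n_layers * 2, batch, hidden)
        h = torch.cat([h_n[-2], h_n[-1]], dim=-1)  # last layer, both directions
        return self.head(self.drop(self.act(h)))   # logits: (batch, n_classes)

model = EraClassifierBiLSTM()
n_params = sum(p.numel() for p in model.parameters())  # ~134M for this config
logits = model(torch.randn(4, 24, 8))
```

Training would pair these logits with `nn.CrossEntropyLoss`, as listed above.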
+ ### Feature Engineering
+ The model processes 8 key MIDI features per message, automatically selected as the most frequent features across the dataset:
+
+ **Numerical Features (7):**
+ - **channel**: MIDI channel number (μ=2.01, σ=2.74)
+ - **control**: Control change values (μ=11.90, σ=17.02)
+ - **note**: MIDI note number (pitch) (μ=64.17, σ=12.00)
+ - **tempo**: Tempo in microseconds per beat (μ=738221.63, σ=460369.34)
+ - **time**: Timing information in ticks (μ=714.28, σ=1337451.38)
+ - **value**: Generic value field (μ=83.91, σ=26.72)
+ - **velocity**: Note velocity/intensity (μ=42.80, σ=44.24)
+
+ **Categorical Features (1):**
+ - **type**: MIDI message type (mapped to numerical IDs)
+
+ All numerical features are normalized using dataset statistics (mean and standard deviation), while categorical features are encoded using learned ID mappings.
+
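As a concrete illustration of the normalization described above, here is a sketch that maps one MIDI message to an 8-dimensional feature vector using the (μ, σ) statistics from this card; the dict-based message shape, the feature ordering, and the `TYPE_IDS` table are assumptions, not the released preprocessing code.

```python
# Hypothetical featurization: (mu, sigma) pairs are copied from the model card;
# the message-dict shape, feature order, and type-ID table are assumptions.
STATS = {
    "channel": (2.01, 2.74), "control": (11.90, 17.02),
    "note": (64.17, 12.00), "tempo": (738221.63, 460369.34),
    "time": (714.28, 1337451.38), "value": (83.91, 26.72),
    "velocity": (42.80, 44.24),
}
TYPE_IDS = {"note_on": 0, "note_off": 1, "control_change": 2, "set_tempo": 3}  # illustrative subset

def featurize(msg: dict) -> list:
    """Map one MIDI message (as a dict) to an 8-dim feature vector."""
    vec = [(msg.get(name, 0.0) - mu) / sigma for name, (mu, sigma) in STATS.items()]
    vec.append(float(TYPE_IDS.get(msg.get("type"), len(TYPE_IDS))))  # categorical ID
    return vec

vec = featurize({"type": "note_on", "channel": 0, "note": 60, "velocity": 64, "time": 0})
```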
+ ### Training Approach
+ The model uses a sliding-window approach to capture temporal patterns in musical structure that are characteristic of different historical periods. Each MIDI file is processed into multiple overlapping sequences, allowing the model to learn both local and global musical patterns.
+
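The windowing described above (24 messages per window, stride 20, so consecutive windows overlap by 4 messages) can be sketched as:

```python
# Minimal sketch of the sliding-window step; the function name is illustrative,
# and dropping short tails is an assumption not stated in the card.
def sliding_windows(messages, window=24, stride=20):
    """Yield overlapping fixed-size windows of MIDI messages."""
    for start in range(0, len(messages) - window + 1, stride):
        yield messages[start:start + window]

windows = list(sliding_windows(list(range(100))))
```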
+ ### Performance Comparison
+ Compared to the 4.76M model, this larger variant shows significant improvements:
+ - **Accuracy**: +7.2 percentage points (65.7% vs 58.5%)
+ - **F1 Score**: +19.1% relative improvement (0.512 vs 0.430)
+ - **Loss**: 7.1% reduction (1.016 vs 1.094)
 
  ## Intended uses & limitations

+ ### Intended Uses
+ - **Musicological Research**: Analyzing historical trends in musical composition
+ - **Educational Tools**: Teaching music history through automated era identification
+ - **Digital Music Libraries**: Automatic categorization and organization of MIDI collections
+ - **Music Analysis**: Understanding stylistic characteristics across different periods
+ - **Content Recommendation**: Suggesting music from similar historical periods
+
+ ### Limitations
+ - **Performance Variability**: While improved, per-era accuracy still varies widely:
+   - Strong on the Romantic (85.4%) and Baroque (71.4%) eras
+   - Moderate on the Renaissance (57.0%) and Modern (51.9%) eras
+   - Poor on the Classical (21.4%) and Other (21.1%) categories
+ - **Era Confusion**: Adjacent historical periods are still confused, though less frequently:
+   - Renaissance music occasionally misclassified as Baroque (30.0%)
+   - Classical music still confused with Baroque (32.3%) and Romantic (31.0%)
+   - Modern music sometimes misclassified as Romantic (21.4%)
+ - **Computational Requirements**: Higher memory and processing requirements compared to smaller models
+ - **Data Dependencies**: Performance depends on the quality and representativeness of the training data
+ - **MIDI-Only**: Limited to MIDI format; cannot process audio recordings or sheet music
+ - **Cultural Bias**: Training data may reflect Western classical music traditions
+
+ ### Recommendations for Use
+ - Validate results with musicological expertise, especially for Classical period identification
+ - Use confidence thresholds to filter low-confidence predictions
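A minimal sketch of the confidence-threshold recommendation; the 0.6 cutoff, the function name, and the era ordering are illustrative choices, not values published with the model.

```python
import torch

ERAS = ["Renaissance", "Baroque", "Classical", "Romantic", "Modern", "Other"]

def predict_with_threshold(logits: torch.Tensor, threshold: float = 0.6):
    """Return the era label, or None when the softmax confidence is too low.

    The 0.6 threshold is illustrative; tune it for your application.
    """
    probs = torch.softmax(logits, dim=-1)
    conf, idx = probs.max(dim=-1)
    return ERAS[idx.item()] if conf.item() >= threshold else None

label = predict_with_threshold(torch.tensor([0.1, 5.0, 0.2, 0.3, 0.1, 0.0]))
```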
 
  ## Training and evaluation data

+ ### Dataset
+ - **Source**: TiMauzi/imslp-midi-by-sa (International Music Score Library Project)
+ - **Format**: MIDI files with associated metadata, including composition year and era
+ - **Preprocessing**: MIDI messages converted to 8-dimensional feature vectors
+ - **Window Strategy**: 24-message windows with a 20-message stride, yielding overlapping sequences
+
+ ### Musical Eras Covered
+ 1. **Renaissance** (1400-1600): Early polyphonic music, madrigals, motets
+ 2. **Baroque** (1600-1750): Ornamented music, basso continuo, fugues
+ 3. **Classical** (1750-1820): Clear forms, balanced phrases, sonata form
+ 4. **Romantic** (1820-1900): Expressive, emotional, expanded forms
+ 5. **Modern** (1900-present): Atonal, experimental, diverse styles
+ 6. **Other**: Miscellaneous or unclear period classifications
+
+ ### Data Distribution
+ The model was trained on 6,992 MIDI files from the IMSLP dataset with the following era distribution:
+ - **Romantic**: 2,722 samples (38.9%) - median year 1854
+ - **Baroque**: 1,874 samples (26.8%) - median year 1710
+ - **Renaissance**: 843 samples (12.1%) - median year 1611
+ - **Modern**: 763 samples (10.9%) - median year 2020
+ - **Classical**: 597 samples (8.5%) - median year 1779
+ - **Other**: 193 samples (2.8%) - median year 1909 (includes Early 20th century and Medieval)
+
+ Era thresholding was applied (minimum 150 samples per era): rare eras such as "Early 20th century" (125 samples) and "Medieval" (5 samples) were mapped to the "Other" category to maintain classification stability.
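The thresholding rule above can be sketched as follows; the function name is illustrative and the example counts are synthetic, not the dataset's.

```python
# Sketch of the era-thresholding step: eras with fewer than 150 samples
# are folded into "Other". The 150 cutoff is taken from the model card.
from collections import Counter

MIN_SAMPLES = 150

def threshold_eras(labels):
    """Replace rare era labels with 'Other' to stabilize classification."""
    counts = Counter(labels)
    return [era if counts[era] >= MIN_SAMPLES else "Other" for era in labels]

labels = ["Baroque"] * 200 + ["Medieval"] * 5   # synthetic example counts
mapped = threshold_eras(labels)
```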
+
+ ### Evaluation Strategy
+ - **Validation**: Performance measured on a held-out validation set
+ - **Test Set**: Final evaluation on completely unseen test data
+ - **Metrics**: Accuracy, macro-averaged F1 score, and confusion-matrix analysis
+ - **Training Duration**: 3 epochs (~58,000 training steps), with the best checkpoint selected by validation F1 score (early stopping with fallback to the best result)
 
  ## Training procedure

@@ -83,6 +171,8 @@ The following hyperparameters were used during training:
  | 0.5607 | 2.8879 | 56000 | 1.0263 | 0.6539 | 0.5060 |
  | 0.5582 | 2.9911 | 58000 | 1.0269 | 0.6593 | 0.5103 |

+ ### Training Analysis
+ Training converges well: the model reaches its best validation F1 score (0.5121) around step 40,000 (epoch 2.06), and that checkpoint was selected as the final model. Compared with the 4.76M variant, the larger capacity enables faster learning and better final performance: training loss falls more rapidly while validation metrics improve steadily, indicating the extra capacity is used effectively without overfitting.

  ### Framework versions