TiMauzi committed (verified)
Commit 6d693cb · 1 parent: 9ed5193

Update README.md

Files changed (1): README.md (+94 −11)

README.md CHANGED
@@ -2,38 +2,119 @@
  library_name: transformers
  tags:
  - generated_from_trainer
  datasets:
- - generator
  metrics:
  - accuracy
  - f1
  model-index:
- - name: EraClassifierBiLSTM
  results: []
  ---

- <!-- This model card has been generated automatically according to the information the Trainer had access to. You
- should probably proofread and complete it, then remove this comment. -->

- # EraClassifierBiLSTM
-
- This model is a fine-tuned version of [](https://huggingface.co/) on the generator dataset.
- It achieves the following results on the evaluation set:
  - Loss: 1.0935
  - Accuracy: 0.5852
  - F1: 0.4299

  ## Model description

- More information needed

  ## Intended uses & limitations

- More information needed

  ## Training and evaluation data

- More information needed

  ## Training procedure
@@ -102,6 +183,8 @@ The following hyperparameters were used during training:
  | 0.8316 | 4.8476 | 94000 | 1.1055 | 0.5736 | 0.4174 |
  | 0.8264 | 4.9508 | 96000 | 1.1056 | 0.5736 | 0.4174 |

  ### Framework versions
 
  library_name: transformers
  tags:
  - generated_from_trainer
+ - midi
+ - music
+ - era-classification
+ - bilstm
+ - audio-analysis
  datasets:
+ - TiMauzi/imslp-midi-by-sa
  metrics:
  - accuracy
  - f1
  model-index:
+ - name: EraClassifierBiLSTM-4.76M
  results: []
  ---

+ # EraClassifierBiLSTM-4.76M

+ This model is a compact bidirectional LSTM neural network designed for musical era classification from MIDI data. It achieves the following results on the evaluation set:

  - Loss: 1.0935
  - Accuracy: 0.5852
  - F1: 0.4299

  ## Model description

+ The EraClassifierBiLSTM-4.76M is a custom bidirectional LSTM neural network specifically designed for classifying musical compositions into historical eras based on MIDI data analysis. This compact model variant (~4.76M parameters) offers a good balance between performance and computational efficiency.
+
+ ### Architecture
+ - **Model Type**: Custom Bidirectional LSTM (BiLSTM)
+ - **Input**: Sequences of 8-dimensional feature vectors extracted from MIDI messages
+ - **Window Size**: 24 MIDI messages per sequence with stride=20 (overlapping windows)
+ - **Hidden Layers**: 2 bidirectional LSTM layers with 384 hidden units each
+ - **Output**: 6-class classification (musical eras)
+ - **Activation**: LeakyReLU with dropout for regularization
+ - **Loss Function**: CrossEntropyLoss
+
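The architecture bullets above can be sketched in PyTorch. This is a hypothetical reconstruction, not the released code: the dropout rate and the choice to classify from the final time step are assumptions. With these dimensions, though, the parameter count comes out to roughly 4.76M, consistent with the model name.

```python
import torch
import torch.nn as nn

class EraClassifierBiLSTM(nn.Module):
    """Hypothetical reconstruction of the architecture described above."""

    def __init__(self, input_dim=8, hidden_dim=384, num_layers=2,
                 num_classes=6, dropout=0.2):
        super().__init__()
        # 2 bidirectional LSTM layers with 384 hidden units each
        self.lstm = nn.LSTM(input_dim, hidden_dim, num_layers=num_layers,
                            batch_first=True, bidirectional=True, dropout=dropout)
        # LeakyReLU + dropout head mapping to the 6 era classes
        self.head = nn.Sequential(
            nn.LeakyReLU(),
            nn.Dropout(dropout),
            nn.Linear(2 * hidden_dim, num_classes),  # 2x for bidirectionality
        )

    def forward(self, x):              # x: (batch, 24, 8)
        out, _ = self.lstm(x)          # (batch, 24, 768)
        return self.head(out[:, -1])   # classify from the final time step

model = EraClassifierBiLSTM()
n_params = sum(p.numel() for p in model.parameters())
print(n_params)  # 4760070, i.e. ~4.76M
```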
+ ### Feature Engineering
+ The model processes 8 key MIDI features per message, automatically selected as the most frequent features across the dataset:
+
+ **Numerical Features (7):**
+ - **channel**: MIDI channel number (μ=2.01, σ=2.74)
+ - **control**: Control change values (μ=11.90, σ=17.02)
+ - **note**: Note pitch (MIDI note number) (μ=64.17, σ=12.00)
+ - **tempo**: Tempo in microseconds per beat (μ=738221.63, σ=460369.34)
+ - **time**: Timing information in ticks (μ=714.28, σ=1337451.38)
+ - **value**: Generic value field (μ=83.91, σ=26.72)
+ - **velocity**: Note velocity/intensity (μ=42.80, σ=44.24)
+
+ **Categorical Features (1):**
+ - **type**: MIDI message type (mapped to numerical IDs)
+
+ All numerical features are normalized using dataset statistics (mean and standard deviation), while categorical features are encoded using learned ID mappings.
+
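A minimal sketch of this feature encoding, assuming each message is already parsed into a dict (e.g. via mido); the field order and the type-ID mapping here are illustrative assumptions, not the released code:

```python
# Hypothetical sketch of the per-message encoding described above.
NUM_FIELDS = ["channel", "control", "note", "tempo", "time", "value", "velocity"]
STATS = {  # (mean, std) from the feature table above
    "channel": (2.01, 2.74), "control": (11.90, 17.02), "note": (64.17, 12.00),
    "tempo": (738221.63, 460369.34), "time": (714.28, 1337451.38),
    "value": (83.91, 26.72), "velocity": (42.80, 44.24),
}
TYPE_IDS = {}  # message type -> numerical ID, learned from the data

def encode(msg):
    """Map one MIDI message (as a dict) to an 8-dimensional feature vector."""
    vec = []
    for field in NUM_FIELDS:
        mu, sigma = STATS[field]
        raw = msg.get(field, mu)        # missing field -> dataset mean (z = 0)
        vec.append((raw - mu) / sigma)  # z-score normalization
    vec.append(TYPE_IDS.setdefault(msg.get("type"), len(TYPE_IDS)))
    return vec

vec = encode({"type": "note_on", "note": 76.17, "velocity": 87.04})
print(len(vec))  # 8
```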
+ ### Training Approach
+ The model uses a sliding window approach to capture temporal patterns in musical structure that are characteristic of different historical periods. Each MIDI file is processed into multiple overlapping sequences, allowing the model to learn both local and global musical patterns.

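The sliding-window segmentation can be sketched as follows, using the window size of 24 and stride of 20 given in the architecture details:

```python
# Minimal sketch of the sliding-window segmentation: windows of 24 messages
# with stride 20, so consecutive windows overlap by 4 messages.
WINDOW, STRIDE = 24, 20

def make_windows(features):
    """Split one file's per-message feature vectors into overlapping windows."""
    return [features[i:i + WINDOW]
            for i in range(0, len(features) - WINDOW + 1, STRIDE)]

chunks = make_windows([[0.0] * 8 for _ in range(100)])
print(len(chunks))  # 4 windows, starting at messages 0, 20, 40 and 60
```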
  ## Intended uses & limitations

+ ### Intended Uses
+ - **Musicological Research**: Analyzing historical trends in musical composition
+ - **Educational Tools**: Teaching music history through automated era identification
+ - **Digital Music Libraries**: Automatic categorization and organization of MIDI collections
+ - **Music Analysis**: Understanding stylistic characteristics across different periods
+ - **Content Recommendation**: Suggesting music from similar historical periods
+
+ ### Limitations
+ - **Performance Variability**: The model shows significant performance differences across eras:
+   - Strong performance on Romantic (82.6%) and Baroque (66.6%) eras
+   - Moderate performance on Renaissance (45.4%) and Modern (37.0%) eras
+   - Poor performance on Classical (12.5%) and Other (14.2%) categories
+ - **Era Confusion**: Adjacent historical periods are frequently confused:
+   - Renaissance music often misclassified as Baroque (36.7%)
+   - Classical music heavily confused with Baroque (37.7%) and Romantic (34.1%)
+   - Modern music often misclassified as Romantic (35.9%)
+ - **Data Dependencies**: Performance depends on the quality and representativeness of the training data
+ - **MIDI-Only**: Limited to MIDI format; cannot process audio recordings or sheet music
+ - **Cultural Bias**: Training data may reflect Western classical music traditions
+
+ ### Recommendations for Use
+ - Validate results with musicological expertise, especially for Classical period identification
+ - Use confidence thresholds to filter low-confidence predictions

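The confidence-threshold recommendation above could look like the following sketch; the label order and the 0.6 threshold are illustrative assumptions, not values from the released code:

```python
import torch

ERAS = ["Renaissance", "Baroque", "Classical", "Romantic", "Modern", "Other"]

def predict_with_threshold(logits, threshold=0.6):
    """Return the era label, or None when the top softmax probability is low."""
    probs = torch.softmax(logits, dim=-1)
    conf, idx = probs.max(dim=-1)
    return ERAS[idx] if conf >= threshold else None

print(predict_with_threshold(torch.tensor([0.1, 5.0, 0.1, 0.2, 0.1, 0.1])))
# a confident prediction is returned; a near-flat distribution yields None
```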
  ## Training and evaluation data

+ ### Dataset
+ - **Source**: TiMauzi/imslp-midi-by-sa (International Music Score Library Project)
+ - **Format**: MIDI files with associated metadata including composition year and era
+ - **Preprocessing**: MIDI messages converted to 8-dimensional feature vectors
+ - **Window Strategy**: 24-message windows with 20-message stride for overlapping sequences
+
+ ### Musical Eras Covered
+ 1. **Renaissance** (1400–1600): Early polyphonic music, madrigals, motets
+ 2. **Baroque** (1600–1750): Ornamented music, basso continuo, fugues
+ 3. **Classical** (1750–1820): Clear forms, balanced phrases, sonata form
+ 4. **Romantic** (1820–1900): Expressive, emotional, expanded forms
+ 5. **Modern** (1900–present): Atonal, experimental, diverse styles
+ 6. **Other**: Miscellaneous or unclear period classifications
+
+ ### Data Distribution
+ The model was trained on 6,992 MIDI files from the IMSLP dataset with the following era distribution:
+ - **Romantic**: 2,722 samples (38.9%), median year 1854
+ - **Baroque**: 1,874 samples (26.8%), median year 1710
+ - **Renaissance**: 843 samples (12.1%), median year 1611
+ - **Modern**: 763 samples (10.9%), median year 2020
+ - **Classical**: 597 samples (8.5%), median year 1779
+ - **Other**: 193 samples (2.8%), median year 1909 (includes Early 20th century and Medieval)
+
+ Era thresholding was applied (minimum 150 samples per era), with rare eras such as "Early 20th century" (125 samples) and "Medieval" (5 samples) mapped to the "Other" category to maintain classification stability.
+
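The thresholding step can be sketched as follows (a minimal illustration of the described rule, not the released preprocessing code):

```python
# Hypothetical sketch of the era-thresholding step described above.
MIN_SAMPLES = 150  # eras below this count are folded into "Other"

def threshold_eras(counts):
    """Return a label mapping that folds rare eras into 'Other'."""
    return {era: era if n >= MIN_SAMPLES else "Other"
            for era, n in counts.items()}

mapping = threshold_eras({"Romantic": 2722, "Early 20th century": 125, "Medieval": 5})
print(mapping["Medieval"])  # Other
```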
+ ### Evaluation Strategy
+ - **Validation**: Performance measured on a held-out validation set
+ - **Test Set**: Final evaluation on completely unseen test data
+ - **Metrics**: Accuracy, F1-score (macro-averaged), and confusion matrix analysis
+ - **Training Duration**: 5 epochs (~96,000 training steps), with fallback to the best result (early stopping) based on F1 score

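The "fallback to best result" selection can be sketched as follows (assumed logic; only the step-44,000 and step-96,000 F1 values appear in this card, the intermediate value is purely illustrative):

```python
# Sketch of best-checkpoint selection on F1, as described above.
class BestCheckpoint:
    """Track the evaluation step with the highest F1 score."""

    def __init__(self):
        self.best_f1 = float("-inf")
        self.best_step = None

    def update(self, step, f1):
        if f1 > self.best_f1:
            self.best_f1, self.best_step = f1, step
            # a real trainer would also save the model weights here

tracker = BestCheckpoint()
for step, f1 in [(42000, 0.4251), (44000, 0.4299), (96000, 0.4174)]:
    tracker.update(step, f1)
print(tracker.best_step)  # 44000
```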
  ## Training procedure

 
  | 0.8316 | 4.8476 | 94000 | 1.1055 | 0.5736 | 0.4174 |
  | 0.8264 | 4.9508 | 96000 | 1.1056 | 0.5736 | 0.4174 |

+ ### Training Analysis
+ The training shows stable convergence, with the model reaching its best performance around step 44,000 (epoch 2.27). The training loss decreases steadily while validation metrics stabilize, indicating good generalization without severe overfitting. The model achieves its peak F1 score of 0.4299 at step 44,000, which was selected as the best checkpoint.

  ### Framework versions