anggars committed
Commit 55fc11a · verified · 1 Parent(s): 1a72ec3

Update README.md

Files changed (1):
  1. README.md +34 -50
@@ -1,80 +1,64 @@
  ---
- language:
- - en
- - id
- license: mit
  tags:
  - multimodal
- - audio-classification
- - text-classification
  - math-rock
- - midwest-emo
- - emotion-recognition
- - personality-profiling
  - pytorch
  metrics:
- - f1
- - precision
- - recall
  - accuracy
  ---

  # Neural Mathrock: Multimodal Emotion and Personality Analysis

- This repository hosts a multimodal deep learning framework specialized in the affective and psychological analysis of **Math Rock** and **Midwest Emo** music. By integrating lyrical semantics and acoustic patterns, the system provides a comprehensive profile of a track's emotional and personality-based characteristics.

  ## Project Objectives
-
  The research is structured to prioritize emotional resonance and genre-specific complexities:
-
- 1. **Emotion Recognition:** Identifying affective states (e.g., Sadness, Tension, Joy) through the synergy of vocal delivery and lyrical themes.
- 2. **Personality (MBTI) Profiling:** Correlating complex musical arrangements and introspective lyrics with personality archetypes (e.g., INFP, INTJ, INTP).
- 3. **Acoustic Feature Extraction:** Analyzing technical attributes of Math Rock, including non-standard time signatures, syncopation, and clean guitar timbres.

  ## Technical Architecture: Late Fusion Multimodal
-
- The system utilizes a **Late Fusion** approach to process distinct data modalities:

  ### 1. Lyrical Stream (NLP)
  - **Encoder:** `xlm-roberta-base`
- - **Logic:** Extracts high-level semantic embeddings from song lyrics. The encoder is frozen to maintain stable pre-trained representations given the specialized nature of the dataset.

  ### 2. Acoustic Stream (DSP)
- - **Model:** 1D-Convolutional Neural Network (CNN)
- - **Input:** 20-channel Mel-frequency cepstral coefficients (MFCC).
- - **Logic:** Captures the "twinkly" guitar textures and erratic drum patterns common in Midwest Emo and Math Rock.

  ### 3. Fusion Layer
- - **Method:** Feature concatenation (768-dim Text + 256-dim Audio).
- - **Heads:** Multi-task fully connected layers for joint Emotion and Personality classification.
-
- ## Performance Summary (Weighted Evaluation)

- The model was optimized using **Weighted Cross-Entropy Loss** to mitigate significant class imbalances.

- | Metric | Score |
- | :--- | :--- |
- | **Accuracy** | 0.81 |
- | **Weighted Avg F1** | 0.76 |
- | **Macro Avg F1** | 0.45 |
-
- ### Class Highlights:
- - **INTJ:** 1.00 F1-score (Highly distinctive acoustic-lyrical signatures).
- - **INFP:** 0.89 F1-score (Robust detection of the genre's majority archetype).
- - **ISTP:** 0.53 F1-score (Successful identification of niche instrumental patterns).
-
- ## Dataset
-
- Trained on the [anggars/neural-mathrock](https://huggingface.co/datasets/anggars/neural-mathrock) dataset, containing specialized annotations for emotion, MBTI, and musical features.
-
- ## Academic Context
-
- This project is an undergraduate thesis developed at **Sekolah Tinggi Teknologi Cipasung (STTC)**, Informatics Department, Class of 2022. It explores the intersection of Music Information Retrieval (MIR) and psychological profiling.

  ## How to Use

  ```python
  import torch
- # Example:
- # model.load_state_dict(torch.load("pytorch_model.bin"))
- # model.eval()
  ```

  ---
+ library_name: pytorch
  tags:
  - multimodal
+ - music-classification
  - math-rock
  - pytorch
+ datasets:
+ - anggars/neural-mathrock
+ language:
+ - en
  metrics:
  - accuracy
  ---
  # Neural Mathrock: Multimodal Emotion and Personality Analysis

+ This repository hosts a multimodal deep learning framework specialized in the affective and psychological analysis of Math Rock and Midwest Emo music. By integrating lyrical semantics and acoustic patterns, the system extracts features to classify emotional and personality-based characteristics.

  ## Project Objectives
  The research is structured to prioritize emotional resonance and genre-specific complexities:
+ - **Emotion & Vibe Recognition:** Identifying affective states and general vibes (e.g., Melancholic, Aggressive) through the synergy of lyrical themes and audio arrays.
+ - **Personality (MBTI) Profiling:** Correlating complex musical arrangements and introspective lyrics with personality archetypes.
+ - **Acoustic Feature Extraction:** Analyzing technical attributes such as syncopation and odd time signatures using robust audio signal processing.
 

  ## Technical Architecture: Late Fusion Multimodal
+ The system utilizes a custom multimodal PyTorch architecture:

  ### 1. Lyrical Stream (NLP)
  - **Encoder:** `xlm-roberta-base`
+ - **Logic:** Extracts high-level semantic embeddings from song lyrics (768-dim). The base model weights are frozen to maintain stable pre-trained representations.
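The frozen-encoder pattern described above can be sketched in a few lines. This is a minimal illustration, not the thesis code: a tiny stand-in module plays the role of `xlm-roberta-base` (whose real hidden size, 768, comes from the README), and mean-pooling is an assumed pooling strategy.

```python
import torch
import torch.nn as nn

# Stand-in for the pre-trained encoder; in practice you would load
# xlm-roberta-base via transformers' AutoModel and freeze it the same way.
encoder = nn.Sequential(nn.Embedding(1000, 768), nn.LayerNorm(768))
for p in encoder.parameters():
    p.requires_grad = False  # keep pre-trained representations stable
encoder.eval()

token_ids = torch.randint(0, 1000, (1, 12))  # one tokenized lyric line
with torch.no_grad():
    hidden = encoder(token_ids)              # (1, 12, 768) token embeddings
text_vec = hidden.mean(dim=1)                # mean-pool -> (1, 768) lyric vector
print(text_vec.shape)                        # torch.Size([1, 768])
```

Freezing keeps the small, genre-specific dataset from degrading the multilingual representations; only the downstream fusion and heads are trained.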

  ### 2. Acoustic Stream (DSP)
+ - **Model:** 2D-Convolutional Neural Network (CNN) followed by a Transformer Encoder.
+ - **Input:** 3-channel stacked features (Mel-spectrogram, 13-coefficient MFCC, and Chroma).
+ - **Logic:** Captures the complex guitar textures and erratic drum patterns common in the genre, pooling them into a 256-dim feature vector.
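The acoustic stream above could be sketched as follows. Only the 3-channel input and the 256-dim output are taken from the README (the `AudioCNNTransformer` name appears in the repository's how-to comments); the specific layer sizes, kernel sizes, and pooling choices here are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AudioCNNTransformer(nn.Module):
    """Sketch: 2D CNN over stacked (mel, MFCC, chroma) features,
    then a Transformer encoder, mean-pooled to a 256-dim vector."""

    def __init__(self, d_model=256):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, d_model, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((1, None)),  # collapse the frequency axis
        )
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):                     # x: (batch, 3, freq_bins, frames)
        h = self.cnn(x)                       # (batch, 256, 1, frames')
        h = h.squeeze(2).transpose(1, 2)      # (batch, frames', 256)
        h = self.encoder(h)                   # temporal self-attention
        return h.mean(dim=1)                  # mean-pool -> (batch, 256)

x = torch.randn(2, 3, 128, 64)  # e.g. 128 frequency bins, 64 frames
print(AudioCNNTransformer()(x).shape)  # torch.Size([2, 256])
```

The CNN extracts local spectro-temporal texture while the Transformer models longer-range rhythmic structure such as syncopation across frames.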

  ### 3. Fusion Layer
+ - **Method:** Feature concatenation (768-dim Text + 256-dim Audio) into a 1024-dim representation, processed through a 512-dim feed-forward layer with dropout regularization.
+ - **Heads:** Multi-task fully connected layers for joint classification of MBTI, Emotion, Vibe, Intensity, and Tempo.
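The late-fusion step above could look roughly like this. The 768+256 concatenation, 512-dim feed-forward layer, dropout, and multi-task heads come from the README; the dropout rate and the class counts are assumptions (16 MBTI types; the Vibe head size is illustrative, and Intensity/Tempo heads would follow the same pattern).

```python
import torch
import torch.nn as nn

class LateFusionHead(nn.Module):
    """Sketch: concatenate text and audio vectors, project to 512-dim
    with dropout, then branch into per-task classifiers."""

    def __init__(self, n_mbti=16, n_emotion=16, n_vibe=4):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(768 + 256, 512),  # 1024-dim fused -> 512-dim
            nn.ReLU(),
            nn.Dropout(0.3),            # dropout rate is an assumption
        )
        self.mbti_head = nn.Linear(512, n_mbti)
        self.emotion_head = nn.Linear(512, n_emotion)
        self.vibe_head = nn.Linear(512, n_vibe)

    def forward(self, text_vec, audio_vec):
        z = self.fuse(torch.cat([text_vec, audio_vec], dim=-1))  # (batch, 512)
        return self.mbti_head(z), self.emotion_head(z), self.vibe_head(z)

head = LateFusionHead()
mbti, emo, vibe = head(torch.randn(2, 768), torch.randn(2, 256))
print(mbti.shape, emo.shape, vibe.shape)
# torch.Size([2, 16]) torch.Size([2, 16]) torch.Size([2, 4])
```

Because fusion happens after each modality is fully encoded, either stream can be retrained or swapped without touching the other, which is the usual motivation for late fusion.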
 
 

+ ## Performance Summary and Academic Context
+ This project is an undergraduate thesis developed at Sekolah Tinggi Teknologi Cipasung (STTC), Informatics Department, Class of 2022. It explores the extreme class imbalances naturally found in niche music genres.

+ **Key Findings:**
+ - **High-Performance Metrics:** The architecture successfully identifies broad acoustic targets, achieving an **F1-score of 0.77 for High Intensity** and **0.72 for Melancholic Vibe**, demonstrating the viability of the multimodal feature-extraction pipeline.
+ - **Class Imbalance in Niche Genres:** For 16-class targets such as MBTI and Emotion, the model highlights the natural bias of the dataset. Classes with sufficient samples (e.g., ESTJ, Desire, Pride) perform adequately, while minority classes struggle from lack of representation (Macro F1 ≈ 0.07). This serves as a realistic baseline for future research on data sparsity in highly subjective Music Information Retrieval (MIR) tasks.

  ## How to Use
+ Since this is a custom PyTorch architecture, it cannot be loaded via the standard `pipeline` API. You must define the model classes locally and load the state dictionary.

  ```python
  import torch
+
+ # 1. Define the MultimodalMathRock architecture classes here first
+ # (refer to the training script for the AudioCNNTransformer, TextTransformer, and Fusion modules)
+
+ # 2. Load the weights
+ model = MultimodalMathRock()
+ state_dict = torch.load("model.pt", map_location="cpu")
+ model.load_state_dict(state_dict['model_state'])
+ model.eval()
+
+ # 3. The model is ready for inference
  ```