Commit 2638e90 (verified) by RayyanAhmed9477 · 1 Parent(s): c58f471

Upload speech emotion classification model with multi-modal architecture

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ cnn_emotion_model_20251022_065208_part1.keras filter=lfs diff=lfs merge=lfs -text
37
+ cnn_emotion_model_20251022_065208_part2.keras filter=lfs diff=lfs merge=lfs -text
38
+ cnn_emotion_model_20251022_065208_part3.keras filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,296 @@
1
+ ---
2
+ language:
3
+ - en
4
+ license: mit
5
+ library_name: tensorflow
6
+ tags:
7
+ - audio
8
+ - speech
9
+ - emotion-recognition
10
+ - deep-learning
11
+ - classification
12
+ datasets:
13
+ - ravdess
14
+ metrics:
15
+ - accuracy
16
+ - precision
17
+ - recall
18
+ - f1
19
+ model-index:
20
+ - name: Speech Emotion Classification
21
+ results:
22
+ - task:
23
+ name: Audio Classification
24
+ type: audio-classification
25
+ dataset:
26
+ name: RAVDESS
27
+ type: ravdess
28
+ metrics:
29
+ - name: Accuracy
30
+ type: accuracy
31
+ value: 0.4213
32
+ - name: Precision (weighted)
33
+ type: precision
34
+ value: 0.7253
35
+ - name: Recall (weighted)
36
+ type: recall
37
+ value: 0.4213
38
+ - name: F1-Score (weighted)
39
+ type: f1
40
+ value: 0.4090
41
+ ---
42
+
43
+ # Speech Emotion Classification
44
+
45
+ <div align="center">
46
+
47
+ [![Python](https://img.shields.io/badge/Python-3.7%2B-blue)](https://www.python.org/downloads/)
48
+ [![TensorFlow](https://img.shields.io/badge/TensorFlow-2.0%2B-orange)](https://www.tensorflow.org/)
49
+ [![License](https://img.shields.io/badge/License-MIT-green)](LICENSE)
50
+ [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97-Hugging%20Face-yellow)](https://huggingface.co)
51
+
52
+ **Detect emotions from speech using advanced deep learning models**
53
+
54
+ </div>
55
+
56
+ ---
57
+
58
+ ## 🎯 Overview
59
+
60
+ This repository contains a deep learning model for speech emotion classification. It detects and classifies emotions in audio recordings by combining two sets of acoustic features, Mel-frequency cepstral coefficients (MFCCs) and mel-spectrograms, in a two-branch neural network. On the RAVDESS test set it reaches about 42% accuracy (see Performance Metrics).
61
+
62
+ ## 🌟 Key Features
63
+
64
+ - **Multi-modal Architecture**: Combines CNN and MLP branches for comprehensive feature analysis
65
+ - **Real-time Processing**: Capable of processing and analyzing speech in real time
66
+ - **Reported Accuracy**: About 42% test accuracy on the 8-class RAVDESS task (see Performance Metrics)
67
+ - **Cross-platform Compatibility**: Runs seamlessly on Windows, macOS, and Linux
68
+ - **Hugging Face Integration**: Easy model sharing and deployment via Hugging Face Hub
69
+
70
+ ## 📊 Dataset
71
+
72
+ The model was trained on the **RAVDESS** (Ryerson Audio-Visual Database of Emotional Speech and Song) dataset, which contains high-quality recordings of professional actors expressing different emotions. The dataset includes 8 distinct emotions:
73
+
74
+ - 😐 **Neutral**: Speech with no marked emotion
75
+ - 😌 **Calm**: Calm and relaxed emotion
76
+ - 😊 **Happy**: Joyful and cheerful emotion
77
+ - 😢 **Sad**: Melancholic and sorrowful emotion
78
+ - 😡 **Angry**: Irritated and mad emotion
79
+ - 😱 **Fearful**: Scared and apprehensive emotion
80
+ - 😤 **Disgust**: Revolted and repulsed emotion
81
+ - 😮 **Surprised**: Astonished and amazed emotion
82
+
83
+ ## 📈 Performance Metrics
84
+
85
+ | Metric | Value |
86
+ |--------|-------|
87
+ | **Test Accuracy** | ~42.13% |
88
+ | **Precision (weighted)** | ~72.53% |
89
+ | **Recall (weighted)** | ~42.13% |
90
+ | **F1-Score (weighted)** | ~40.90% |
91
+
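The table reports weighted averages, where each class's score is weighted by its number of true samples (its support). Under that definition, weighted recall always equals overall accuracy, which is why both rows read 42.13%. A minimal sketch in plain Python shows the arithmetic; the 3-class confusion matrix is made up for illustration and is not the model's actual result, and `weighted_metrics` is a hypothetical helper, not part of the repository:

```python
def weighted_metrics(confusion):
    """Return (accuracy, precision, recall, f1), each support-weighted.

    confusion[i][j] = number of samples with true class i predicted as class j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n_classes))
    precision = recall = f1 = 0.0
    for c in range(n_classes):
        support = sum(confusion[c])                    # true samples of class c
        predicted = sum(row[c] for row in confusion)   # samples predicted as c
        tp = confusion[c][c]
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return correct / total, precision, recall, f1


# Hypothetical confusion matrix, for illustration only.
example = [
    [8, 1, 1],
    [4, 5, 1],
    [6, 2, 2],
]
accuracy, precision, recall, f1 = weighted_metrics(example)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

A weighted precision well above weighted recall, as in the table, may indicate that the model over-predicts a few classes while rarely (but correctly) predicting the others; a per-class report or confusion matrix would be needed to confirm this.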
92
+ ## 🛠️ Installation
93
+
94
+ ### Prerequisites
95
+
96
+ - Python 3.7 or higher
97
+ - pip package manager
98
+
99
+ ### Setup
100
+
101
+ 1. Clone the repository:
102
+ ```bash
103
+ git clone https://github.com/your-username/speech_emotion_classification.git
104
+ cd speech_emotion_classification
105
+ ```
106
+
107
+ 2. Create a virtual environment (recommended):
108
+ ```bash
109
+ python -m venv venv
110
+ source venv/bin/activate # On Windows: venv\Scripts\activate
111
+ ```
112
+
113
+ 3. Install the required dependencies:
114
+ ```bash
115
+ pip install -r requirements.txt
116
+ ```
117
+
118
+ Or install the dependencies manually:
119
+ ```bash
120
+ pip install tensorflow numpy librosa scikit-learn huggingface_hub pandas matplotlib seaborn
121
+ ```
122
+
123
+ ## 🚀 Usage
124
+
125
+ ### 1. Load and Use the Model
126
+
127
+ ```python
128
+ import librosa
129
+ import numpy as np
130
+ from tensorflow import keras
131
+
132
+ # Load the pre-trained model
133
+ model = keras.models.load_model('./path/to/model.keras')
134
+
135
+ # Load an audio file
136
+ audio_path = 'path/to/audio.wav'
137
+ y, sr = librosa.load(audio_path, sr=None)
138
+
139
+ # Extract features
140
+ mfcc_features = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
141
+ spectrogram_features = librosa.feature.melspectrogram(y=y, sr=sr)
142
+
143
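+ # Shape the features to match the model inputs: mfcc_input is (None, 13), spec_input is (None, 128, 165, 1)
+ # Assumption: MFCCs are averaged over time and the spectrogram is padded or cropped to 165 frames.
+ # Apply the same scaling (e.g. dB conversion, normalization) that was used during training.
+ mfcc_features_reshaped = np.mean(mfcc_features, axis=1)[np.newaxis, :]
+ spec_fixed = librosa.util.fix_length(spectrogram_features, size=165, axis=1)
+ spec_features_reshaped = spec_fixed[np.newaxis, ..., np.newaxis]
+
+ # Make prediction; inputs are keyed by input-layer name, so their order does not matter
+ predictions = model.predict({"mfcc_input": mfcc_features_reshaped, "spec_input": spec_features_reshaped})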
149
+
150
+ # Get emotion with highest probability
151
+ emotion_labels = ['neutral', 'calm', 'happy', 'sad', 'angry', 'fearful', 'disgust', 'surprised']
152
+ predicted_emotion = emotion_labels[np.argmax(predictions)]
153
+
154
+ print(f"Predicted emotion: {predicted_emotion}")
155
+ ```
156
+
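`np.argmax` returns only the single most likely label. Because the output layer is a softmax over 8 classes, it can help to inspect the full ranking, especially given the modest test accuracy. The helper below is a hypothetical illustration in plain Python (`top_k_emotions` is not part of the repository) and assumes the label order shown above matches the order used during training:

```python
EMOTION_LABELS = ['neutral', 'calm', 'happy', 'sad', 'angry', 'fearful', 'disgust', 'surprised']


def top_k_emotions(probabilities, k=3, labels=EMOTION_LABELS):
    """Return the k most likely (label, probability) pairs, highest first."""
    if len(probabilities) != len(labels):
        raise ValueError(f"expected {len(labels)} probabilities, got {len(probabilities)}")
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Made-up probabilities standing in for one row of model.predict(...) output.
sample = [0.05, 0.10, 0.40, 0.05, 0.25, 0.05, 0.05, 0.05]
for label, prob in top_k_emotions(sample):
    print(f"{label}: {prob:.2f}")
```

With a real prediction, pass the first row of the batch, for example `top_k_emotions(predictions[0].tolist())`.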
157
+ ### 2. Train Your Own Model
158
+
159
+ ```bash
160
+ python auto_train.py
161
+ ```
162
+
163
+ ### 3. Test the Model
164
+
165
+ ```bash
166
+ python test_prediction_pipeline.py
167
+ ```
168
+
169
+ ## 🏗️ Architecture
170
+
171
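+ The model uses a two-branch architecture whose outputs are fused before classification:
172
+
173
+ 1. **MFCC Branch**: Takes a 13-value MFCC vector and processes it with two dense layers (256 and 128 units), each followed by batch normalization and dropout (0.3)
174
+ 2. **Spectrogram Branch**: Takes a 128×165×1 mel-spectrogram and processes it with three 3×3 convolutional blocks (32, 64, and 128 filters), each followed by batch normalization, 2×2 max pooling, and dropout (0.3)
175
+ 3. **Fusion Layer**: Concatenates the 128 MFCC features with the 40,960 flattened spectrogram features, then applies dense layers of 256 and 128 units
176
+ 4. **Output Layer**: Softmax layer for emotion classification across 8 emotional states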
177
+
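The layer shapes recorded in `cnn_emotion_model_20251022_065208_architecture.json` can be reproduced with a few lines of arithmetic. The sketch below is plain Python with hypothetical helper names. It assumes only what the JSON states: a 128×165×1 spectrogram input, three `same`-padded 3×3 convolutions (32, 64, 128 filters) that keep height and width unchanged, each followed by a 2×2 max pool with stride 2 and `valid` padding, and a 128-unit output from the MFCC branch.

```python
def pool_2x2_valid(height, width):
    """Output size of a 2x2 max pool with stride 2 and 'valid' padding."""
    return height // 2, width // 2


def spectrogram_branch_shapes(height=128, width=165, filters=(32, 64, 128)):
    """Yield (height, width, channels) after each conv + pool block."""
    for channels in filters:
        # 'same'-padded, stride-1 convolutions leave height and width unchanged,
        # so only the pooling step shrinks the feature map.
        height, width = pool_2x2_valid(height, width)
        yield height, width, channels


shapes = list(spectrogram_branch_shapes())
h, w, c = shapes[-1]
flattened = h * w * c      # features leaving the Flatten layer
fused = flattened + 128    # plus the 128-unit MFCC branch output

print(shapes)
print(flattened, fused)
```

This gives 64×82×32, 32×41×64, and 16×20×128 feature maps, then 40,960 flattened features and a 41,088-wide fused vector, matching the `build_config` entries in the JSON. Nearly all of the inputs to the first fusion dense layer therefore come from the spectrogram branch.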
178
+ ## 📁 Project Structure
179
+
180
+ ```
181
+ speech_emotion_classification/
182
+ ├── app.py # Streamlit web application
183
+ ├── auto_train.py # Automated training script
184
+ ├── debug_labels.py # Label debugging utilities
185
+ ├── driver.py # Main execution script
186
+ ├── push_to_hub.py # Hugging Face model upload script
187
+ ├── split_model.py # Model splitting utilities
188
+ ├── test_*.py # Test files
189
+ ├── requirements.txt # Project dependencies
190
+ ├── README.md # This file
191
+ └── ...
192
+ ```
193
+
194
+ ## 🧪 Evaluation
195
+
196
+ To evaluate the model on the test dataset:
197
+
198
+ ```bash
199
+ python test_prediction_pipeline.py
200
+ ```
201
+
202
+ This will run the model on the test dataset and provide detailed performance metrics.
203
+
204
+ ## 🤗 Hugging Face Integration
205
+
206
+ The model can be easily shared and deployed using Hugging Face Hub:
207
+
208
+ ```bash
209
+ python push_to_hub.py
210
+ ```
211
+
212
+ ## 🚧 Limitations
213
+
214
+ - Performance may vary with different accents and languages
215
+ - Audio quality (noise, clarity) can significantly affect accuracy
216
+ - Emotions expressed in speech can be culturally dependent
217
+ - Requires clear audio with minimal background noise for best results
218
+ - Shorter audio clips (5-10 seconds) typically work better than longer recordings
219
+
220
+ ## 🛡️ Ethical Considerations
221
+
222
+ - This model should not be used to make critical decisions about individuals without their explicit consent
223
+ - Results should be interpreted with caution and not treated as definitive psychological assessments
224
+ - Consider privacy implications when processing audio of individuals
225
+ - Use responsibly and ethically, with appropriate consent when analyzing personal speech
226
+ - Be aware of potential bias in the training data and its impact on model predictions
227
+
228
+ ## 🧪 Reproducibility
229
+
230
+ To ensure reproducible results:
231
+
232
+ 1. Set random seeds:
233
+ ```python
234
+ import numpy as np
235
+ import tensorflow as tf
236
+ import random
237
+
238
+ np.random.seed(42)
239
+ tf.random.set_seed(42)
240
+ random.seed(42)
241
+ ```
242
+
243
+ 2. Use the same training data and preprocessing pipeline
244
+
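Using the same training data also means using the same train/test split. How this repository splits RAVDESS is not documented here, so the following is only a sketch under assumptions: `seeded_split` is a hypothetical helper, and the file names are placeholders. It shows one way to make a split independent of file-listing order by sorting first and shuffling with a local, seeded random generator:

```python
import random


def seeded_split(items, test_fraction=0.2, seed=42):
    """Split items into (train, test) deterministically for a given seed."""
    ordered = sorted(items)        # fix the starting order first
    rng = random.Random(seed)      # local RNG; global random state is untouched
    rng.shuffle(ordered)
    n_test = int(round(len(ordered) * test_fraction))
    return ordered[n_test:], ordered[:n_test]


files = [f"clip_{i:03d}.wav" for i in range(10)]   # placeholder file names
train_a, test_a = seeded_split(files)
train_b, test_b = seeded_split(list(reversed(files)))
print(test_a == test_b)
```

Sorting before shuffling matters because directory listings can come back in different orders on different machines, which would otherwise change the split even with a fixed seed.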
245
+ ## 🤝 Contributing
246
+
247
+ Contributions are welcome! Here's how you can contribute:
248
+
249
+ 1. Fork the repository
250
+ 2. Create a feature branch (`git checkout -b feature/amazing-feature`)
251
+ 3. Commit your changes (`git commit -m 'Add some amazing feature'`)
252
+ 4. Push to the branch (`git push origin feature/amazing-feature`)
253
+ 5. Open a Pull Request
254
+
255
+ Please make sure to update tests as appropriate and follow the existing code style.
256
+
257
+ ### Development Setup
258
+
259
+ ```bash
260
+ git clone https://github.com/your-username/speech_emotion_classification.git
261
+ cd speech_emotion_classification
262
+ pip install -r requirements.txt
263
+ pip install -r requirements-dev.txt # For development dependencies
264
+ ```
265
+
266
+ ## 📄 License
267
+
268
+ This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
269
+
270
+ ## 📚 Citation
271
+
272
+ If you use this model in your research, please cite:
273
+
274
+ ```bibtex
275
+ @software{speech_emotion_classification,
276
+ author = {AI Research Team},
277
+ title = {Speech Emotion Classification Model},
278
+ year = {2025},
279
+ url = {https://github.com/your-username/speech_emotion_classification}
280
+ }
281
+ ```
282
+
283
+ ## 🆘 Support
284
+
285
+ If you have any questions or encounter issues:
286
+
287
+ 1. Check the [Issues](https://github.com/your-username/speech_emotion_classification/issues) page
288
+ 2. Open a new issue if your problem hasn't been addressed
289
+ 3. For feature requests, please open an issue with the "enhancement" tag
290
+
291
+ ## 🙏 Acknowledgments
292
+
293
+ - The RAVDESS dataset creators for providing the high-quality emotional speech data
294
+ - The TensorFlow team for providing an excellent deep learning framework
295
+ - The Librosa team for audio processing capabilities
296
+ - The Hugging Face team for model sharing capabilities
cnn_emotion_model_20251022_065208_architecture.json ADDED
@@ -0,0 +1 @@
1
+ {"module": "keras.src.models.functional", "class_name": "Functional", "config": {"name": "functional", "trainable": true, "layers": [{"module": "keras.layers", "class_name": "InputLayer", "config": {"batch_shape": [null, 128, 165, 1], "dtype": "float32", "sparse": false, "ragged": false, "name": "spec_input"}, "registered_name": null, "name": "spec_input", "inbound_nodes": []}, {"module": "keras.layers", "class_name": "Conv2D", "config": {"name": "conv2d", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "filters": 32, "kernel_size": [3, 3], "strides": [1, 1], "padding": "same", "data_format": "channels_last", "dilation_rate": [1, 1], "groups": 1, "activation": "relu", "use_bias": true, "kernel_initializer": {"module": "keras.initializers", "class_name": "GlorotUniform", "config": {"seed": null}, "registered_name": null}, "bias_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "kernel_regularizer": {"module": "keras.regularizers", "class_name": "L2", "config": {"l2": 0.0001}, "registered_name": null}, "bias_regularizer": null, "activity_regularizer": null, "kernel_constraint": null, "bias_constraint": null}, "registered_name": null, "build_config": {"input_shape": [null, 128, 165, 1]}, "name": "conv2d", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 128, 165, 1], "dtype": "float32", "keras_history": ["spec_input", 0, 0]}}], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "BatchNormalization", "config": {"name": "batch_normalization_2", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "axis": -1, "momentum": 0.99, "epsilon": 0.001, "center": true, "scale": true, "beta_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, 
"gamma_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "moving_mean_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "moving_variance_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "beta_regularizer": null, "gamma_regularizer": null, "beta_constraint": null, "gamma_constraint": null, "synchronized": false}, "registered_name": null, "build_config": {"input_shape": [null, 128, 165, 32]}, "name": "batch_normalization_2", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 128, 165, 32], "dtype": "float32", "keras_history": ["conv2d", 0, 0]}}], "kwargs": {"mask": null}}]}, {"module": "keras.layers", "class_name": "MaxPooling2D", "config": {"name": "max_pooling2d", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "pool_size": [2, 2], "padding": "valid", "strides": [2, 2], "data_format": "channels_last"}, "registered_name": null, "name": "max_pooling2d", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 128, 165, 32], "dtype": "float32", "keras_history": ["batch_normalization_2", 0, 0]}}], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "Dropout", "config": {"name": "dropout_2", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "rate": 0.3, "seed": null, "noise_shape": null}, "registered_name": null, "name": "dropout_2", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 64, 82, 32], "dtype": "float32", "keras_history": ["max_pooling2d", 0, 0]}}], "kwargs": {"training": false}}]}, {"module": "keras.layers", "class_name": "Conv2D", "config": {"name": "conv2d_1", "trainable": true, "dtype": {"module": 
"keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "filters": 64, "kernel_size": [3, 3], "strides": [1, 1], "padding": "same", "data_format": "channels_last", "dilation_rate": [1, 1], "groups": 1, "activation": "relu", "use_bias": true, "kernel_initializer": {"module": "keras.initializers", "class_name": "GlorotUniform", "config": {"seed": null}, "registered_name": null}, "bias_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "kernel_regularizer": {"module": "keras.regularizers", "class_name": "L2", "config": {"l2": 0.0001}, "registered_name": null}, "bias_regularizer": null, "activity_regularizer": null, "kernel_constraint": null, "bias_constraint": null}, "registered_name": null, "build_config": {"input_shape": [null, 64, 82, 32]}, "name": "conv2d_1", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 64, 82, 32], "dtype": "float32", "keras_history": ["dropout_2", 0, 0]}}], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "BatchNormalization", "config": {"name": "batch_normalization_3", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "axis": -1, "momentum": 0.99, "epsilon": 0.001, "center": true, "scale": true, "beta_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "gamma_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "moving_mean_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "moving_variance_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "beta_regularizer": null, "gamma_regularizer": null, "beta_constraint": null, "gamma_constraint": null, "synchronized": false}, "registered_name": 
null, "build_config": {"input_shape": [null, 64, 82, 64]}, "name": "batch_normalization_3", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 64, 82, 64], "dtype": "float32", "keras_history": ["conv2d_1", 0, 0]}}], "kwargs": {"mask": null}}]}, {"module": "keras.layers", "class_name": "InputLayer", "config": {"batch_shape": [null, 13], "dtype": "float32", "sparse": false, "ragged": false, "name": "mfcc_input"}, "registered_name": null, "name": "mfcc_input", "inbound_nodes": []}, {"module": "keras.layers", "class_name": "MaxPooling2D", "config": {"name": "max_pooling2d_1", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "pool_size": [2, 2], "padding": "valid", "strides": [2, 2], "data_format": "channels_last"}, "registered_name": null, "name": "max_pooling2d_1", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 64, 82, 64], "dtype": "float32", "keras_history": ["batch_normalization_3", 0, 0]}}], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "Dense", "config": {"name": "dense", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "units": 256, "activation": "relu", "use_bias": true, "kernel_initializer": {"module": "keras.initializers", "class_name": "GlorotUniform", "config": {"seed": null}, "registered_name": null}, "bias_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "kernel_regularizer": {"module": "keras.regularizers", "class_name": "L2", "config": {"l2": 0.0001}, "registered_name": null}, "bias_regularizer": null, "kernel_constraint": null, "bias_constraint": null}, "registered_name": null, "build_config": {"input_shape": [null, 13]}, "name": "dense", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 13], 
"dtype": "float32", "keras_history": ["mfcc_input", 0, 0]}}], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "Dropout", "config": {"name": "dropout_3", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "rate": 0.3, "seed": null, "noise_shape": null}, "registered_name": null, "name": "dropout_3", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 32, 41, 64], "dtype": "float32", "keras_history": ["max_pooling2d_1", 0, 0]}}], "kwargs": {"training": false}}]}, {"module": "keras.layers", "class_name": "BatchNormalization", "config": {"name": "batch_normalization", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "axis": -1, "momentum": 0.99, "epsilon": 0.001, "center": true, "scale": true, "beta_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "gamma_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "moving_mean_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "moving_variance_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "beta_regularizer": null, "gamma_regularizer": null, "beta_constraint": null, "gamma_constraint": null, "synchronized": false}, "registered_name": null, "build_config": {"input_shape": [null, 256]}, "name": "batch_normalization", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 256], "dtype": "float32", "keras_history": ["dense", 0, 0]}}], "kwargs": {"mask": null}}]}, {"module": "keras.layers", "class_name": "Conv2D", "config": {"name": "conv2d_2", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, 
"registered_name": null}, "filters": 128, "kernel_size": [3, 3], "strides": [1, 1], "padding": "same", "data_format": "channels_last", "dilation_rate": [1, 1], "groups": 1, "activation": "relu", "use_bias": true, "kernel_initializer": {"module": "keras.initializers", "class_name": "GlorotUniform", "config": {"seed": null}, "registered_name": null}, "bias_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "kernel_regularizer": {"module": "keras.regularizers", "class_name": "L2", "config": {"l2": 0.0001}, "registered_name": null}, "bias_regularizer": null, "activity_regularizer": null, "kernel_constraint": null, "bias_constraint": null}, "registered_name": null, "build_config": {"input_shape": [null, 32, 41, 64]}, "name": "conv2d_2", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 32, 41, 64], "dtype": "float32", "keras_history": ["dropout_3", 0, 0]}}], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "Dropout", "config": {"name": "dropout", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "rate": 0.3, "seed": null, "noise_shape": null}, "registered_name": null, "name": "dropout", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 256], "dtype": "float32", "keras_history": ["batch_normalization", 0, 0]}}], "kwargs": {"training": false}}]}, {"module": "keras.layers", "class_name": "BatchNormalization", "config": {"name": "batch_normalization_4", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "axis": -1, "momentum": 0.99, "epsilon": 0.001, "center": true, "scale": true, "beta_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "gamma_initializer": {"module": "keras.initializers", "class_name": 
"Ones", "config": {}, "registered_name": null}, "moving_mean_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "moving_variance_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "beta_regularizer": null, "gamma_regularizer": null, "beta_constraint": null, "gamma_constraint": null, "synchronized": false}, "registered_name": null, "build_config": {"input_shape": [null, 32, 41, 128]}, "name": "batch_normalization_4", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 32, 41, 128], "dtype": "float32", "keras_history": ["conv2d_2", 0, 0]}}], "kwargs": {"mask": null}}]}, {"module": "keras.layers", "class_name": "Dense", "config": {"name": "dense_1", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "units": 128, "activation": "relu", "use_bias": true, "kernel_initializer": {"module": "keras.initializers", "class_name": "GlorotUniform", "config": {"seed": null}, "registered_name": null}, "bias_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "kernel_regularizer": {"module": "keras.regularizers", "class_name": "L2", "config": {"l2": 0.0001}, "registered_name": null}, "bias_regularizer": null, "kernel_constraint": null, "bias_constraint": null}, "registered_name": null, "build_config": {"input_shape": [null, 256]}, "name": "dense_1", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 256], "dtype": "float32", "keras_history": ["dropout", 0, 0]}}], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "MaxPooling2D", "config": {"name": "max_pooling2d_2", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "pool_size": [2, 2], "padding": "valid", 
"strides": [2, 2], "data_format": "channels_last"}, "registered_name": null, "name": "max_pooling2d_2", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 32, 41, 128], "dtype": "float32", "keras_history": ["batch_normalization_4", 0, 0]}}], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "BatchNormalization", "config": {"name": "batch_normalization_1", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "axis": -1, "momentum": 0.99, "epsilon": 0.001, "center": true, "scale": true, "beta_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "gamma_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "moving_mean_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "moving_variance_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "beta_regularizer": null, "gamma_regularizer": null, "beta_constraint": null, "gamma_constraint": null, "synchronized": false}, "registered_name": null, "build_config": {"input_shape": [null, 128]}, "name": "batch_normalization_1", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 128], "dtype": "float32", "keras_history": ["dense_1", 0, 0]}}], "kwargs": {"mask": null}}]}, {"module": "keras.layers", "class_name": "Dropout", "config": {"name": "dropout_4", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "rate": 0.3, "seed": null, "noise_shape": null}, "registered_name": null, "name": "dropout_4", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 16, 20, 128], "dtype": "float32", "keras_history": ["max_pooling2d_2", 0, 0]}}], 
"kwargs": {"training": false}}]}, {"module": "keras.layers", "class_name": "Dropout", "config": {"name": "dropout_1", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "rate": 0.3, "seed": null, "noise_shape": null}, "registered_name": null, "name": "dropout_1", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 128], "dtype": "float32", "keras_history": ["batch_normalization_1", 0, 0]}}], "kwargs": {"training": false}}]}, {"module": "keras.layers", "class_name": "Flatten", "config": {"name": "flatten", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "data_format": "channels_last"}, "registered_name": null, "build_config": {"input_shape": [null, 16, 20, 128]}, "name": "flatten", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 16, 20, 128], "dtype": "float32", "keras_history": ["dropout_4", 0, 0]}}], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "Concatenate", "config": {"name": "fusion_concat", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "axis": -1}, "registered_name": null, "build_config": {"input_shape": [[null, 128], [null, 40960]]}, "name": "fusion_concat", "inbound_nodes": [{"args": [[{"class_name": "__keras_tensor__", "config": {"shape": [null, 128], "dtype": "float32", "keras_history": ["dropout_1", 0, 0]}}, {"class_name": "__keras_tensor__", "config": {"shape": [null, 40960], "dtype": "float32", "keras_history": ["flatten", 0, 0]}}]], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "Dense", "config": {"name": "dense_2", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "units": 256, "activation": "relu", "use_bias": 
true, "kernel_initializer": {"module": "keras.initializers", "class_name": "GlorotUniform", "config": {"seed": null}, "registered_name": null}, "bias_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "kernel_regularizer": {"module": "keras.regularizers", "class_name": "L2", "config": {"l2": 0.0001}, "registered_name": null}, "bias_regularizer": null, "kernel_constraint": null, "bias_constraint": null}, "registered_name": null, "build_config": {"input_shape": [null, 41088]}, "name": "dense_2", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 41088], "dtype": "float32", "keras_history": ["fusion_concat", 0, 0]}}], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "BatchNormalization", "config": {"name": "batch_normalization_5", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "axis": -1, "momentum": 0.99, "epsilon": 0.001, "center": true, "scale": true, "beta_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "gamma_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "moving_mean_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "moving_variance_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "beta_regularizer": null, "gamma_regularizer": null, "beta_constraint": null, "gamma_constraint": null, "synchronized": false}, "registered_name": null, "build_config": {"input_shape": [null, 256]}, "name": "batch_normalization_5", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 256], "dtype": "float32", "keras_history": ["dense_2", 0, 0]}}], "kwargs": {"mask": null}}]}, {"module": "keras.layers", "class_name": 
"Dropout", "config": {"name": "dropout_5", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "rate": 0.3, "seed": null, "noise_shape": null}, "registered_name": null, "name": "dropout_5", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 256], "dtype": "float32", "keras_history": ["batch_normalization_5", 0, 0]}}], "kwargs": {"training": false}}]}, {"module": "keras.layers", "class_name": "Dense", "config": {"name": "dense_3", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "units": 128, "activation": "relu", "use_bias": true, "kernel_initializer": {"module": "keras.initializers", "class_name": "GlorotUniform", "config": {"seed": null}, "registered_name": null}, "bias_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "kernel_regularizer": {"module": "keras.regularizers", "class_name": "L2", "config": {"l2": 0.0001}, "registered_name": null}, "bias_regularizer": null, "kernel_constraint": null, "bias_constraint": null}, "registered_name": null, "build_config": {"input_shape": [null, 256]}, "name": "dense_3", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 256], "dtype": "float32", "keras_history": ["dropout_5", 0, 0]}}], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "BatchNormalization", "config": {"name": "batch_normalization_6", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "axis": -1, "momentum": 0.99, "epsilon": 0.001, "center": true, "scale": true, "beta_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "gamma_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, 
"registered_name": null}, "moving_mean_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "moving_variance_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "beta_regularizer": null, "gamma_regularizer": null, "beta_constraint": null, "gamma_constraint": null, "synchronized": false}, "registered_name": null, "build_config": {"input_shape": [null, 128]}, "name": "batch_normalization_6", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 128], "dtype": "float32", "keras_history": ["dense_3", 0, 0]}}], "kwargs": {"mask": null}}]}, {"module": "keras.layers", "class_name": "Dropout", "config": {"name": "dropout_6", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "rate": 0.3, "seed": null, "noise_shape": null}, "registered_name": null, "name": "dropout_6", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 128], "dtype": "float32", "keras_history": ["batch_normalization_6", 0, 0]}}], "kwargs": {"training": false}}]}, {"module": "keras.layers", "class_name": "Dense", "config": {"name": "dense_4", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "units": 8, "activation": "softmax", "use_bias": true, "kernel_initializer": {"module": "keras.initializers", "class_name": "GlorotUniform", "config": {"seed": null}, "registered_name": null}, "bias_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "kernel_regularizer": null, "bias_regularizer": null, "kernel_constraint": null, "bias_constraint": null}, "registered_name": null, "build_config": {"input_shape": [null, 128]}, "name": "dense_4", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": 
{"shape": [null, 128], "dtype": "float32", "keras_history": ["dropout_6", 0, 0]}}], "kwargs": {}}]}], "input_layers": [["mfcc_input", 0, 0], ["spec_input", 0, 0]], "output_layers": [["dense_4", 0, 0]]}, "registered_name": "Functional", "build_config": {"input_shape": null}, "compile_config": {"loss": "sparse_categorical_crossentropy", "loss_weights": null, "metrics": ["accuracy"], "weighted_metrics": null, "run_eagerly": false, "steps_per_execution": 1, "jit_compile": false}}
cnn_emotion_model_20251022_065208_feature_info.json ADDED
@@ -0,0 +1,87 @@
+ {
+   "feature_type": "multimodal",
+   "config": {
+     "mfcc": {
+       "enabled": true,
+       "parameters": {
+         "n_mels": 128,
+         "fmax": 8000,
+         "power": 2.0,
+         "n_mfcc": 40,
+         "n_fft": 2048,
+         "hop_length": 512
+       }
+     },
+     "mel_spectrogram": {
+       "enabled": true,
+       "parameters": {
+         "n_mels": 128,
+         "fmax": 8000,
+         "power": 2.0,
+         "n_mfcc": 40,
+         "n_fft": 2048,
+         "hop_length": 512
+       }
+     }
+   },
+   "normalization_params": {
+     "mfcc_scaler": {
+       "mean": [
+         -573.8881925855364,
+         42.58236339167943,
+         -5.421968076008534,
+         8.838157428074664,
+         -4.626548335373786,
+         -4.561592391416308,
+         -10.38183564253776,
+         -8.013113031831121,
+         -3.6677634406732977,
+         -2.2170092025531516,
+         -4.698173174810411,
+         -0.521445854008937,
+         -2.5761164628238475
+       ],
+       "scale": [
+         101.40211169597691,
+         15.915899940828131,
+         13.570870655407589,
+         8.599923731484084,
+         9.14738173651626,
+         6.565023895647934,
+         6.508280879081033,
+         4.970972619842886,
+         4.785011824491553,
+         4.724790727787786,
+         4.41035342173799,
+         4.21637411935603,
+         4.011386501573868
+       ],
+       "var": [
+         10282.38825640338,
+         253.31587092645293,
+         184.16853034580282,
+         73.95868818734314,
+         83.67459263355123,
+         43.099538750428366,
+         42.35772000101178,
+         24.71056878722765,
+         22.89633816052398,
+         22.323647421389435,
+         19.451217304635996,
+         17.777810714375338,
+         16.091221665009034
+       ]
+     },
+     "spec_scaler": {
+       "mean": [
+         -43.60601707329218
+       ],
+       "scale": [
+         32.47546967488067
+       ],
+       "var": [
+         1054.656130604094
+       ]
+     }
+   }
+ }
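The `normalization_params` block stores a `mean`, `scale`, and `var` list for each scaler, and every `scale` entry is the square root of the matching `var` entry (for example, the square root of 1054.656 is about 32.475). That layout resembles the fitted attributes of a scikit-learn `StandardScaler`, so the sketch below assumes features are standardized as `(x - mean) / scale`. The commit does not show the preprocessing code, so this is an assumption, and the helper name `standardize` is hypothetical. Note also that `mfcc_scaler` holds 13 values although `n_mfcc` is 40; the commit does not say which axis the scaler runs over, so the sketch only accepts vectors whose length matches the scaler (or a single-value scaler, which it broadcasts).

```python
import json
import math

# Trimmed copy of the JSON above: the first three "mfcc_scaler" entries
# and the full single-value "spec_scaler".
feature_info = json.loads("""
{
  "normalization_params": {
    "mfcc_scaler": {
      "mean":  [-573.8881925855364, 42.58236339167943, -5.421968076008534],
      "scale": [101.40211169597691, 15.915899940828131, 13.570870655407589],
      "var":   [10282.38825640338, 253.31587092645293, 184.16853034580282]
    },
    "spec_scaler": {
      "mean":  [-43.60601707329218],
      "scale": [32.47546967488067],
      "var":   [1054.656130604094]
    }
  }
}
""")


def standardize(values, scaler):
    """Return (x - mean) / scale for each value (assumed StandardScaler-style).

    A scaler with a single mean/scale pair is broadcast over all values.
    """
    mean, scale = scaler["mean"], scaler["scale"]
    if len(mean) == 1:
        mean = mean * len(values)
        scale = scale * len(values)
    if len(mean) != len(values):
        raise ValueError(
            f"expected {len(mean)} values to match the scaler, got {len(values)}"
        )
    return [(x - m) / s for x, m, s in zip(values, mean, scale)]


mfcc = feature_info["normalization_params"]["mfcc_scaler"]
spec = feature_info["normalization_params"]["spec_scaler"]

# A vector equal to the stored mean maps to all zeros.
centered = standardize(mfcc["mean"], mfcc)

# Two mel-spectrogram values (in dB, hypothetical) share the single spec scaler.
spec_scaled = standardize([-43.60601707329218, 0.0], spec)
```

In an array-based pipeline the same arithmetic would usually be one broadcast expression over the whole feature matrix; plain lists are used here only to keep the sketch dependency-free.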
cnn_emotion_model_20251022_065208_manifest.txt ADDED
@@ -0,0 +1,5 @@
+ Original file: cnn_emotion_model_20251022_065208.keras
+ Split into 3 parts:
+ - cnn_emotion_model_20251022_065208_part1.keras
+ - cnn_emotion_model_20251022_065208_part2.keras
+ - cnn_emotion_model_20251022_065208_part3.keras
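The manifest lists the parts but not how they were produced. The first two parts are exactly 52428800 bytes (50 MiB) each, which suggests a plain byte-wise split, so a reasonable guess is that concatenating the parts in manifest order restores the original `.keras` file. The sketch below makes that assumption; the helper name `reassemble` is hypothetical, and the demo runs on throwaway files rather than the real model parts.

```python
import tempfile
from pathlib import Path


def reassemble(part_paths, output_path, chunk_size=1 << 20):
    """Concatenate the part files, in the given order, into output_path.

    Returns the total number of bytes written.
    """
    total = 0
    with open(output_path, "wb") as out:
        for part in part_paths:
            with open(part, "rb") as src:
                # Stream in chunks so large parts are never fully held in memory.
                while chunk := src.read(chunk_size):
                    out.write(chunk)
                    total += len(chunk)
    return total


# Self-contained demo: split a small payload into three pieces and rejoin it.
with tempfile.TemporaryDirectory() as tmp_dir:
    tmp = Path(tmp_dir)
    payload = bytes(range(256)) * 10  # 2560 bytes of test data
    pieces = [payload[:1000], payload[1000:2000], payload[2000:]]

    demo_parts = []
    for index, piece in enumerate(pieces, start=1):
        part_path = tmp / f"demo_part{index}.bin"
        part_path.write_bytes(piece)
        demo_parts.append(part_path)

    written = reassemble(demo_parts, tmp / "demo.bin")
    roundtrip_ok = (tmp / "demo.bin").read_bytes() == payload
```

For the real files, the list would hold the three `_part1`, `_part2`, and `_part3` `.keras` paths in that order. If the parts are indeed byte splits, the rebuilt file should be 52428800 + 52428800 + 23485996 = 128343596 bytes, going by the sizes in the LFS pointers in this commit.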
cnn_emotion_model_20251022_065208_part1.h5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:52ff89869d45bd9ecfc2ebd26cfc4beffe4b36c59903a7e798e7e5b98d80b5a8
+ size 52428800
cnn_emotion_model_20251022_065208_part1.keras ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:48154f8a37befc72dab66c817789f85440b566e26c9c775a695dab5134efc208
+ size 52428800
cnn_emotion_model_20251022_065208_part2.h5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:bb6078528f1c8569f38bf62bfd4c1c9a8d4be874d0f3e8e93dfad087fc606940
+ size 52428800
cnn_emotion_model_20251022_065208_part2.keras ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1b1b2f0779608cca6875461829e60bf33db53d53ee7fdff40e92e544fc0e722f
+ size 52428800
cnn_emotion_model_20251022_065208_part3.h5 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f5b3b2255c659c0e59c123f75e2d30b2a8a05e3bcfc8fd7b133dfa4a4f876fe2
+ size 23493312
cnn_emotion_model_20251022_065208_part3.keras ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d6bd5a75c2badfe7cd65bb565e82833e5f4e632eda8c0f12ed7bb03c2c063f0f
+ size 23485996
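Each Git LFS pointer above records the SHA-256 digest (`oid`) and byte size of the real file it stands in for, so a downloaded part can be checked against its pointer before use. The following is a minimal sketch of that check; the helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the demo verifies a small in-memory payload rather than an actual model part.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and digest the pointer records."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


# Pointer text for cnn_emotion_model_20251022_065208_part3.keras, copied from the diff.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:d6bd5a75c2badfe7cd65bb565e82833e5f4e632eda8c0f12ed7bb03c2c063f0f
size 23485996
"""
pointer = parse_lfs_pointer(pointer_text)

# Demo on a small payload: build a matching pointer, then check it.
demo_data = b"example payload"
demo_pointer = {
    "algo": "sha256",
    "digest": hashlib.sha256(demo_data).hexdigest(),
    "size": len(demo_data),
}
demo_ok = matches_pointer(demo_data, demo_pointer)
demo_tampered = matches_pointer(demo_data + b"x", demo_pointer)
```

For a real 50 MiB part, the digest would more sensibly be computed by feeding the file to `hashlib.sha256()` in chunks via `update()` instead of reading it into memory at once.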
requirements.txt ADDED
@@ -0,0 +1,52 @@
+ # Core libraries
+ # TensorFlow version compatible with Python 3.13.9 deployment environment
+ tensorflow==2.17.1
+ tensorboard>=2.17.0
+ numpy>=1.24.0
+ pandas>=2.0.0
+ scikit-learn>=1.3.0
+
+ # Audio processing
+ librosa>=0.10.0
+ soundfile>=0.12.0
+
+ # Dataset access
+ datasets>=2.14.0
+ huggingface-hub>=0.17.0
+
+ # Genetic algorithm optimization
+ deap>=1.4.0
+
+ # Visualization
+ matplotlib>=3.7.0
+ seaborn>=0.12.0
+ plotly>=5.15.0
+ kaleido>=0.2.1 # Required for plotly static image export
+
+ # UI
+ streamlit>=1.28.0
+ streamlit-extras>=0.4.0
+ streamlit-option-menu>=0.3.6
+ audio-recorder-streamlit>=0.0.8
+
+ # Dimensionality reduction for visualization
+ umap-learn>=0.5.3
+
+ # Advanced visualization and reporting
+ pydot>=1.4.2 # For model architecture visualization
+ graphviz>=0.20 # For model architecture visualization
+
+ # Utilities
+ tqdm>=4.65.0
+ h5py>=3.9.0
+
+ # Utilities
+ portalocker>=2.7.0
+
+ # Additional dependencies for compatibility
+ protobuf>=3.20.3,<4.0.0 # Must be compatible with TensorFlow
+ packaging>=21.0
+ requests>=2.28.0
+ python-dateutil>=2.8.0
+ pytz>=2022.0
+ six>=1.16.0