Upload speech emotion classification model with multi-modal architecture
- .gitattributes +3 -0
- README.md +296 -0
- cnn_emotion_model_20251022_065208_architecture.json +1 -0
- cnn_emotion_model_20251022_065208_feature_info.json +87 -0
- cnn_emotion_model_20251022_065208_manifest.txt +5 -0
- cnn_emotion_model_20251022_065208_part1.h5 +3 -0
- cnn_emotion_model_20251022_065208_part1.keras +3 -0
- cnn_emotion_model_20251022_065208_part2.h5 +3 -0
- cnn_emotion_model_20251022_065208_part2.keras +3 -0
- cnn_emotion_model_20251022_065208_part3.h5 +3 -0
- cnn_emotion_model_20251022_065208_part3.keras +3 -0
- requirements.txt +52 -0
.gitattributes
CHANGED
|
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
| 33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
| 34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
| 35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
| 36 |
+
cnn_emotion_model_20251022_065208_part1.keras filter=lfs diff=lfs merge=lfs -text
|
| 37 |
+
cnn_emotion_model_20251022_065208_part2.keras filter=lfs diff=lfs merge=lfs -text
|
| 38 |
+
cnn_emotion_model_20251022_065208_part3.keras filter=lfs diff=lfs merge=lfs -text
|
README.md
ADDED
|
@@ -0,0 +1,296 @@
|
| 1 |
+
---
|
| 2 |
+
language:
|
| 3 |
+
- en
|
| 4 |
+
license: mit
|
| 5 |
+
library_name: tensorflow
|
| 6 |
+
tags:
|
| 7 |
+
- audio
|
| 8 |
+
- speech
|
| 9 |
+
- emotion-recognition
|
| 10 |
+
- deep-learning
|
| 11 |
+
- classification
|
| 12 |
+
datasets:
|
| 13 |
+
- ravdess
|
| 14 |
+
metrics:
|
| 15 |
+
- accuracy
|
| 16 |
+
- precision
|
| 17 |
+
- recall
|
| 18 |
+
- f1
|
| 19 |
+
model-index:
|
| 20 |
+
- name: Speech Emotion Classification
|
| 21 |
+
results:
|
| 22 |
+
- task:
|
| 23 |
+
name: Audio Classification
|
| 24 |
+
type: audio-classification
|
| 25 |
+
dataset:
|
| 26 |
+
name: RAVDESS
|
| 27 |
+
type: ravdess
|
| 28 |
+
metrics:
|
| 29 |
+
- name: Accuracy
|
| 30 |
+
type: accuracy
|
| 31 |
+
value: 0.4213
|
| 32 |
+
- name: Precision (weighted)
|
| 33 |
+
type: precision
|
| 34 |
+
value: 0.7253
|
| 35 |
+
- name: Recall (weighted)
|
| 36 |
+
type: recall
|
| 37 |
+
value: 0.4213
|
| 38 |
+
- name: F1-Score (weighted)
|
| 39 |
+
type: f1
|
| 40 |
+
value: 0.4090
|
| 41 |
+
---
|
| 42 |
+
|
| 43 |
+
# Speech Emotion Classification
|
| 44 |
+
|
| 45 |
+
<div align="center">
|
| 46 |
+
|
| 47 |
+
[Python](https://www.python.org/downloads/)
|
| 48 |
+
[TensorFlow](https://www.tensorflow.org/)
|
| 49 |
+
[License](LICENSE)
|
| 50 |
+
[Hugging Face](https://huggingface.co)
|
| 51 |
+
|
| 52 |
+
**Detect emotions from speech using advanced deep learning models**
|
| 53 |
+
|
| 54 |
+
</div>
|
| 55 |
+
|
| 56 |
+
---
|
| 57 |
+
|
| 58 |
+
## 🎯 Overview
|
| 59 |
+
|
| 60 |
+
This repository contains a deep learning model for speech emotion classification. It combines two acoustic feature sets extracted from the same recording, Mel-frequency cepstral coefficients (MFCCs) and mel-spectrograms, and classifies the emotional content of the speech into one of eight classes. On the RAVDESS test set it reaches about 42% accuracy (see Performance Metrics).
|
| 61 |
+
|
| 62 |
+
## 🌟 Key Features
|
| 63 |
+
|
| 64 |
+
- **Multi-modal Architecture**: Combines CNN and MLP branches for comprehensive feature analysis
|
| 65 |
+
- **Real-time Processing**: Capable of processing and analyzing speech in real-time
|
| 66 |
+
- **Reported Accuracy**: ~42% test accuracy on the 8-class RAVDESS task (uniform guessing over 8 classes would give 12.5%); see Performance Metrics
|
| 67 |
+
- **Cross-platform Compatibility**: Runs seamlessly on Windows, macOS, and Linux
|
| 68 |
+
- **Hugging Face Integration**: Easy model sharing and deployment via Hugging Face Hub
|
| 69 |
+
|
| 70 |
+
## 📊 Dataset
|
| 71 |
+
|
| 72 |
+
The model was trained on the **RAVDESS** (Ryerson Audio-Visual Database of Emotional Speech and Song) dataset, which contains high-quality recordings of professional actors expressing different emotions. The dataset includes 8 distinct emotions:
|
| 73 |
+
|
| 74 |
+
- 😐 **Neutral**: Emotionless speech
|
| 75 |
+
- 😌 **Calm**: Calm and relaxed emotion
|
| 76 |
+
- 😊 **Happy**: Joyful and cheerful emotion
|
| 77 |
+
- 😢 **Sad**: Melancholic and sorrowful emotion
|
| 78 |
+
- 😡 **Angry**: Irritated and mad emotion
|
| 79 |
+
- 😱 **Fearful**: Scared and apprehensive emotion
|
| 80 |
+
- 😤 **Disgust**: Revolted and repulsed emotion
|
| 81 |
+
- 😮 **Surprised**: Astonished and amazed emotion
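
RAVDESS encodes each clip's label in its filename: seven two-digit fields separated by hyphens, with the third field holding the emotion code (01 to 08, in the same order as the list above). A minimal sketch of recovering the label from a filename; the helper name `emotion_from_filename` is illustrative and not part of this repository:

```python
from pathlib import Path

# Emotion codes as defined by the RAVDESS filename convention (third field).
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def emotion_from_filename(path: str) -> str:
    """Return the emotion label encoded in a RAVDESS-style filename."""
    fields = Path(path).stem.split("-")
    if len(fields) != 7:
        raise ValueError(f"expected 7 hyphen-separated fields, got {len(fields)}: {path}")
    return RAVDESS_EMOTIONS[fields[2]]

print(emotion_from_filename("Actor_12/03-01-06-01-02-01-12.wav"))
```

Sorting the dictionary values by code reproduces the `emotion_labels` order used in the usage example of this README.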
|
| 82 |
+
|
| 83 |
+
## 📈 Performance Metrics
|
| 84 |
+
|
| 85 |
+
| Metric | Value |
|
| 86 |
+
|--------|-------|
|
| 87 |
+
| **Test Accuracy** | ~42.13% |
|
| 88 |
+
| **Precision (weighted)** | ~72.53% |
|
| 89 |
+
| **Recall (weighted)** | ~42.13% |
|
| 90 |
+
| **F1-Score (weighted)** | ~40.90% |
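
Weighted recall and test accuracy are identical in the table (~42.13%). That is expected: weighting each class's recall by its share of the test set reduces to the overall fraction of correct predictions. Weighted precision has no such identity, so it can differ widely from accuracy. A small sketch on a made-up 3-class confusion matrix (the counts are illustrative, not this model's results):

```python
# Rows are true classes, columns are predicted classes (toy counts, not real results).
confusion = [
    [8, 1, 1],
    [4, 3, 3],
    [5, 0, 5],
]

n_classes = len(confusion)
total = sum(sum(row) for row in confusion)
support = [sum(row) for row in confusion]                     # true samples per class
predicted = [sum(confusion[r][c] for r in range(n_classes))  # predictions per class
             for c in range(n_classes)]
true_pos = [confusion[c][c] for c in range(n_classes)]

accuracy = sum(true_pos) / total
recall = [tp / s for tp, s in zip(true_pos, support)]
precision = [tp / p if p else 0.0 for tp, p in zip(true_pos, predicted)]

# Support-weighted averages, the same weighting scheme as scikit-learn's average="weighted".
weighted_recall = sum(s / total * r for s, r in zip(support, recall))
weighted_precision = sum(s / total * p for s, p in zip(support, precision))

print(f"accuracy={accuracy:.4f} weighted_recall={weighted_recall:.4f} "
      f"weighted_precision={weighted_precision:.4f}")
```

A weighted precision well above accuracy, as reported here, usually means the model is confident and correct on a few classes while over-predicting others.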
|
| 91 |
+
|
| 92 |
+
## 🛠️ Installation
|
| 93 |
+
|
| 94 |
+
### Prerequisites
|
| 95 |
+
|
| 96 |
+
- Python 3.9 or higher (the saved model uses the Keras 3 format, which does not support older Python versions)
|
| 97 |
+
- pip package manager
|
| 98 |
+
|
| 99 |
+
### Setup
|
| 100 |
+
|
| 101 |
+
1. Clone the repository:
|
| 102 |
+
```bash
|
| 103 |
+
git clone https://github.com/your-username/speech_emotion_classification.git
|
| 104 |
+
cd speech_emotion_classification
|
| 105 |
+
```
|
| 106 |
+
|
| 107 |
+
2. Create a virtual environment (recommended):
|
| 108 |
+
```bash
|
| 109 |
+
python -m venv venv
|
| 110 |
+
source venv/bin/activate # On Windows: venv\Scripts\activate
|
| 111 |
+
```
|
| 112 |
+
|
| 113 |
+
3. Install the required dependencies:
|
| 114 |
+
```bash
|
| 115 |
+
pip install -r requirements.txt
|
| 116 |
+
```
|
| 117 |
+
|
| 118 |
+
Or install the dependencies manually:
|
| 119 |
+
```bash
|
| 120 |
+
pip install tensorflow numpy librosa scikit-learn huggingface_hub pandas matplotlib seaborn
|
| 121 |
+
```
|
| 122 |
+
|
| 123 |
+
## 🚀 Usage
|
| 124 |
+
|
| 125 |
+
### 1. Load and Use the Model
|
| 126 |
+
|
| 127 |
+
```python
|
| 128 |
+
import librosa
|
| 129 |
+
import numpy as np
|
| 130 |
+
from tensorflow import keras
|
| 131 |
+
|
| 132 |
+
# Load the pre-trained model
|
| 133 |
+
model = keras.models.load_model('./path/to/model.keras')
|
| 134 |
+
|
| 135 |
+
# Load an audio file
|
| 136 |
+
audio_path = 'path/to/audio.wav'
|
| 137 |
+
y, sr = librosa.load(audio_path, sr=None)
|
| 138 |
+
|
| 139 |
+
# Extract features
|
| 140 |
+
mfcc_features = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
|
| 141 |
+
spectrogram_features = librosa.feature.melspectrogram(y=y, sr=sr)
|
| 142 |
+
|
| 143 |
+
# Normalize and reshape features according to your preprocessing pipeline
|
| 144 |
+
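# (Implementation depends on how the model was trained; this step must define
# mfcc_features_reshaped with shape (1, 13) and spec_features_reshaped with shape (1, 128, 165, 1).)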
|
| 145 |
+
|
| 146 |
+
# Make prediction
|
| 147 |
+
# For multi-modal models, pass both feature arrays: [mfcc_features_reshaped, spec_features_reshaped]
|
| 148 |
+
predictions = model.predict([mfcc_features_reshaped, spec_features_reshaped])
|
| 149 |
+
|
| 150 |
+
# Get emotion with highest probability
|
| 151 |
+
emotion_labels = ['neutral', 'calm', 'happy', 'sad', 'angry', 'fearful', 'disgust', 'surprised']
|
| 152 |
+
predicted_emotion = emotion_labels[int(np.argmax(predictions[0]))]
|
| 153 |
+
|
| 154 |
+
print(f"Predicted emotion: {predicted_emotion}")
|
| 155 |
+
```
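
The snippet above leaves `mfcc_features_reshaped` and `spec_features_reshaped` undefined. The architecture JSON in this commit declares `mfcc_input` with shape `(batch, 13)` and `spec_input` with shape `(batch, 128, 165, 1)`, so the extracted features have to be brought to those shapes. A minimal sketch, assuming the MFCCs are averaged over time and the mel-spectrogram is cropped or zero-padded to 165 frames; both choices, the lack of normalization, and the helper name `shape_features` are guesses rather than the confirmed training pipeline:

```python
import numpy as np

N_MFCC, N_MELS, N_FRAMES = 13, 128, 165  # shapes taken from the architecture JSON

def shape_features(mfcc: np.ndarray, mel: np.ndarray):
    """Bring librosa-style (n_features, n_frames) arrays to the model's input shapes.

    Assumptions (not confirmed by the source): MFCCs are averaged over time,
    and the spectrogram is cropped or zero-padded to N_FRAMES frames.
    """
    if mfcc.shape[0] != N_MFCC or mel.shape[0] != N_MELS:
        raise ValueError("unexpected feature dimensions")
    mfcc_vec = mfcc.mean(axis=1)                  # (13,)
    mel_fixed = mel[:, :N_FRAMES]                 # crop long clips
    pad = N_FRAMES - mel_fixed.shape[1]
    if pad > 0:                                   # zero-pad short clips
        mel_fixed = np.pad(mel_fixed, ((0, 0), (0, pad)))
    mfcc_batch = mfcc_vec[np.newaxis, :].astype("float32")                  # (1, 13)
    spec_batch = mel_fixed[np.newaxis, :, :, np.newaxis].astype("float32")  # (1, 128, 165, 1)
    return mfcc_batch, spec_batch

# Stand-in arrays with the shapes librosa would return for a 100-frame clip.
mfcc_demo = np.random.rand(N_MFCC, 100)
mel_demo = np.random.rand(N_MELS, 100)
mfcc_features_reshaped, spec_features_reshaped = shape_features(mfcc_demo, mel_demo)
print(mfcc_features_reshaped.shape, spec_features_reshaped.shape)
```

The order of the two arrays passed to `model.predict` must match the model's declared input order; inspecting `model.inputs` after loading shows which input comes first.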
|
| 156 |
+
|
| 157 |
+
### 2. Train Your Own Model
|
| 158 |
+
|
| 159 |
+
```bash
|
| 160 |
+
python auto_train.py
|
| 161 |
+
```
|
| 162 |
+
|
| 163 |
+
### 3. Test the Model
|
| 164 |
+
|
| 165 |
+
```bash
|
| 166 |
+
python test_prediction_pipeline.py
|
| 167 |
+
```
|
| 168 |
+
|
| 169 |
+
## 🏗️ Architecture
|
| 170 |
+
|
| 171 |
+
The model uses a two-branch architecture that fuses two feature views of the same audio clip:
|
| 172 |
+
|
| 173 |
+
1. **MFCC Branch**: Processes a 13-dimensional MFCC vector with two dense layers (256 and 128 units, ReLU), each followed by batch normalization and dropout (rate 0.3)
|
| 174 |
+
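2. **Spectrogram Branch**: Processes a 128×165 mel-spectrogram with three convolutional blocks (32, 64, and 128 filters, 3×3 kernels, ReLU), each followed by batch normalization, 2×2 max pooling, and dropout (rate 0.3)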
|
| 175 |
+
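3. **Fusion Layer**: Concatenates the 128-dimensional MFCC representation with the flattened 40,960-dimensional spectrogram representation, then applies dense layers (256 and 128 units) before final classification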
|
| 176 |
+
4. **Output Layer**: Softmax layer for emotion classification across 8 emotional states
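
The architecture JSON in this commit gives the concrete sizes behind these steps: three 3×3 convolutions with `same` padding (32, 64, 128 filters), each followed by 2×2 max pooling with `valid` padding, then a flatten and a concatenation with the 128-unit MFCC branch. A short sketch that retraces the tensor shapes by hand (plain arithmetic, not the training code):

```python
def pool_valid(size: int, pool: int = 2, stride: int = 2) -> int:
    """Output length of a 'valid' pooling window along one axis."""
    return (size - pool) // stride + 1

height, width = 128, 165          # spec_input: (128, 165, 1)
for filters in (32, 64, 128):     # 'same' convolutions keep height and width
    height, width = pool_valid(height), pool_valid(width)
    print(f"after conv({filters}) + pool: ({height}, {width}, {filters})")

flattened = height * width * 128  # Flatten output of the spectrogram branch
mfcc_branch_units = 128           # dense_1 output of the MFCC branch
fused = mfcc_branch_units + flattened
print(f"flatten: {flattened}, fusion_concat: {fused}")
```

Nearly all of the fused vector (40,960 of 41,088 values) comes from the spectrogram branch, so the first dense layer after fusion alone holds about 10.5 million weights (41,088 × 256).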
|
| 177 |
+
|
| 178 |
+
## 📁 Project Structure
|
| 179 |
+
|
| 180 |
+
```
|
| 181 |
+
speech_emotion_classification/
|
| 182 |
+
├── app.py # Streamlit web application
|
| 183 |
+
├── auto_train.py # Automated training script
|
| 184 |
+
├── debug_labels.py # Label debugging utilities
|
| 185 |
+
├── driver.py # Main execution script
|
| 186 |
+
├── push_to_hub.py # Hugging Face model upload script
|
| 187 |
+
├── split_model.py # Model splitting utilities
|
| 188 |
+
├── test_*.py # Test files
|
| 189 |
+
├── requirements.txt # Project dependencies
|
| 190 |
+
├── README.md # This file
|
| 191 |
+
└── ...
|
| 192 |
+
```
|
| 193 |
+
|
| 194 |
+
## 🧪 Evaluation
|
| 195 |
+
|
| 196 |
+
To evaluate the model, run the prediction test script:
|
| 197 |
+
|
| 198 |
+
```bash
|
| 199 |
+
python test_prediction_pipeline.py
|
| 200 |
+
```
|
| 201 |
+
|
| 202 |
+
This will run the model on the test dataset and provide detailed performance metrics.
|
| 203 |
+
|
| 204 |
+
## 🤗 Hugging Face Integration
|
| 205 |
+
|
| 206 |
+
The model can be easily shared and deployed using Hugging Face Hub:
|
| 207 |
+
|
| 208 |
+
```bash
|
| 209 |
+
python push_to_hub.py
|
| 210 |
+
```
|
| 211 |
+
|
| 212 |
+
## 🚧 Limitations
|
| 213 |
+
|
| 214 |
+
- Performance may vary with different accents and languages
|
| 215 |
+
- Audio quality (noise, clarity) can significantly affect accuracy
|
| 216 |
+
- Emotions expressed in speech can be culturally dependent
|
| 217 |
+
- Requires clear audio with minimal background noise for best results
|
| 218 |
+
- Shorter audio clips (5-10 seconds) typically work better than longer recordings
|
| 219 |
+
|
| 220 |
+
## 🛡️ Ethical Considerations
|
| 221 |
+
|
| 222 |
+
- This model should not be used to make critical decisions about individuals without their explicit consent
|
| 223 |
+
- Results should be interpreted with caution and not treated as definitive psychological assessments
|
| 224 |
+
- Consider privacy implications when processing audio of individuals
|
| 225 |
+
- Use responsibly and ethically, with appropriate consent when analyzing personal speech
|
| 226 |
+
- Be aware of potential bias in the training data and its impact on model predictions
|
| 227 |
+
|
| 228 |
+
## 🧪 Reproducibility
|
| 229 |
+
|
| 230 |
+
To ensure reproducible results:
|
| 231 |
+
|
| 232 |
+
1. Set random seeds:
|
| 233 |
+
```python
|
| 234 |
+
import numpy as np
|
| 235 |
+
import tensorflow as tf
|
| 236 |
+
import random
|
| 237 |
+
|
| 238 |
+
np.random.seed(42)
|
| 239 |
+
tf.random.set_seed(42)
|
| 240 |
+
random.seed(42)
|
| 241 |
+
```
|
| 242 |
+
|
| 243 |
+
2. Use the same training data and preprocessing pipeline
|
| 244 |
+
|
| 245 |
+
## 🤝 Contributing
|
| 246 |
+
|
| 247 |
+
Contributions are welcome! Here's how you can contribute:
|
| 248 |
+
|
| 249 |
+
1. Fork the repository
|
| 250 |
+
2. Create a feature branch (`git checkout -b feature/amazing-feature`)
|
| 251 |
+
3. Commit your changes (`git commit -m 'Add some amazing feature'`)
|
| 252 |
+
4. Push to the branch (`git push origin feature/amazing-feature`)
|
| 253 |
+
5. Open a Pull Request
|
| 254 |
+
|
| 255 |
+
Please make sure to update tests as appropriate and follow the existing code style.
|
| 256 |
+
|
| 257 |
+
### Development Setup
|
| 258 |
+
|
| 259 |
+
```bash
|
| 260 |
+
git clone https://github.com/your-username/speech_emotion_classification.git
|
| 261 |
+
cd speech_emotion_classification
|
| 262 |
+
pip install -r requirements.txt
|
| 263 |
+
pip install -r requirements-dev.txt # For development dependencies
|
| 264 |
+
```
|
| 265 |
+
|
| 266 |
+
## 📄 License
|
| 267 |
+
|
| 268 |
+
This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
|
| 269 |
+
|
| 270 |
+
## 📚 Citation
|
| 271 |
+
|
| 272 |
+
If you use this model in your research, please cite:
|
| 273 |
+
|
| 274 |
+
```bibtex
|
| 275 |
+
@software{speech_emotion_classification,
|
| 276 |
+
author = {AI Research Team},
|
| 277 |
+
title = {Speech Emotion Classification Model},
|
| 278 |
+
year = {2025},
|
| 279 |
+
url = {https://github.com/your-username/speech_emotion_classification}
|
| 280 |
+
}
|
| 281 |
+
```
|
| 282 |
+
|
| 283 |
+
## 🆘 Support
|
| 284 |
+
|
| 285 |
+
If you have any questions or encounter issues:
|
| 286 |
+
|
| 287 |
+
1. Check the [Issues](https://github.com/your-username/speech_emotion_classification/issues) page
|
| 288 |
+
2. Open a new issue if your problem hasn't been addressed
|
| 289 |
+
3. For feature requests, please open an issue with the "enhancement" tag
|
| 290 |
+
|
| 291 |
+
## 🙏 Acknowledgments
|
| 292 |
+
|
| 293 |
+
- The RAVDESS dataset creators for providing the high-quality emotional speech data
|
| 294 |
+
- The TensorFlow team for providing an excellent deep learning framework
|
| 295 |
+
- The Librosa team for audio processing capabilities
|
| 296 |
+
- The Hugging Face team for model sharing capabilities
|
cnn_emotion_model_20251022_065208_architecture.json
ADDED
|
@@ -0,0 +1 @@
|
| 1 |
+
{"module": "keras.src.models.functional", "class_name": "Functional", "config": {"name": "functional", "trainable": true, "layers": [{"module": "keras.layers", "class_name": "InputLayer", "config": {"batch_shape": [null, 128, 165, 1], "dtype": "float32", "sparse": false, "ragged": false, "name": "spec_input"}, "registered_name": null, "name": "spec_input", "inbound_nodes": []}, {"module": "keras.layers", "class_name": "Conv2D", "config": {"name": "conv2d", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "filters": 32, "kernel_size": [3, 3], "strides": [1, 1], "padding": "same", "data_format": "channels_last", "dilation_rate": [1, 1], "groups": 1, "activation": "relu", "use_bias": true, "kernel_initializer": {"module": "keras.initializers", "class_name": "GlorotUniform", "config": {"seed": null}, "registered_name": null}, "bias_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "kernel_regularizer": {"module": "keras.regularizers", "class_name": "L2", "config": {"l2": 0.0001}, "registered_name": null}, "bias_regularizer": null, "activity_regularizer": null, "kernel_constraint": null, "bias_constraint": null}, "registered_name": null, "build_config": {"input_shape": [null, 128, 165, 1]}, "name": "conv2d", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 128, 165, 1], "dtype": "float32", "keras_history": ["spec_input", 0, 0]}}], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "BatchNormalization", "config": {"name": "batch_normalization_2", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "axis": -1, "momentum": 0.99, "epsilon": 0.001, "center": true, "scale": true, "beta_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, 
"gamma_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "moving_mean_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "moving_variance_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "beta_regularizer": null, "gamma_regularizer": null, "beta_constraint": null, "gamma_constraint": null, "synchronized": false}, "registered_name": null, "build_config": {"input_shape": [null, 128, 165, 32]}, "name": "batch_normalization_2", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 128, 165, 32], "dtype": "float32", "keras_history": ["conv2d", 0, 0]}}], "kwargs": {"mask": null}}]}, {"module": "keras.layers", "class_name": "MaxPooling2D", "config": {"name": "max_pooling2d", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "pool_size": [2, 2], "padding": "valid", "strides": [2, 2], "data_format": "channels_last"}, "registered_name": null, "name": "max_pooling2d", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 128, 165, 32], "dtype": "float32", "keras_history": ["batch_normalization_2", 0, 0]}}], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "Dropout", "config": {"name": "dropout_2", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "rate": 0.3, "seed": null, "noise_shape": null}, "registered_name": null, "name": "dropout_2", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 64, 82, 32], "dtype": "float32", "keras_history": ["max_pooling2d", 0, 0]}}], "kwargs": {"training": false}}]}, {"module": "keras.layers", "class_name": "Conv2D", "config": {"name": "conv2d_1", "trainable": true, "dtype": {"module": 
"keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "filters": 64, "kernel_size": [3, 3], "strides": [1, 1], "padding": "same", "data_format": "channels_last", "dilation_rate": [1, 1], "groups": 1, "activation": "relu", "use_bias": true, "kernel_initializer": {"module": "keras.initializers", "class_name": "GlorotUniform", "config": {"seed": null}, "registered_name": null}, "bias_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "kernel_regularizer": {"module": "keras.regularizers", "class_name": "L2", "config": {"l2": 0.0001}, "registered_name": null}, "bias_regularizer": null, "activity_regularizer": null, "kernel_constraint": null, "bias_constraint": null}, "registered_name": null, "build_config": {"input_shape": [null, 64, 82, 32]}, "name": "conv2d_1", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 64, 82, 32], "dtype": "float32", "keras_history": ["dropout_2", 0, 0]}}], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "BatchNormalization", "config": {"name": "batch_normalization_3", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "axis": -1, "momentum": 0.99, "epsilon": 0.001, "center": true, "scale": true, "beta_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "gamma_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "moving_mean_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "moving_variance_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "beta_regularizer": null, "gamma_regularizer": null, "beta_constraint": null, "gamma_constraint": null, "synchronized": false}, "registered_name": 
null, "build_config": {"input_shape": [null, 64, 82, 64]}, "name": "batch_normalization_3", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 64, 82, 64], "dtype": "float32", "keras_history": ["conv2d_1", 0, 0]}}], "kwargs": {"mask": null}}]}, {"module": "keras.layers", "class_name": "InputLayer", "config": {"batch_shape": [null, 13], "dtype": "float32", "sparse": false, "ragged": false, "name": "mfcc_input"}, "registered_name": null, "name": "mfcc_input", "inbound_nodes": []}, {"module": "keras.layers", "class_name": "MaxPooling2D", "config": {"name": "max_pooling2d_1", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "pool_size": [2, 2], "padding": "valid", "strides": [2, 2], "data_format": "channels_last"}, "registered_name": null, "name": "max_pooling2d_1", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 64, 82, 64], "dtype": "float32", "keras_history": ["batch_normalization_3", 0, 0]}}], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "Dense", "config": {"name": "dense", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "units": 256, "activation": "relu", "use_bias": true, "kernel_initializer": {"module": "keras.initializers", "class_name": "GlorotUniform", "config": {"seed": null}, "registered_name": null}, "bias_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "kernel_regularizer": {"module": "keras.regularizers", "class_name": "L2", "config": {"l2": 0.0001}, "registered_name": null}, "bias_regularizer": null, "kernel_constraint": null, "bias_constraint": null}, "registered_name": null, "build_config": {"input_shape": [null, 13]}, "name": "dense", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 13], 
"dtype": "float32", "keras_history": ["mfcc_input", 0, 0]}}], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "Dropout", "config": {"name": "dropout_3", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "rate": 0.3, "seed": null, "noise_shape": null}, "registered_name": null, "name": "dropout_3", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 32, 41, 64], "dtype": "float32", "keras_history": ["max_pooling2d_1", 0, 0]}}], "kwargs": {"training": false}}]}, {"module": "keras.layers", "class_name": "BatchNormalization", "config": {"name": "batch_normalization", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "axis": -1, "momentum": 0.99, "epsilon": 0.001, "center": true, "scale": true, "beta_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "gamma_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "moving_mean_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "moving_variance_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "beta_regularizer": null, "gamma_regularizer": null, "beta_constraint": null, "gamma_constraint": null, "synchronized": false}, "registered_name": null, "build_config": {"input_shape": [null, 256]}, "name": "batch_normalization", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 256], "dtype": "float32", "keras_history": ["dense", 0, 0]}}], "kwargs": {"mask": null}}]}, {"module": "keras.layers", "class_name": "Conv2D", "config": {"name": "conv2d_2", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, 
"registered_name": null}, "filters": 128, "kernel_size": [3, 3], "strides": [1, 1], "padding": "same", "data_format": "channels_last", "dilation_rate": [1, 1], "groups": 1, "activation": "relu", "use_bias": true, "kernel_initializer": {"module": "keras.initializers", "class_name": "GlorotUniform", "config": {"seed": null}, "registered_name": null}, "bias_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "kernel_regularizer": {"module": "keras.regularizers", "class_name": "L2", "config": {"l2": 0.0001}, "registered_name": null}, "bias_regularizer": null, "activity_regularizer": null, "kernel_constraint": null, "bias_constraint": null}, "registered_name": null, "build_config": {"input_shape": [null, 32, 41, 64]}, "name": "conv2d_2", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 32, 41, 64], "dtype": "float32", "keras_history": ["dropout_3", 0, 0]}}], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "Dropout", "config": {"name": "dropout", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "rate": 0.3, "seed": null, "noise_shape": null}, "registered_name": null, "name": "dropout", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 256], "dtype": "float32", "keras_history": ["batch_normalization", 0, 0]}}], "kwargs": {"training": false}}]}, {"module": "keras.layers", "class_name": "BatchNormalization", "config": {"name": "batch_normalization_4", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "axis": -1, "momentum": 0.99, "epsilon": 0.001, "center": true, "scale": true, "beta_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "gamma_initializer": {"module": "keras.initializers", "class_name": 
"Ones", "config": {}, "registered_name": null}, "moving_mean_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "moving_variance_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "beta_regularizer": null, "gamma_regularizer": null, "beta_constraint": null, "gamma_constraint": null, "synchronized": false}, "registered_name": null, "build_config": {"input_shape": [null, 32, 41, 128]}, "name": "batch_normalization_4", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 32, 41, 128], "dtype": "float32", "keras_history": ["conv2d_2", 0, 0]}}], "kwargs": {"mask": null}}]}, {"module": "keras.layers", "class_name": "Dense", "config": {"name": "dense_1", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "units": 128, "activation": "relu", "use_bias": true, "kernel_initializer": {"module": "keras.initializers", "class_name": "GlorotUniform", "config": {"seed": null}, "registered_name": null}, "bias_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "kernel_regularizer": {"module": "keras.regularizers", "class_name": "L2", "config": {"l2": 0.0001}, "registered_name": null}, "bias_regularizer": null, "kernel_constraint": null, "bias_constraint": null}, "registered_name": null, "build_config": {"input_shape": [null, 256]}, "name": "dense_1", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 256], "dtype": "float32", "keras_history": ["dropout", 0, 0]}}], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "MaxPooling2D", "config": {"name": "max_pooling2d_2", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "pool_size": [2, 2], "padding": "valid", 
"strides": [2, 2], "data_format": "channels_last"}, "registered_name": null, "name": "max_pooling2d_2", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 32, 41, 128], "dtype": "float32", "keras_history": ["batch_normalization_4", 0, 0]}}], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "BatchNormalization", "config": {"name": "batch_normalization_1", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "axis": -1, "momentum": 0.99, "epsilon": 0.001, "center": true, "scale": true, "beta_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "gamma_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "moving_mean_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "moving_variance_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "beta_regularizer": null, "gamma_regularizer": null, "beta_constraint": null, "gamma_constraint": null, "synchronized": false}, "registered_name": null, "build_config": {"input_shape": [null, 128]}, "name": "batch_normalization_1", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 128], "dtype": "float32", "keras_history": ["dense_1", 0, 0]}}], "kwargs": {"mask": null}}]}, {"module": "keras.layers", "class_name": "Dropout", "config": {"name": "dropout_4", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "rate": 0.3, "seed": null, "noise_shape": null}, "registered_name": null, "name": "dropout_4", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 16, 20, 128], "dtype": "float32", "keras_history": ["max_pooling2d_2", 0, 0]}}], 
"kwargs": {"training": false}}]}, {"module": "keras.layers", "class_name": "Dropout", "config": {"name": "dropout_1", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "rate": 0.3, "seed": null, "noise_shape": null}, "registered_name": null, "name": "dropout_1", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 128], "dtype": "float32", "keras_history": ["batch_normalization_1", 0, 0]}}], "kwargs": {"training": false}}]}, {"module": "keras.layers", "class_name": "Flatten", "config": {"name": "flatten", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "data_format": "channels_last"}, "registered_name": null, "build_config": {"input_shape": [null, 16, 20, 128]}, "name": "flatten", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 16, 20, 128], "dtype": "float32", "keras_history": ["dropout_4", 0, 0]}}], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "Concatenate", "config": {"name": "fusion_concat", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "axis": -1}, "registered_name": null, "build_config": {"input_shape": [[null, 128], [null, 40960]]}, "name": "fusion_concat", "inbound_nodes": [{"args": [[{"class_name": "__keras_tensor__", "config": {"shape": [null, 128], "dtype": "float32", "keras_history": ["dropout_1", 0, 0]}}, {"class_name": "__keras_tensor__", "config": {"shape": [null, 40960], "dtype": "float32", "keras_history": ["flatten", 0, 0]}}]], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "Dense", "config": {"name": "dense_2", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "units": 256, "activation": "relu", "use_bias": 
true, "kernel_initializer": {"module": "keras.initializers", "class_name": "GlorotUniform", "config": {"seed": null}, "registered_name": null}, "bias_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "kernel_regularizer": {"module": "keras.regularizers", "class_name": "L2", "config": {"l2": 0.0001}, "registered_name": null}, "bias_regularizer": null, "kernel_constraint": null, "bias_constraint": null}, "registered_name": null, "build_config": {"input_shape": [null, 41088]}, "name": "dense_2", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 41088], "dtype": "float32", "keras_history": ["fusion_concat", 0, 0]}}], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "BatchNormalization", "config": {"name": "batch_normalization_5", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "axis": -1, "momentum": 0.99, "epsilon": 0.001, "center": true, "scale": true, "beta_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "gamma_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "moving_mean_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "moving_variance_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "beta_regularizer": null, "gamma_regularizer": null, "beta_constraint": null, "gamma_constraint": null, "synchronized": false}, "registered_name": null, "build_config": {"input_shape": [null, 256]}, "name": "batch_normalization_5", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 256], "dtype": "float32", "keras_history": ["dense_2", 0, 0]}}], "kwargs": {"mask": null}}]}, {"module": "keras.layers", "class_name": 
"Dropout", "config": {"name": "dropout_5", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "rate": 0.3, "seed": null, "noise_shape": null}, "registered_name": null, "name": "dropout_5", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 256], "dtype": "float32", "keras_history": ["batch_normalization_5", 0, 0]}}], "kwargs": {"training": false}}]}, {"module": "keras.layers", "class_name": "Dense", "config": {"name": "dense_3", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "units": 128, "activation": "relu", "use_bias": true, "kernel_initializer": {"module": "keras.initializers", "class_name": "GlorotUniform", "config": {"seed": null}, "registered_name": null}, "bias_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "kernel_regularizer": {"module": "keras.regularizers", "class_name": "L2", "config": {"l2": 0.0001}, "registered_name": null}, "bias_regularizer": null, "kernel_constraint": null, "bias_constraint": null}, "registered_name": null, "build_config": {"input_shape": [null, 256]}, "name": "dense_3", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 256], "dtype": "float32", "keras_history": ["dropout_5", 0, 0]}}], "kwargs": {}}]}, {"module": "keras.layers", "class_name": "BatchNormalization", "config": {"name": "batch_normalization_6", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "axis": -1, "momentum": 0.99, "epsilon": 0.001, "center": true, "scale": true, "beta_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "gamma_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, 
"registered_name": null}, "moving_mean_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "moving_variance_initializer": {"module": "keras.initializers", "class_name": "Ones", "config": {}, "registered_name": null}, "beta_regularizer": null, "gamma_regularizer": null, "beta_constraint": null, "gamma_constraint": null, "synchronized": false}, "registered_name": null, "build_config": {"input_shape": [null, 128]}, "name": "batch_normalization_6", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 128], "dtype": "float32", "keras_history": ["dense_3", 0, 0]}}], "kwargs": {"mask": null}}]}, {"module": "keras.layers", "class_name": "Dropout", "config": {"name": "dropout_6", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "rate": 0.3, "seed": null, "noise_shape": null}, "registered_name": null, "name": "dropout_6", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": {"shape": [null, 128], "dtype": "float32", "keras_history": ["batch_normalization_6", 0, 0]}}], "kwargs": {"training": false}}]}, {"module": "keras.layers", "class_name": "Dense", "config": {"name": "dense_4", "trainable": true, "dtype": {"module": "keras", "class_name": "DTypePolicy", "config": {"name": "float32"}, "registered_name": null}, "units": 8, "activation": "softmax", "use_bias": true, "kernel_initializer": {"module": "keras.initializers", "class_name": "GlorotUniform", "config": {"seed": null}, "registered_name": null}, "bias_initializer": {"module": "keras.initializers", "class_name": "Zeros", "config": {}, "registered_name": null}, "kernel_regularizer": null, "bias_regularizer": null, "kernel_constraint": null, "bias_constraint": null}, "registered_name": null, "build_config": {"input_shape": [null, 128]}, "name": "dense_4", "inbound_nodes": [{"args": [{"class_name": "__keras_tensor__", "config": 
{"shape": [null, 128], "dtype": "float32", "keras_history": ["dropout_6", 0, 0]}}], "kwargs": {}}]}], "input_layers": [["mfcc_input", 0, 0], ["spec_input", 0, 0]], "output_layers": [["dense_4", 0, 0]]}, "registered_name": "Functional", "build_config": {"input_shape": null}, "compile_config": {"loss": "sparse_categorical_crossentropy", "loss_weights": null, "metrics": ["accuracy"], "weighted_metrics": null, "run_eagerly": false, "steps_per_execution": 1, "jit_compile": false}}
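The shapes recorded in the architecture JSON can be cross-checked with a little arithmetic. The sketch below is not part of the commit; it copies the tensor shapes from the JSON and assumes a 2x2 pooling window with "valid" padding (only the strides are visible in this part of the file). The helper `dense_params` is a hypothetical name introduced for illustration.

```python
from math import prod


def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer (use_bias=True)."""
    return n_in * n_out + n_out


# Shapes copied from the architecture JSON (batch dimension dropped).
pool_in = (32, 41, 128)    # input to max_pooling2d_2
pool_out = (16, 20, 128)   # shape recorded after max_pooling2d_2 / dropout_4
mfcc_features = 128        # output of dropout_1 (MFCC branch)

# Assumption: 2x2 window, stride 2, "valid" padding, so odd sizes round down.
assert (pool_in[0] // 2, pool_in[1] // 2, pool_in[2]) == pool_out

flat = prod(pool_out)           # input size of fusion_concat from "flatten"
fused = mfcc_features + flat    # input size of dense_2

# Fully connected head after fusion_concat: (layer name, inputs, units).
head = [("dense_2", fused, 256), ("dense_3", 256, 128), ("dense_4", 128, 8)]
for name, n_in, n_out in head:
    print(f"{name}: {n_in} -> {n_out}, {dense_params(n_in, n_out):,} parameters")
```

Under these shapes, `dense_2` holds roughly 10.5 million parameters, far more than `dense_3` and `dense_4` combined, because it consumes the flattened spectrogram branch directly.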
|
cnn_emotion_model_20251022_065208_feature_info.json
ADDED
|
@@ -0,0 +1,87 @@
+{
+  "feature_type": "multimodal",
+  "config": {
+    "mfcc": {
+      "enabled": true,
+      "parameters": {
+        "n_mels": 128,
+        "fmax": 8000,
+        "power": 2.0,
+        "n_mfcc": 40,
+        "n_fft": 2048,
+        "hop_length": 512
+      }
+    },
+    "mel_spectrogram": {
+      "enabled": true,
+      "parameters": {
+        "n_mels": 128,
+        "fmax": 8000,
+        "power": 2.0,
+        "n_mfcc": 40,
+        "n_fft": 2048,
+        "hop_length": 512
+      }
+    }
+  },
+  "normalization_params": {
+    "mfcc_scaler": {
+      "mean": [
+        -573.8881925855364,
+        42.58236339167943,
+        -5.421968076008534,
+        8.838157428074664,
+        -4.626548335373786,
+        -4.561592391416308,
+        -10.38183564253776,
+        -8.013113031831121,
+        -3.6677634406732977,
+        -2.2170092025531516,
+        -4.698173174810411,
+        -0.521445854008937,
+        -2.5761164628238475
+      ],
+      "scale": [
+        101.40211169597691,
+        15.915899940828131,
+        13.570870655407589,
+        8.599923731484084,
+        9.14738173651626,
+        6.565023895647934,
+        6.508280879081033,
+        4.970972619842886,
+        4.785011824491553,
+        4.724790727787786,
+        4.41035342173799,
+        4.21637411935603,
+        4.011386501573868
+      ],
+      "var": [
+        10282.38825640338,
+        253.31587092645293,
+        184.16853034580282,
+        73.95868818734314,
+        83.67459263355123,
+        43.099538750428366,
+        42.35772000101178,
+        24.71056878722765,
+        22.89633816052398,
+        22.323647421389435,
+        19.451217304635996,
+        17.777810714375338,
+        16.091221665009034
+      ]
+    },
+    "spec_scaler": {
+      "mean": [
+        -43.60601707329218
+      ],
+      "scale": [
+        32.47546967488067
+      ],
+      "var": [
+        1054.656130604094
+      ]
+    }
+  }
+}
|
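The `normalization_params` block stores a `mean`, `scale`, and `var` per scaler, which matches the attribute names of a fitted scikit-learn `StandardScaler`. The sketch below assumes that convention, so that a value is standardized as `(x - mean) / scale` with `scale` equal to the square root of `var`. How the training code actually applied these values is not shown in the commit. The function `standardize` and the constants are hypothetical names, and only the first three MFCC entries are copied here.

```python
import math

# Values copied from feature_info.json (first three MFCC entries only).
MFCC_MEAN = [-573.8881925855364, 42.58236339167943, -5.421968076008534]
MFCC_SCALE = [101.40211169597691, 15.915899940828131, 13.570870655407589]
MFCC_VAR = [10282.38825640338, 253.31587092645293, 184.16853034580282]

SPEC_MEAN = [-43.60601707329218]
SPEC_SCALE = [32.47546967488067]
SPEC_VAR = [1054.656130604094]


def standardize(values, mean, scale):
    """Return (x - mean) / scale element-wise.

    A length-1 scaler is treated as a single global mean/scale pair
    (assumption: this is how spec_scaler is meant to be used).
    """
    if len(mean) == 1:
        return [(v - mean[0]) / scale[0] for v in values]
    if len(values) != len(mean):
        raise ValueError("values and scaler must have the same length")
    return [(v - m) / s for v, m, s in zip(values, mean, scale)]


# Consistency check: scale should be the square root of var.
for s, v in zip(MFCC_SCALE + SPEC_SCALE, MFCC_VAR + SPEC_VAR):
    assert math.isclose(s, math.sqrt(v), rel_tol=1e-6)

# A frame equal to the stored mean maps to zeros.
print(standardize(MFCC_MEAN, MFCC_MEAN, MFCC_SCALE))
```

Note that `mfcc_scaler` lists 13 values per array while the config sets `n_mfcc` to 40, so it is worth confirming against the training code which coefficients the scaler covers before reusing it.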
cnn_emotion_model_20251022_065208_manifest.txt
ADDED
|
@@ -0,0 +1,5 @@
+Original file: cnn_emotion_model_20251022_065208.keras
+Split into 3 parts:
+- cnn_emotion_model_20251022_065208_part1.keras
+- cnn_emotion_model_20251022_065208_part2.keras
+- cnn_emotion_model_20251022_065208_part3.keras
|
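The manifest names the parts but does not say how the original file was split. A minimal sketch for reassembling it, assuming the parts are plain byte-wise chunks that only need to be concatenated in order (the first two parts being exactly 50 MiB each is consistent with that, but the commit does not confirm it). `join_parts` is a hypothetical helper, not something shipped in the repository.

```python
import shutil
from pathlib import Path


def join_parts(part_paths, out_path):
    """Concatenate the part files, in the order given, into out_path.

    Returns the size of the joined file in bytes.
    """
    out_path = Path(out_path)
    with out_path.open("wb") as out:
        for part in part_paths:
            with Path(part).open("rb") as src:
                shutil.copyfileobj(src, out)  # streams in chunks, low memory use
    return out_path.stat().st_size


STEM = "cnn_emotion_model_20251022_065208"
PARTS = [f"{STEM}_part{i}.keras" for i in (1, 2, 3)]

if __name__ == "__main__":
    # Only runs when all three parts are present in the working directory.
    if all(Path(p).exists() for p in PARTS):
        size = join_parts(PARTS, f"{STEM}.keras")
        print(f"wrote {STEM}.keras ({size} bytes)")
```

If the joined file fails to load, the split was probably not a raw byte split, and the script that produced the parts should be consulted instead.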
cnn_emotion_model_20251022_065208_part1.h5
ADDED
|
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:52ff89869d45bd9ecfc2ebd26cfc4beffe4b36c59903a7e798e7e5b98d80b5a8
+size 52428800
|
cnn_emotion_model_20251022_065208_part1.keras
ADDED
|
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:48154f8a37befc72dab66c817789f85440b566e26c9c775a695dab5134efc208
+size 52428800
|
cnn_emotion_model_20251022_065208_part2.h5
ADDED
|
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bb6078528f1c8569f38bf62bfd4c1c9a8d4be874d0f3e8e93dfad087fc606940
+size 52428800
|
cnn_emotion_model_20251022_065208_part2.keras
ADDED
|
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1b1b2f0779608cca6875461829e60bf33db53d53ee7fdff40e92e544fc0e722f
+size 52428800
|
cnn_emotion_model_20251022_065208_part3.h5
ADDED
|
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f5b3b2255c659c0e59c123f75e2d30b2a8a05e3bcfc8fd7b133dfa4a4f876fe2
+size 23493312
|
cnn_emotion_model_20251022_065208_part3.keras
ADDED
|
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d6bd5a75c2badfe7cd65bb565e82833e5f4e632eda8c0f12ed7bb03c2c063f0f
+size 23485996
|
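Each part is stored as a Git LFS pointer carrying a SHA-256 digest and a byte size, so a downloaded part can be checked before use. The sketch below parses a pointer and compares it with a local file; `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers written for illustration, and the pointer text is copied from the `part3.keras` entry above.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at path has the pointer's size and digest."""
    hasher = hashlib.new(pointer["algo"])
    size = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
            size += len(chunk)
    return size == pointer["size"] and hasher.hexdigest() == pointer["digest"]


POINTER_PART3_KERAS = """version https://git-lfs.github.com/spec/v1
oid sha256:d6bd5a75c2badfe7cd65bb565e82833e5f4e632eda8c0f12ed7bb03c2c063f0f
size 23485996
"""

pointer = parse_lfs_pointer(POINTER_PART3_KERAS)
print(pointer["algo"], pointer["size"])
```

The three `.keras` pointers add up to 52428800 + 52428800 + 23485996 = 128343596 bytes, which is the size a reassembled file would have if the parts are simple byte-wise chunks (an assumption the commit does not confirm).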
requirements.txt
ADDED
|
@@ -0,0 +1,52 @@
+# Core libraries
+# TensorFlow version compatible with Python 3.13.9 deployment environment
+tensorflow==2.17.1
+tensorboard>=2.17.0
+numpy>=1.24.0
+pandas>=2.0.0
+scikit-learn>=1.3.0
+
+# Audio processing
+librosa>=0.10.0
+soundfile>=0.12.0
+
+# Dataset access
+datasets>=2.14.0
+huggingface-hub>=0.17.0
+
+# Genetic algorithm optimization
+deap>=1.4.0
+
+# Visualization
+matplotlib>=3.7.0
+seaborn>=0.12.0
+plotly>=5.15.0
+kaleido>=0.2.1 # Required for plotly static image export
+
+# UI
+streamlit>=1.28.0
+streamlit-extras>=0.4.0
+streamlit-option-menu>=0.3.6
+audio-recorder-streamlit>=0.0.8
+
+# Dimensionality reduction for visualization
+umap-learn>=0.5.3
+
+# Advanced visualization and reporting
+pydot>=1.4.2 # For model architecture visualization
+graphviz>=0.20 # For model architecture visualization
+
+# Utilities
+tqdm>=4.65.0
+h5py>=3.9.0
+
+# Utilities
+portalocker>=2.7.0
+
+# Additional dependencies for compatibility
+protobuf>=3.20.3,<4.0.0 # Must be compatible with TensorFlow
+packaging>=21.0
+requests>=2.28.0
+python-dateutil>=2.8.0
+pytz>=2022.0
+six>=1.16.0
|