Faham committed
Commit 93e56c4 · 1 Parent(s): f2e0cb4

UPDATE: readme

Files changed (2)
  1. README.md +157 -188
  2. app.py +0 -3
README.md CHANGED
@@ -11,32 +11,78 @@ pinned: false
11
 
12
  # Multimodal Sentiment Analysis
13
 
14
- A comprehensive multi-page Streamlit application for testing three independent sentiment analysis models: text, audio, and vision-based sentiment analysis.
15
-
16
- ## 🚀 Features
17
-
18
- - **Multi-Page Interface**: Clean navigation with dedicated pages for each model
19
- - **Text Sentiment Analysis**: ✅ **READY TO USE** - TextBlob NLP model integrated
20
- - **Audio Sentiment Analysis**: ✅ **READY TO USE** - Fine-tuned Wav2Vec2 model integrated
21
- - 📁 **File Upload**: Support for WAV, MP3, M4A, FLAC files
22
- - 🎙️ **Audio Recording**: Direct microphone recording (max 5 seconds)
23
- - 🔄 **Smart Preprocessing**: Automatic 16kHz sampling, 5s max duration (CREMA-D + RAVDESS format)
24
- - **Vision Sentiment Analysis**: ✅ **READY TO USE** - Fine-tuned ResNet-50 model integrated
25
- - 📁 **File Upload**: Support for PNG, JPG, JPEG, BMP, TIFF files
26
- - 📷 **Camera Capture**: Take photos directly with your camera
27
- - 🔄 **Smart Preprocessing**: Automatic face detection, tight face crop (0% padding), grayscale conversion, 224x224 resize
28
- - **Fused Model**: Combine predictions from all three models
29
- - **Modern UI**: Beautiful, responsive interface with custom styling
30
  - **File Support**: Multiple audio and image format support
 
31
 
32
- ## 📋 Requirements
33
 
34
  - Python 3.9 or higher
35
- - Streamlit 1.28.0 or higher
36
- - PyTorch 1.13.0 or higher
37
- - Additional dependencies listed in `requirements.txt`
38
 
39
- ## 🛠️ Installation
40
 
41
  1. **Clone the repository**:
42
 
@@ -58,11 +104,21 @@ A comprehensive multi-page Streamlit application for testing three independent s
58
  ```
59
 
60
  3. **Install dependencies**:
 
61
  ```bash
62
  pip install -r requirements.txt
63
  ```
64
 
65
- ## 🚀 Usage
66
 
67
  1. **Start the Streamlit application**:
68
 
@@ -74,209 +130,122 @@ A comprehensive multi-page Streamlit application for testing three independent s
74
 
75
  3. **Navigate between pages** using the sidebar:
76
  - 🏠 **Home**: Overview and welcome page
77
- - 📝 **Text Sentiment**: ✅ **Ready to use** - Analyze text with TextBlob
78
- - 🎵 **Audio Sentiment**: ✅ **Ready to use** - Analyze audio with Wav2Vec2 - 📁 Upload audio files or 🎙️ record directly with microphone using `st.audio_input`
79
- - 🖼️ **Vision Sentiment**: ✅ **Ready to use** - Analyze images with ResNet-50
80
- - 📁 Upload image files or 📷 take photos with camera
81
  - 🔗 **Fused Model**: Combine all three models
82
 
83
- ## 🧪 Testing the Models
84
 
85
- Before running the full app, you can test if the models load correctly:
86
 
87
- ### Vision Model Test
88
 
89
- ```bash
90
- python test_vision_model.py
91
- ```
92
 
93
- ### Audio Model Test
94
 
95
- ```bash
96
- python test_audio_model.py
97
- ```
98
 
99
- These will verify that:
100
 
101
- - The model files exist
102
- - PyTorch can load the architectures
103
- - The trained weights can be loaded
104
- - Inference runs without errors
105
 
106
- ### 🔍 Troubleshooting Model Issues
 
 
107
 
108
- If you encounter tensor size mismatch errors, run the diagnostic scripts:
109
 
110
- ```bash
111
- python check_model.py # For vision model
112
- python test_audio_model.py # For audio model
113
- ```
114
 
115
- These will examine your model files and identify:
116
 
117
- - The actual number of output classes
118
- - Whether the architectures match expected models
119
- - Any compatibility issues
 
120
 
121
- **Common Issues:**
122
 
123
- - **Tensor size mismatch**: Models might have been trained with different numbers of classes
124
- - **Architecture mismatch**: Models might not match expected architectures
125
- - **Weight loading errors**: Corrupted or incompatible model files
126
- - **Library dependencies**: Missing transformers, librosa, or other required libraries
127
 
128
- ## 📁 Project Structure
 
 
129
 
 
 
130
  ```
131
- sentiment-fused/
132
- ├── app.py # Main Streamlit application
133
- ├── requirements.txt # Python dependencies
134
- ├── README.md # This file
135
- ├── test_vision_model.py # Vision model test script
136
- ├── test_audio_model.py # Audio model test script
137
- ├── main.py # Original main file
138
- ├── pyproject.toml # Project configuration
139
- └── models/ # Model files and notebooks
140
-     ├── audio_sentiment_analysis.ipynb
141
-     ├── vision_sentiment_analysis.ipynb
142
-     ├── wav2vec2_model.pth # ✅ Fine-tuned Wav2Vec2 model (READY)
143
-     └── resnet50_model.pth # ✅ Fine-tuned ResNet-50 model (READY)
144
- ```
145
-
146
- ## 🔧 Model Integration Status
147
-
148
- ### ✅ Text Sentiment Model - **READY TO USE**
149
-
150
- - **Model**: TextBlob (Natural Language Processing)
151
- - **Features**: Sentiment classification (Positive/Negative/Neutral) with confidence scores
152
- - **Input**: Any text input
153
- - **Analysis**: Real-time NLP sentiment analysis
154
- - **Status**: Fully integrated and tested
155
-
156
- ### ✅ Vision Sentiment Model - **READY TO USE**
157
-
158
- - **Model**: ResNet-50 fine-tuned on FER2013 dataset
159
- - **Training Dataset**:
160
- - 🖼️ **FER2013**: Facial Expression Recognition 2013 dataset
161
- - 🎯 **Classes**: 7 emotions mapped to 3 sentiments (Negative, Neutral, Positive)
162
- - 🏗️ **Architecture**: ResNet-50 with ImageNet weights, fine-tuned for sentiment
163
- - **Classes**: 3 sentiment classes (Negative, Neutral, Positive)
164
- - **Input**: Images (PNG, JPG, JPEG, BMP, TIFF)
165
- - **Preprocessing**:
166
- - 🔍 **Face Detection**: Automatic face detection using OpenCV
167
- - 🎨 **Grayscale Conversion**: Convert to grayscale and replicate to 3 channels
168
- - 📐 **Face Cropping**: Crop to face region with 0% padding (tightest crop)
169
- - 📏 **Resize**: Scale to 224x224 pixels (FER2013 format)
170
- - 🎯 **Transforms**: Resize(224) → CenterCrop(224) → ToTensor → ImageNet Normalization
171
- - 📊 **Format**: 224x224 RGB with ImageNet mean/std normalization
172
- - **Status**: Fully integrated and tested
173
-
174
- ### ✅ Audio Sentiment Model - **READY TO USE**
175
-
176
- - **Model**: Wav2Vec2-base fine-tuned on RAVDESS + CREMA-D datasets
177
- - **Training Datasets**:
178
- - 🎵 **RAVDESS**: Ryerson Audio-Visual Database of Emotional Speech and Song
179
- - 🎵 **CREMA-D**: Crowd-sourced Emotional Multimodal Actors Dataset
180
- - **Classes**: 3 sentiment classes (Negative, Neutral, Positive)
181
- - **Input**:
182
- - 📁 **File Upload**: Audio files (WAV, MP3, M4A, FLAC)
183
- - 🎙️ **Direct Recording**: Microphone input using `st.audio_input`
184
- - **Preprocessing**:
185
- - 🔄 **Sampling Rate**: 16kHz (matching CREMA-D + RAVDESS training)
186
- - ⏱️ **Duration**: Max 5 seconds (matching training max_duration_s=5.0)
187
- - 🎵 **Feature Extraction**: AutoFeatureExtractor with truncation and padding
188
- - 📊 **Format**: Automatic resampling, max_length=int(5.0 \* 16000)
189
- - **Status**: Fully integrated and tested
190
-
191
- ### 🔗 Fused Model - **FULLY READY**
192
-
193
- The fused model now uses all three integrated models: text (TextBlob), audio (Wav2Vec2), and vision (ResNet-50).
194
-
195
- ## 📊 Supported File Formats
196
-
197
- ### Audio Files
198
-
199
- - WAV (.wav)
200
- - MP3 (.mp3)
201
- - M4A (.m4a)
202
- - FLAC (.flac)
203
 
204
- ### Image Files
205
 
206
- - PNG (.png)
207
- - JPEG (.jpg, .jpeg)
208
- - BMP (.bmp)
209
- - TIFF (.tiff)
210
 
211
- ## 🎨 Customization
212
-
213
- The application includes custom CSS styling that can be modified in the `app.py` file. Key styling classes:
214
 
215
- - `.main-header`: Main page headers
216
- - `.model-card`: Information cards
217
- - `.result-box`: Result display boxes
218
- - `.upload-section`: File upload areas
219
 
220
- ## 🔍 Troubleshooting
221
 
222
  ### Common Issues
223
 
224
- 1. **Port already in use**: Change the port with `streamlit run app.py --server.port 8502`
225
-
226
- 2. **Vision model loading errors**:
227
 
228
- - Ensure `models/resnet50_model.pth` exists
229
- - Run `python test_vision_model.py` to diagnose issues
230
- - Check PyTorch installation: `python -c "import torch; print(torch.__version__)"`
231
 
232
- 3. **Memory issues**: Large audio/image files may require more memory. Consider file size limits
233
-
234
- 4. **OpenCV issues**: If face detection fails, ensure `opencv-python` is installed:
235
-
236
- ```bash
237
- pip install opencv-python
238
- ```
239
 
240
- 5. **Dependency conflicts**: Use a virtual environment to avoid package conflicts
 
 
241
 
242
- ### Performance Tips
 
 
 
243
 
244
- - Use appropriate file sizes for audio and images
245
- - Consider implementing caching for model predictions
246
- - Use GPU acceleration if available for PyTorch models
247
- - The vision model automatically uses GPU if available
248
 
249
- ## 🤝 Contributing
250
-
251
- 1. Fork the repository
252
- 2. Create a feature branch
253
- 3. Make your changes
254
- 4. Test thoroughly
255
- 5. Submit a pull request
256
-
257
- ## 📝 License
258
-
259
- This project is licensed under the MIT License - see the LICENSE file for details.
260
-
261
- ## 🙏 Acknowledgments
262
 
263
- - Streamlit team for the amazing web framework
264
- - PyTorch community for deep learning tools
265
- - Hugging Face for transformer models
266
- - All contributors to the open-source libraries used
267
 
268
- ## 📞 Support
269
 
270
- For questions or issues:
271
 
272
- 1. Check the troubleshooting section above
273
- 2. Run `python test_vision_model.py` for vision model issues
274
- 3. Review the model integration examples
275
- 4. Open an issue on the repository
276
- 5. Contact the development team
 
 
277
 
278
- ---
279
 
280
- **Happy Sentiment Analysis! 🧠✨**
 
 
 
 
 
281
 
282
- **Note**: All **THREE MODELS** are now fully integrated and ready to use! 🎉
 
11
 
12
  # Multimodal Sentiment Analysis
13
 
14
+ A comprehensive Streamlit application that combines three different sentiment analysis models: text, audio, and vision-based sentiment analysis. The project demonstrates how to integrate multiple AI models for comprehensive sentiment understanding across different modalities.
15
+
16
+ ## What is it?
17
+
18
+ This project implements a **fused sentiment analysis system** that combines predictions from three independent models:
19
+
20
+ ### 1. Text Sentiment Analysis
21
+
22
+ - **Model**: TextBlob NLP library
23
+ - **Capability**: Analyzes text input for positive, negative, or neutral sentiment
24
+ - **Status**: ✅ Fully integrated and ready to use
25
+
26
+ ### 2. Audio Sentiment Analysis
27
+
28
+ - **Model**: Fine-tuned Wav2Vec2-base model
29
+ - **Training Data**: RAVDESS + CREMA-D emotional speech datasets
30
+ - **Capability**: Analyzes audio files and microphone recordings for sentiment
31
+ - **Features**:
32
+ - File upload support (WAV, MP3, M4A, FLAC)
33
+ - Direct microphone recording (max 5 seconds)
34
+ - Automatic preprocessing (16kHz sampling, 5s max duration)
35
+ - **Status**: ✅ Fully integrated and ready to use
36
+
37
+ ### 3. Vision Sentiment Analysis
38
+
39
+ - **Model**: Fine-tuned ResNet-50 model
40
+ - **Training Data**: FER2013 facial expression dataset
41
+ - **Capability**: Analyzes images for facial expression-based sentiment
42
+ - **Features**:
43
+ - File upload support (PNG, JPG, JPEG, BMP, TIFF)
44
+ - Camera capture functionality
45
+ - Automatic face detection and preprocessing
46
+ - Grayscale conversion and 224x224 resize
47
+ - **Status**: ✅ Fully integrated and ready to use
48
+
49
+ ### 4. Fused Model
50
+
51
+ - **Approach**: Combines predictions from all three models
52
+ - **Capability**: Provides comprehensive sentiment analysis across modalities
53
+ - **Status**: ✅ Fully integrated and ready to use
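
The README does not say how the fused model combines the three predictions. One common option is late fusion by a weighted average of per-class probabilities. The sketch below assumes each model returns probabilities for negative, neutral and positive; the function name `fuse_predictions`, the label keys and the equal default weights are hypothetical, not taken from `app.py`.

```python
from typing import Dict, Optional

LABELS = ("negative", "neutral", "positive")


def fuse_predictions(
    predictions: Dict[str, Dict[str, float]],
    weights: Optional[Dict[str, float]] = None,
) -> Dict[str, float]:
    """Weighted average of per-class probabilities across modalities.

    `predictions` maps a modality name (e.g. "text") to a dict of class
    probabilities. Modalities missing from `weights` default to weight 1.0.
    """
    if not predictions:
        raise ValueError("at least one modality prediction is required")
    weights = weights or {}
    total_weight = sum(weights.get(name, 1.0) for name in predictions)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = {label: 0.0 for label in LABELS}
    for name, probs in predictions.items():
        w = weights.get(name, 1.0)
        for label in LABELS:
            fused[label] += w * probs.get(label, 0.0)
    return {label: value / total_weight for label, value in fused.items()}


# Illustrative numbers only, not real model outputs.
example = fuse_predictions({
    "text": {"negative": 0.1, "neutral": 0.2, "positive": 0.7},
    "audio": {"negative": 0.3, "neutral": 0.4, "positive": 0.3},
    "vision": {"negative": 0.2, "neutral": 0.3, "positive": 0.5},
})
top_label = max(example, key=example.get)
```

Averaging probabilities keeps the output a valid distribution and lets a missing modality simply be left out of `predictions`; whether the app does this or something else (for example majority voting) would need to be checked in `app.py`.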
54
+
55
+ ## Project Structure
56
+
57
+ ```
58
+ sentiment-fused/
59
+ ├── app.py # Main Streamlit application
60
+ ├── simple_model_manager.py # Model management and Google Drive integration
61
+ ├── requirements.txt # Python dependencies
62
+ ├── pyproject.toml # Project configuration
63
+ ├── Dockerfile # Container deployment
64
+ ├── notebooks/ # Development notebooks
65
+ │   ├── audio_sentiment_analysis.ipynb # Audio model development
66
+ │   └── vision_sentiment_analysis.ipynb # Vision model development
67
+ └── models/ # Model storage directory
68
+ ```
69
+
70
+ ## Key Features
71
+
72
+ - **Real-time Analysis**: Instant sentiment predictions with confidence scores
73
+ - **Smart Preprocessing**: Automatic file format handling and preprocessing
74
+ - **Multi-Page Interface**: Clean navigation between different sentiment analysis modes
75
+ - **Model Management**: Automatic model downloading from Google Drive
76
  - **File Support**: Multiple audio and image format support
77
+ - **Camera & Microphone**: Direct input capture capabilities
78
 
79
+ ## Prerequisites
80
 
81
  - Python 3.9 or higher
82
+ - 4GB+ RAM (for model loading)
83
+ - Internet connection (for initial model download)
 
84
 
85
+ ## Installation
86
 
87
  1. **Clone the repository**:
88
 
 
104
  ```
105
 
106
  3. **Install dependencies**:
107
+
108
  ```bash
109
  pip install -r requirements.txt
110
  ```
111
 
112
+ 4. **Set up environment variables**:
113
+ Create a `.env` file in the project root with:
114
+ ```env
115
+ VISION_MODEL_DRIVE_ID=your_google_drive_vision_model_file_id_here
116
+ AUDIO_MODEL_DRIVE_ID=your_google_drive_audio_model_file_id_here
117
+ VISION_MODEL_FILENAME=resnet50_model.pth
118
+ AUDIO_MODEL_FILENAME=wav2vec2_model.pth
119
+ ```
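
As a rough illustration of how these four settings could be validated before the app starts, the sketch below parses `.env`-style text with the standard library only. The helper names `parse_env_text` and `missing_keys` are hypothetical, and the project may well use a dedicated loader instead; only the four key names come from the README.

```python
from typing import Dict, List

# Key names as listed in the README's .env example.
REQUIRED_KEYS = (
    "VISION_MODEL_DRIVE_ID",
    "AUDIO_MODEL_DRIVE_ID",
    "VISION_MODEL_FILENAME",
    "AUDIO_MODEL_FILENAME",
)


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(config: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


# Placeholder values, not real Google Drive file IDs.
sample = """
# model locations
VISION_MODEL_DRIVE_ID=abc123
AUDIO_MODEL_DRIVE_ID=def456
VISION_MODEL_FILENAME=resnet50_model.pth
"""
config = parse_env_text(sample)
print(missing_keys(config))
```

Checking for missing keys up front gives a clearer error than a failed download later on.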
120
+
121
+ ## Running Locally
122
 
123
  1. **Start the Streamlit application**:
124
 
 
130
 
131
  3. **Navigate between pages** using the sidebar:
132
  - 🏠 **Home**: Overview and welcome page
133
+ - 📝 **Text Sentiment**: Analyze text with TextBlob
134
+ - 🎵 **Audio Sentiment**: Analyze audio files or record with microphone
135
+ - 🖼️ **Vision Sentiment**: Analyze images or capture with camera
136
  - 🔗 **Fused Model**: Combine all three models
137
 
138
+ ## Model Development
139
 
140
+ The project includes Jupyter notebooks that document the development process:
141
 
142
+ ### Audio Model (`notebooks/audio_sentiment_analysis.ipynb`)
143
 
144
+ - Wav2Vec2-base fine-tuning on RAVDESS + CREMA-D datasets
145
+ - Emotion-to-sentiment mapping (happy/surprised → positive, sad/angry/fearful/disgust → negative, neutral/calm → neutral)
146
+ - Audio preprocessing pipeline (16kHz sampling, 5s max duration)
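
The 16 kHz / 5 s limit amounts to a fixed input length of `int(5.0 * 16000)` samples, the value quoted in the previous README. A minimal sketch of the truncate-or-pad step follows. It assumes the clip is already mono and resampled to 16 kHz, and the helper name `fit_to_max_length` is hypothetical; the notebook itself is described as using a feature extractor with truncation and padding rather than this function.

```python
from typing import List

SAMPLE_RATE = 16_000   # Hz, from the README
MAX_DURATION_S = 5.0   # seconds, from the README
MAX_LENGTH = int(MAX_DURATION_S * SAMPLE_RATE)  # 80,000 samples


def fit_to_max_length(samples: List[float], max_length: int = MAX_LENGTH) -> List[float]:
    """Truncate long clips and zero-pad short ones to exactly max_length."""
    if len(samples) >= max_length:
        return samples[:max_length]
    return samples + [0.0] * (max_length - len(samples))


short_clip = fit_to_max_length([0.1] * 10)       # padded with zeros
long_clip = fit_to_max_length([0.1] * 100_000)   # cut to 5 seconds
```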
147
 
148
+ ### Vision Model (`notebooks/vision_sentiment_analysis.ipynb`)
149
 
150
+ - ResNet-50 fine-tuning on FER2013 dataset
151
+ - Emotion-to-sentiment mapping (happy/surprise → positive, angry/disgust/fear/sad → negative, neutral → neutral)
152
+ - Image preprocessing pipeline (face detection, grayscale conversion, 224x224 resize)
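
The two emotion-to-sentiment mappings listed above can be written as a single lookup table. This is a sketch only: the dictionary and the `to_sentiment` helper are hypothetical names, and the exact label strings used in the notebooks may differ from the spellings shown in the README.

```python
# Emotion labels as spelled in the README; audio and vision spellings differ
# slightly ("surprised"/"surprise", "fearful"/"fear"), so both are included.
EMOTION_TO_SENTIMENT = {
    "happy": "positive",
    "surprised": "positive",  # audio (RAVDESS + CREMA-D)
    "surprise": "positive",   # vision (FER2013)
    "sad": "negative",
    "angry": "negative",
    "fearful": "negative",    # audio
    "fear": "negative",       # vision
    "disgust": "negative",
    "neutral": "neutral",
    "calm": "neutral",        # audio only
}


def to_sentiment(emotion: str) -> str:
    """Map an emotion label to negative / neutral / positive."""
    key = emotion.strip().lower()
    if key not in EMOTION_TO_SENTIMENT:
        raise ValueError(f"unknown emotion label: {emotion!r}")
    return EMOTION_TO_SENTIMENT[key]
```

Raising on unknown labels, rather than defaulting to neutral, makes a dataset label mismatch visible during training.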
153
 
154
+ ## Technical Implementation
155
 
156
+ ### Model Management
 
 
 
157
 
158
+ - `SimpleModelManager` class handles model downloading from Google Drive
159
+ - Automatic model caching and version management
160
+ - Environment variable configuration for model URLs
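
The download-and-cache behaviour described above can be sketched as "download only if the file is not already on disk". The code below is not the actual `SimpleModelManager`; `ensure_model` and its `download` parameter are hypothetical, with the downloader passed in so the caching logic can be exercised without network access. In the real project the downloader would presumably wrap the Gdown library listed under Dependencies.

```python
from pathlib import Path
from typing import Callable


def ensure_model(
    file_id: str,
    filename: str,
    download: Callable[[str, Path], None],
    models_dir: Path = Path("models"),
) -> Path:
    """Return the local model path, downloading only when it is not cached."""
    models_dir.mkdir(parents=True, exist_ok=True)
    target = models_dir / filename
    if target.exists() and target.stat().st_size > 0:
        return target  # cache hit: reuse the file from an earlier run
    download(file_id, target)  # cache miss: fetch into models_dir
    if not target.exists():
        raise FileNotFoundError(f"download did not produce {target}")
    return target
```

Treating a zero-byte file as a cache miss guards against an interrupted earlier download being mistaken for a valid model.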
161
 
162
+ ### Preprocessing Pipelines
163
 
164
+ - **Audio**: Automatic resampling, duration limiting, feature extraction
165
+ - **Vision**: Face detection, cropping, grayscale conversion, normalization
166
+ - **Text**: Direct TextBlob processing
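
For the vision pipeline, the grayscale and normalization steps can be illustrated on a single pixel. The sketch assumes ITU-R BT.601 luma weights for the grayscale conversion and the commonly used ImageNet mean and standard deviation; the previous README mentions replicating the grayscale image to 3 channels and ImageNet normalization, but the exact conversion used by the app is not shown here. The function name is hypothetical.

```python
from typing import Tuple

IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def gray_normalized_pixel(r: int, g: int, b: int) -> Tuple[float, float, float]:
    """Convert one 8-bit RGB pixel to grayscale, replicate it to 3 channels,
    scale to [0, 1], and apply per-channel ImageNet normalization."""
    gray = (0.299 * r + 0.587 * g + 0.114 * b) / 255.0  # assumed BT.601 weights
    return tuple((gray - m) / s for m, s in zip(IMAGENET_MEAN, IMAGENET_STD))


white = gray_normalized_pixel(255, 255, 255)
black = gray_normalized_pixel(0, 0, 0)
```

Because the three channels carry the same gray value but are normalized with different means and deviations, the output channels differ even for a grayscale input.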
 
167
 
168
+ ### Streamlit Integration
169
 
170
+ - Multi-page application with sidebar navigation
171
+ - File upload widgets with format validation
172
+ - Real-time camera and microphone input
173
+ - Custom CSS styling for modern UI
174
 
175
+ ## Deployment
176
 
177
+ ### Docker Deployment
 
 
 
178
 
179
+ ```bash
180
+ # Build the container
181
+ docker build -t sentiment-fused .
182
 
183
+ # Run the container
184
+ docker run -p 7860:7860 sentiment-fused
185
  ```
 
186
 
187
+ The application will be available at `http://localhost:7860`
188
 
189
+ ### Local Development
 
 
 
190
 
191
+ ```bash
192
+ # Run with custom port
193
+ streamlit run app.py --server.port 8502
194
 
195
+ # Run with custom address
196
+ streamlit run app.py --server.address 0.0.0.0
197
+ ```
 
198
 
199
+ ## Troubleshooting
200
 
201
  ### Common Issues
202
 
203
+ 1. **Model Loading Errors**:
 
 
204
 
205
+ - Ensure environment variables are set correctly
206
+ - Check internet connection for model downloads
207
+ - Verify sufficient RAM (4GB+ recommended)
208
 
209
+ 2. **Dependency Issues**:
 
 
 
 
 
 
210
 
211
+ - Use virtual environment to avoid conflicts
212
+ - Install PyTorch with CUDA support if using GPU
213
+ - Ensure OpenCV is properly installed for face detection
214
 
215
+ 3. **Performance Issues**:
216
+ - Large audio/image files may cause memory issues
217
+ - Consider file size limits for better performance
218
+ - GPU acceleration available for PyTorch models
219
 
220
+ ### Model Testing
 
 
 
221
 
222
+ ```bash
223
+ # Test vision model
224
+ python -c "from simple_model_manager import SimpleModelManager; m = SimpleModelManager(); print('Vision model:', m.load_vision_model()[0] is not None)"
225
 
226
+ # Test audio model
227
+ python -c "from simple_model_manager import SimpleModelManager; m = SimpleModelManager(); print('Audio model:', m.load_audio_model()[0] is not None)"
228
+ ```
 
229
 
230
+ ## Dependencies
231
 
232
+ Key libraries used:
233
 
234
+ - **Streamlit**: Web application framework
235
+ - **PyTorch**: Deep learning framework
236
+ - **Transformers**: Hugging Face model library
237
+ - **OpenCV**: Computer vision and face detection
238
+ - **Librosa**: Audio processing
239
+ - **TextBlob**: Natural language processing
240
+ - **Gdown**: Google Drive file downloader
241
 
242
+ ## What This Project Demonstrates
243
 
244
+ 1. **Multimodal AI Integration**: Combining text, audio, and vision models
245
+ 2. **Model Management**: Automated downloading and caching of pre-trained models
246
+ 3. **Real-time Processing**: Live audio recording and camera capture
247
+ 4. **Smart Preprocessing**: Automatic format conversion and optimization
248
+ 5. **Modern Web UI**: Professional Streamlit application with custom styling
249
+ 6. **Production Ready**: Docker containerization and deployment
250
 
251
+ This project serves as a comprehensive example of building production-ready multimodal AI applications with modern Python tools and frameworks.
app.py CHANGED
@@ -1,9 +1,6 @@
1
  import streamlit as st
2
  import pandas as pd
3
  from PIL import Image
4
- import io
5
- import numpy as np
6
- import tempfile
7
  import os
8
  import torch
9
  import torch.nn as nn
 
1
  import streamlit as st
2
  import pandas as pd
3
  from PIL import Image
 
 
 
4
  import os
5
  import torch
6
  import torch.nn as nn