temp12821 committed on
Commit feaf7eb · 1 Parent(s): 8b4cd24

working prototype of the audio processing module

Files changed (10)
  1. .env.example +13 -3
  2. README.md +81 -132
  3. audio_processor.py +234 -0
  4. config.py +1 -1
  5. flask_app.py +35 -38
  6. models_config.py +157 -0
  7. preload_model.py +45 -0
  8. pyproject.toml +5 -0
  9. requirements.txt +5 -0
  10. streamlit_app.py +52 -34
.env.example CHANGED
@@ -1,9 +1,19 @@
 # Audio Sentiment Analysis Configuration
 
 # Model Selection (choose one):
-# Option 1: superb/wav2vec2-base-superb-er (lightweight, 4 emotions)
-# Option 2: ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition (heavy, 7 emotions)
-MODEL_NAME=superb/wav2vec2-base-superb-er
+# Lightweight models (4 emotions: Happy, Sad, Angry, Neutral):
+# - superb/wav2vec2-base-superb-er (recommended, fast)
+# - superb/wav2vec2-large-superb-er (better accuracy, slower)
+# - superb/hubert-large-superb-er (better accuracy, slower)
+#
+# Advanced models (7-8 emotions):
+# - ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition
+# - harshit345/xlsr-wav2vec-speech-emotion-recognition
+# - amiriparian/wav2vec2-base-ravdess
+#
+# See models_config.py for full list and details
+
+MODEL_NAME=superb/wav2vec2-large-superb-er
 
 # Audio Processing Settings
 CHUNK_DURATION=3
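In the new `.env.example`, the alternative model IDs are only comments. The only active settings are the uncommented `KEY=VALUE` lines. The sketch below shows that rule with a hypothetical `parse_env_text` helper, which is not part of the repo. It assumes simple unquoted values. python-dotenv, which the project lists as a dependency, also handles quoting, `export` prefixes and interpolation.

```python
def parse_env_text(text: str) -> dict[str, str]:
    """Hypothetical minimal .env parser: KEY=VALUE lines only, '#' lines ignored."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and full-line comments such as "# - superb/...".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line, so ignore it
        values[key.strip()] = value.strip()
    return values


# Trimmed copy of the new .env.example.
ENV_EXAMPLE = """\
# Audio Sentiment Analysis Configuration

# Model Selection (choose one):
# - superb/wav2vec2-base-superb-er (recommended, fast)
# - superb/wav2vec2-large-superb-er (better accuracy, slower)
#
# See models_config.py for full list and details

MODEL_NAME=superb/wav2vec2-large-superb-er

# Audio Processing Settings
CHUNK_DURATION=3
"""

settings = parse_env_text(ENV_EXAMPLE)
print(settings)
# {'MODEL_NAME': 'superb/wav2vec2-large-superb-er', 'CHUNK_DURATION': '3'}
```

To switch models, edit the single `MODEL_NAME=` line. The commented list is documentation only.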
README.md CHANGED
@@ -1,185 +1,134 @@
 ---
-title: Flask Streamlit Demo
-emoji: 🚀
-colorFrom: blue
-colorTo: green
+title: Audio Sentiment Analysis
+emoji: 🎀
+colorFrom: purple
+colorTo: blue
 sdk: docker
 pinned: false
 license: mit
-short_description: Flask + Streamlit integration demo
+short_description: Analyze emotions from audio with timeline visualization
 app_port: 7860
 ---
 
-# Flask + Streamlit Demo
+# Audio Sentiment Analysis - Setup Guide
 
-This Hugging Face Space demonstrates integration between Flask backend and Streamlit frontend.
+## Quick Start
 
-## Features:
-- Flask API with `/helloworld` endpoint
-- Streamlit app that calls the Flask API and displays the response
-- Runs on Hugging Face Spaces using Docker
-
-## How it works:
-1. Flask API runs in the background on port 5000
-2. Streamlit UI runs on port 7860 (Hugging Face default)
-3. Click the button in Streamlit to call the Flask endpoint
-
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
-
-# Flask + Streamlit Project Setup
-
-## Files Created:
-- `flask_app.py` - Flask backend with /helloworld endpoint
-- `streamlit_app.py` - Streamlit frontend that calls the Flask API
-
-## How to Run:
-
-### Step 1: Install Dependencies
+### 1. Install Dependencies
 ```bash
+uv sync
+# or
 pip install -r requirements.txt
 ```
 
-### Step 2: Start Flask Server (Terminal 1)
+### 2. Configure Environment
 ```bash
-python flask_app.py
-```
-The Flask server will start on http://localhost:5000
+# Copy example config
+cp .env.example .env
 
-### Step 3: Start Streamlit App (Terminal 2)
-```bash
-streamlit run streamlit_app.py
+# Edit .env and set your preferred model
+# Default: superb/wav2vec2-base-superb-er
 ```
-The Streamlit app will open in your browser (usually http://localhost:8501)
-
-### Step 4: Test the Integration
-1. Click the "Call Flask API" button in the Streamlit interface
-2. The app will call the Flask endpoint and display the response
 
-## Endpoints:
-- Flask API: `GET http://localhost:5000/helloworld`
-- Returns: `{"message": "Hello World!", "status": "success"}`
-
-## Note:
-Make sure to run Flask first before using the Streamlit app!
-
-# Docker Setup Instructions
-
-## Option 1: Using Docker Compose (Recommended - Runs both apps together)
-
-### Build and Run:
+### 3. Preload Model (Recommended)
 ```bash
-docker-compose up --build
-```
-
-This will start:
-- Flask API on http://localhost:5000
-- Streamlit App on http://localhost:8501
+# Download model before starting the app
+uv run python preload_model.py
 
-### Stop:
-```bash
-docker-compose down
+# This downloads ~100MB-1.3GB depending on model
+# Cached in ~/.cache/huggingface/
 ```
 
----
+### 4. Start the Application
 
-## Option 2: Using Individual Docker Commands
-
-### Build the image:
+**Terminal 1 - Flask API:**
 ```bash
-docker build -t flask-streamlit-app .
+uv run python flask_app.py
 ```
 
-### Run Flask only:
+**Terminal 2 - Streamlit Dashboard:**
 ```bash
-docker run -p 5000:5000 flask-streamlit-app python flask_app.py
+uv run streamlit run streamlit_app.py
 ```
 
-### Run Streamlit only:
-```bash
-docker run -p 8501:8501 flask-streamlit-app streamlit run streamlit_app.py --server.address 0.0.0.0
-```
+### 5. Access the App
+- **Streamlit UI:** http://localhost:8501
+- **Flask API:** http://localhost:5000
 
 ---
 
-## Accessing the Apps:
+## Available Models
 
-- **Flask API**: http://localhost:5000/helloworld
-- **Streamlit App**: http://localhost:8501
+| Model | Emotions | Size | Speed | Accuracy |
+|-------|----------|------|-------|----------|
+| `superb/wav2vec2-base-superb-er` | 4 | ~100MB | ⚡⚡⚡ | ⭐⭐ |
+| `superb/hubert-large-superb-er` | 4 | ~300MB | ⚡⚡ | ⭐⭐⭐ |
+| `ehcalabres/wav2vec2-lg-xlsr` | 7 | ~1.2GB | ⚡ | ⭐⭐⭐⭐ |
 
----
+**To change model:** Edit `MODEL_NAME` in `.env` file
 
-## Notes:
+---
 
-- The `docker-compose.yml` sets up networking so Streamlit can communicate with Flask
-- Both services are in the same network (`app-network`)
-- Streamlit automatically uses the Flask service URL when running in Docker
-- For local development without Docker, use `python flask_app.py` and `streamlit run streamlit_app.py`
+## Configuration Files
 
+- **`.env`** - Your local configuration (not in git)
+- **`.env.example`** - Template with all options
+- **`config.py`** - Loads environment variables
+- **`models_config.py`** - Model-specific settings
 
-# Deploying to Hugging Face Spaces
+---
 
-## Prerequisites:
-- Hugging Face account
-- Git installed
+## Deployment
 
-## Deployment Steps:
+### Hugging Face Spaces
 
-### 1. Create a new Space on Hugging Face
-- Go to https://huggingface.co/new-space
-- Choose a name for your Space
-- Select **Docker** as the SDK
-- Choose your preferred visibility (public/private)
+1. Push to HF Spaces git repository
+2. Set environment variables in Space settings
+3. Docker will build automatically
+4. Model downloads on first run (or add to Dockerfile)
 
-### 2. Clone your Space repository
-```bash
-git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
-cd YOUR_SPACE_NAME
-```
+### Adding Model to Docker Image
 
-### 3. Copy files to the Space repository
-Copy these files to your Space repository:
-- `Dockerfile`
-- `requirements.txt`
-- `flask_app.py`
-- `streamlit_app.py`
-- `start.sh`
-- `README.md`
-- `.dockerignore`
-
-### 4. Push to Hugging Face
-```bash
-git add .
-git commit -m "Initial commit: Flask + Streamlit app"
-git push
+Edit `Dockerfile` to preload model:
+```dockerfile
+RUN python preload_model.py
 ```
 
-### 5. Wait for build
-- Hugging Face will automatically build your Docker container
-- This may take 5-10 minutes
-- Monitor the build logs in your Space settings
+This caches the model in the image so deployment is faster.
 
-### 6. Access your app
-- Once built, your app will be available at:
-`https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space`
+---
 
-## Important Notes:
+## Troubleshooting
 
-✅ **Port 7860** - Hugging Face Spaces uses port 7860 by default (already configured)
+### Model Download Issues
+- Check internet connection
+- Verify model name in `.env`
+- Check disk space (~2GB free recommended)
 
-✅ **Non-root user** - Dockerfile creates user with UID 1000 (Hugging Face requirement)
+### "Model not found" errors
+- Run `python preload_model.py` first
+- Check HuggingFace Hub is accessible
 
-✅ **Both apps run together** - Flask runs in background, Streamlit in foreground
+### Slow processing
+- Use smaller model (wav2vec2-base)
+- Reduce `CHUNK_DURATION` in `.env`
+- Consider GPU if available
 
-✅ **README.md header** - Contains Hugging Face Space configuration:
-```yaml
----
-sdk: docker
-app_port: 7860
 ---
-```
 
-## Troubleshooting:
+## File Structure
 
-- Check build logs in Space settings if build fails
-- Make sure all files are pushed to the repository
-- Ensure `start.sh` has execute permissions (handled in Dockerfile)
+```
+.
+├── flask_app.py # Flask API backend
+├── streamlit_app.py # Streamlit dashboard
+├── audio_processor.py # Audio processing logic
+├── config.py # Configuration loader
+├── models_config.py # Model definitions
+├── preload_model.py # Model download script
+├── .env # Your settings (gitignored)
+├── .env.example # Settings template
+├── requirements.txt # Python dependencies
+├── input/ # Example audio files
+└── uploads/ # Temporary uploads (gitignored)
+```
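This note relates to the "Slow processing" tips. `split_into_chunks` in `audio_processor.py` cuts the signal into fixed windows of `CHUNK_DURATION` seconds and zero-pads the last one. Each file therefore costs ceil(duration / CHUNK_DURATION) model calls. Lowering `CHUNK_DURATION` means more, shorter calls, and whether that is faster overall likely depends on the model and hardware.

The arithmetic sketch below assumes the 16 kHz sample rate used by every entry in `models_config.py`. `chunk_count` and `padding_seconds` are illustrative helpers, not project code.

```python
def chunk_count(duration_s: float, chunk_duration_s: int, sample_rate: int = 16000) -> int:
    """Windows produced by range(0, n_samples, chunk_samples), as in split_into_chunks."""
    n_samples = int(round(duration_s * sample_rate))
    chunk_samples = int(chunk_duration_s * sample_rate)
    # Ceiling division on whole samples: a trailing partial window still counts.
    return -(-n_samples // chunk_samples)


def padding_seconds(duration_s: float, chunk_duration_s: int, sample_rate: int = 16000) -> float:
    """Seconds of zero padding that np.pad adds to the final window."""
    n_samples = int(round(duration_s * sample_rate))
    chunk_samples = int(chunk_duration_s * sample_rate)
    padded = chunk_count(duration_s, chunk_duration_s, sample_rate) * chunk_samples
    return (padded - n_samples) / sample_rate


for seconds_per_chunk in (1, 2, 3, 5):
    print(seconds_per_chunk, chunk_count(45.0, seconds_per_chunk), padding_seconds(45.0, seconds_per_chunk))
```

At the default `CHUNK_DURATION=3`, a 45-second clip gives 15 windows. That matches the `"duration": "00:45"` / `"total_chunks": 15` mock results this commit removes from `flask_app.py`.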
audio_processor.py ADDED
@@ -0,0 +1,234 @@
+import librosa
+import numpy as np
+from transformers import pipeline
+from config import config
+from models_config import get_model_config
+import os
+
+class AudioEmotionProcessor:
+    """Process audio files and extract emotions using ML models"""
+
+    def __init__(self):
+        self.model = None
+        self.model_name = config.MODEL_NAME
+        self.chunk_duration = config.CHUNK_DURATION
+        self.sample_rate = config.SAMPLE_RATE
+
+        # Get model-specific configuration
+        self.model_config = get_model_config(self.model_name)
+        self.label_mapping = self.model_config.get("label_mapping", {})
+
+    def load_model(self):
+        """Load the emotion detection model"""
+        if self.model is None:
+            print(f"Loading model: {self.model_name}")
+            print(f"Model config: {self.model_config['description']}")
+
+            # Get task type from model config
+            task = self.model_config.get("task", "audio-classification")
+
+            try:
+                # Load model with configured task
+                self.model = pipeline(
+                    task=task,
+                    model=self.model_name
+                )
+                print("Model loaded successfully!")
+            except Exception as e:
+                print(f"Failed to load with task '{task}', trying auto-detection...")
+                try:
+                    # Fallback: Try audio-classification
+                    self.model = pipeline(
+                        "audio-classification",
+                        model=self.model_name
+                    )
+                    print("Model loaded successfully with audio-classification!")
+                except Exception as e2:
+                    print(f"Error loading model: {e2}")
+                    raise
+
+        return self.model
+
+    def load_audio(self, filepath):
+        """Load audio file and resample to target sample rate"""
+        audio, sr = librosa.load(filepath, sr=self.sample_rate)
+        return audio, sr
+
+    def get_audio_duration(self, audio, sr):
+        """Get duration of audio in seconds"""
+        return librosa.get_duration(y=audio, sr=sr)
+
+    def split_into_chunks(self, audio, sr):
+        """Split audio into fixed-duration chunks"""
+        chunk_samples = int(self.chunk_duration * sr)
+        chunks = []
+
+        for i in range(0, len(audio), chunk_samples):
+            chunk = audio[i:i + chunk_samples]
+
+            # Pad last chunk if it's shorter
+            if len(chunk) < chunk_samples:
+                chunk = np.pad(chunk, (0, chunk_samples - len(chunk)), mode='constant')
+
+            chunks.append(chunk)
+
+        return chunks
+
+    def predict_emotion(self, audio_chunk):
+        """Predict emotion for a single audio chunk"""
+        if self.model is None:
+            self.load_model()
+
+        # Get predictions
+        predictions = self.model(audio_chunk)
+
+        # Get top prediction
+        top_prediction = predictions[0]
+
+        # Debug: Print raw model output
+        print(f"DEBUG - Raw prediction: {top_prediction}")
+
+        # Map model output to our emotion labels
+        emotion_label = self.map_emotion_label(top_prediction['label'])
+        confidence = top_prediction['score']
+
+        return emotion_label, confidence
+
+    def map_emotion_label(self, model_label):
+        """Map model output labels to standardized emotion names"""
+        # Different models may have different label formats
+        label_lower = model_label.lower()
+
+        # Use model-specific label mapping first
+        if label_lower in self.label_mapping:
+            return self.label_mapping[label_lower]
+
+        # Fallback to common variations
+        emotion_map = {
+            'hap': 'Happy',
+            'happy': 'Happy',
+            'happiness': 'Happy',
+            'sad': 'Sad',
+            'sadness': 'Sad',
+            'ang': 'Angry',
+            'angry': 'Angry',
+            'anger': 'Angry',
+            'neu': 'Neutral',
+            'neutral': 'Neutral',
+            'calm': 'Neutral',
+            'fear': 'Fear',
+            'fearful': 'Fear',
+            'surprise': 'Surprise',
+            'surprised': 'Surprise',
+            'disgust': 'Disgust'
+        }
+
+        # Try to find a match
+        for key, value in emotion_map.items():
+            if key in label_lower:
+                return value
+
+        # Default: capitalize first letter
+        return model_label.capitalize()
+
+    def format_time(self, seconds):
+        """Format seconds to MM:SS format"""
+        mins = int(seconds // 60)
+        secs = int(seconds % 60)
+        return f"{mins:02d}:{secs:02d}"
+
+    def process_audio_file(self, filepath, progress_callback=None):
+        """
+        Process entire audio file and return emotion timeline
+
+        Args:
+            filepath: Path to audio file
+            progress_callback: Optional callback function(progress, message)
+
+        Returns:
+            dict: Results containing timeline and metadata
+        """
+        try:
+            # Load model
+            if progress_callback:
+                progress_callback(10, "Loading model...")
+            self.load_model()
+
+            # Load audio
+            if progress_callback:
+                progress_callback(20, "Loading audio file...")
+            audio, sr = self.load_audio(filepath)
+
+            # Get duration
+            duration = self.get_audio_duration(audio, sr)
+            duration_formatted = self.format_time(duration)
+
+            # Split into chunks
+            if progress_callback:
+                progress_callback(30, "Splitting audio into segments...")
+            chunks = self.split_into_chunks(audio, sr)
+
+            # Process each chunk
+            timeline = []
+            total_chunks = len(chunks)
+
+            for i, chunk in enumerate(chunks):
+                # Calculate progress (30% to 90%)
+                progress = 30 + int((i / total_chunks) * 60)
+                if progress_callback:
+                    progress_callback(
+                        progress,
+                        f"Analyzing chunk {i+1}/{total_chunks}..."
+                    )
+
+                # Predict emotion
+                emotion, confidence = self.predict_emotion(chunk)
+
+                # Calculate timestamp
+                time_seconds = i * self.chunk_duration
+                time_formatted = self.format_time(time_seconds)
+
+                timeline.append({
+                    "time": time_formatted,
+                    "emotion": emotion,
+                    "confidence": float(confidence)
+                })
+
+            # Calculate statistics
+            if progress_callback:
+                progress_callback(95, "Calculating statistics...")
+
+            emotions_list = [item['emotion'] for item in timeline]
+            unique_emotions = len(set(emotions_list))
+
+            # Find dominant emotion
+            from collections import Counter
+            emotion_counts = Counter(emotions_list)
+            dominant_emotion = emotion_counts.most_common(1)[0][0]
+
+            # Build results
+            results = {
+                "duration": duration_formatted,
+                "total_chunks": total_chunks,
+                "emotions_detected": unique_emotions,
+                "dominant_emotion": dominant_emotion,
+                "timeline": timeline
+            }
+
+            if progress_callback:
+                progress_callback(100, "Analysis complete!")
+
+            return results
+
+        except Exception as e:
+            raise Exception(f"Audio processing failed: {str(e)}")
+
+# Global processor instance
+_processor = None
+
+def get_processor():
+    """Get or create global processor instance"""
+    global _processor
+    if _processor is None:
+        _processor = AudioEmotionProcessor()
+    return _processor
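`process_audio_file` ends by counting per-chunk labels with `collections.Counter` and stamping each chunk as `MM:SS`. The sketch below reproduces that summary step on plain lists, so it runs without librosa or transformers. It uses the first four entries of the mock timeline this commit removes from `flask_app.py`. `summarize_timeline` is a hypothetical name.

The empty-input guard is an addition that is not in the committed code. There, `most_common(1)[0]` would raise `IndexError` if a file ever produced no chunks.

```python
from collections import Counter


def format_time(seconds: float) -> str:
    """MM:SS, mirroring AudioEmotionProcessor.format_time."""
    mins = int(seconds // 60)
    secs = int(seconds % 60)
    return f"{mins:02d}:{secs:02d}"


def summarize_timeline(emotions: list[str], confidences: list[float], chunk_duration: int) -> dict:
    """Hypothetical helper: build the results fields from per-chunk predictions."""
    timeline = [
        {"time": format_time(i * chunk_duration), "emotion": e, "confidence": float(c)}
        for i, (e, c) in enumerate(zip(emotions, confidences))
    ]
    counts = Counter(item["emotion"] for item in timeline)
    # most_common(1) returns [] for an empty Counter, so guard before indexing [0].
    dominant = counts.most_common(1)[0][0] if counts else None
    return {
        "total_chunks": len(timeline),
        "emotions_detected": len(counts),
        "dominant_emotion": dominant,
        "timeline": timeline,
    }


result = summarize_timeline(["Neutral", "Happy", "Happy", "Sad"], [0.85, 0.92, 0.88, 0.78], 3)
print(result["dominant_emotion"], [t["time"] for t in result["timeline"]])
# Happy ['00:00', '00:03', '00:06', '00:09']
```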
config.py CHANGED
@@ -8,7 +8,7 @@ class Config:
     """Application configuration loaded from environment variables"""
 
     # Model Settings
-    MODEL_NAME = os.getenv('MODEL_NAME', 'superb/wav2vec2-base-superb-er')
+    MODEL_NAME = os.getenv('MODEL_NAME', 'superb/wav2vec2-large-superb-er')
 
     # Audio Processing Settings
     CHUNK_DURATION = int(os.getenv('CHUNK_DURATION', 3))  # seconds
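`Config` evaluates `os.getenv(name, default)` in its class body, so each setting is read once, when `config.py` is imported. The new `superb/wav2vec2-large-superb-er` default applies only when `MODEL_NAME` is absent from the environment. Whether `.env` is loaded into the environment first depends on code outside this hunk; python-dotenv is a listed dependency.

The sketch below rewrites the same two lines as a function that takes an explicit mapping, which makes it testable. `Settings` and `load_settings` are hypothetical names.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    model_name: str
    chunk_duration: int  # seconds


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Hypothetical loader mirroring the MODEL_NAME / CHUNK_DURATION lines in config.py."""
    return Settings(
        model_name=env.get("MODEL_NAME", "superb/wav2vec2-large-superb-er"),
        # Environment values are strings; int() also accepts the integer default 3.
        chunk_duration=int(env.get("CHUNK_DURATION", 3)),
    )


print(load_settings({}))
print(load_settings({"MODEL_NAME": "superb/wav2vec2-base-superb-er", "CHUNK_DURATION": "5"}))
```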
flask_app.py CHANGED
@@ -6,6 +6,7 @@ from datetime import datetime
 from config import config
 from concurrent.futures import ThreadPoolExecutor
 import threading
+from audio_processor import get_processor
 
 app = Flask(__name__)
 CORS(app)  # Enable CORS for Streamlit
@@ -17,6 +18,21 @@ executor = ThreadPoolExecutor(max_workers=4)
 jobs = {}
 jobs_lock = threading.Lock()
 
+# Preload model on startup
+print("=" * 60)
+print("INITIALIZING APPLICATION...")
+print("=" * 60)
+try:
+    print("Preloading emotion detection model...")
+    processor = get_processor()
+    processor.load_model()
+    print("✅ Model preloaded successfully!")
+    print("=" * 60)
+except Exception as e:
+    print(f"⚠️ Warning: Failed to preload model: {e}")
+    print("Model will be loaded on first request.")
+    print("=" * 60)
+
 # Upload folder for temporary audio files
 UPLOAD_FOLDER = 'uploads'
 os.makedirs(UPLOAD_FOLDER, exist_ok=True)
@@ -132,55 +148,35 @@ def process_audio(job_id, filepath):
     Process audio file and extract emotions
     This runs in a background thread
     """
-    import time  # For simulating processing time
-
     try:
-        # Update status to processing
-        with jobs_lock:
-            jobs[job_id]["status"] = "processing"
-            jobs[job_id]["progress"] = 10
-            jobs[job_id]["message"] = "Loading audio file..."
-
-        # Simulate some processing time
-        time.sleep(1)
+        # Get audio processor
+        processor = get_processor()
 
-        with jobs_lock:
-            jobs[job_id]["progress"] = 30
-            jobs[job_id]["message"] = "Analyzing audio segments..."
-
-        # TODO: Actual audio processing logic will go here
-        # For now, return mock data
-        time.sleep(2)
+        # Progress callback function
+        def update_progress(progress, message):
+            with jobs_lock:
+                jobs[job_id]["progress"] = progress
+                jobs[job_id]["message"] = message
 
+        # Update status to processing
         with jobs_lock:
-            jobs[job_id]["progress"] = 70
-            jobs[job_id]["message"] = "Extracting emotions..."
-
-        time.sleep(1)
+            jobs[job_id]["status"] = "processing"
 
-        # Mock results
-        results = {
-            "duration": "00:45",
-            "total_chunks": 15,
-            "emotions_detected": 4,
-            "dominant_emotion": "Happy",
-            "timeline": [
-                {"time": "00:00", "emotion": "Neutral", "confidence": 0.85},
-                {"time": "00:03", "emotion": "Happy", "confidence": 0.92},
-                {"time": "00:06", "emotion": "Happy", "confidence": 0.88},
-                {"time": "00:09", "emotion": "Sad", "confidence": 0.78},
-                {"time": "00:12", "emotion": "Neutral", "confidence": 0.90}
-            ]
-        }
+        # Process audio file with real ML model
+        results = processor.process_audio_file(filepath, progress_callback=update_progress)
 
+        # Mark as completed
         with jobs_lock:
             jobs[job_id]["progress"] = 100
             jobs[job_id]["status"] = "completed"
             jobs[job_id]["message"] = "Analysis complete!"
             jobs[job_id]["results"] = results
 
-        # Clean up uploaded file after processing (optional)
-        # os.remove(filepath)
+        # Clean up uploaded file after processing
+        try:
+            os.remove(filepath)
+        except:
+            pass
 
     except Exception as e:
         with jobs_lock:
@@ -193,5 +189,6 @@ if __name__ == '__main__':
     app.run(
         debug=config.FLASK_DEBUG,
         host=config.FLASK_HOST,
-        port=config.FLASK_PORT
+        port=config.FLASK_PORT,
+        use_reloader=False  # Disable auto-reload to prevent socket errors
     )
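The reworked `process_audio` passes `processor.process_audio_file` a closure, `update_progress`. The closure writes into the shared `jobs` dict while holding `jobs_lock`, so a polling request always sees a matching progress/message pair.

The sketch below shows the pattern standalone, with several stand-ins:

- `fake_process_audio_file` replaces the ML pipeline (hypothetical).
- `run_job` and `job_snapshot` are illustrative names.
- The initial job fields and the `"failed"` status in the error branch are assumptions, since that part of the file is outside the diff.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

jobs: dict[str, dict] = {}
jobs_lock = threading.Lock()
executor = ThreadPoolExecutor(max_workers=4)


def fake_process_audio_file(filepath, progress_callback=None):
    """Hypothetical stand-in for AudioEmotionProcessor.process_audio_file."""
    steps = ((10, "Loading model..."), (60, "Analyzing chunk 1/1..."), (100, "Analysis complete!"))
    for pct, msg in steps:
        if progress_callback:
            progress_callback(pct, msg)
    return {"total_chunks": 1, "dominant_emotion": "Neutral", "timeline": []}


def run_job(job_id, filepath):
    def update_progress(progress, message):
        # Both fields change under one lock, so a poller never sees a half-update.
        with jobs_lock:
            jobs[job_id]["progress"] = progress
            jobs[job_id]["message"] = message

    with jobs_lock:
        jobs[job_id]["status"] = "processing"
    try:
        results = fake_process_audio_file(filepath, progress_callback=update_progress)
        with jobs_lock:
            jobs[job_id].update(progress=100, status="completed", results=results)
    except Exception as exc:
        # Assumption: the commit's except-branch body is not shown in the diff.
        with jobs_lock:
            jobs[job_id].update(status="failed", message=str(exc))


def job_snapshot(job_id):
    """Copy the job dict under the lock, as a status endpoint might."""
    with jobs_lock:
        return dict(jobs[job_id])


with jobs_lock:
    jobs["job-1"] = {"status": "queued", "progress": 0, "message": ""}  # hypothetical initial fields
executor.submit(run_job, "job-1", "uploads/example.wav").result()  # demo only: wait for the worker
print(job_snapshot("job-1"))
executor.shutdown()
```

The commit wraps `os.remove(filepath)` in a bare `except:`, which also swallows unrelated exceptions. A narrower `except OSError:` would still cover a missing or locked upload file.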
models_config.py ADDED
@@ -0,0 +1,157 @@
+"""
+Configuration for different emotion detection models
+Add new models here with their specific settings
+"""
+
+MODELS_CONFIG = {
+    # SuperB Wav2Vec2 - Lightweight, 4 emotions
+    "superb/wav2vec2-base-superb-er": {
+        "task": "audio-classification",
+        "emotions": ["Neutral", "Happy", "Sad", "Angry"],
+        "label_mapping": {
+            "neu": "Neutral",
+            "neutral": "Neutral",
+            "hap": "Happy",
+            "happy": "Happy",
+            "sad": "Sad",
+            "sadness": "Sad",
+            "ang": "Angry",
+            "angry": "Angry",
+            "anger": "Angry"
+        },
+        "sample_rate": 16000,
+        "description": "Lightweight model with 4 basic emotions"
+    },
+
+    # SuperB HuBERT - Better accuracy, 4 emotions
+    "superb/hubert-large-superb-er": {
+        "task": "audio-classification",
+        "emotions": ["Neutral", "Happy", "Sad", "Angry"],
+        "label_mapping": {
+            "neu": "Neutral",
+            "neutral": "Neutral",
+            "hap": "Happy",
+            "happy": "Happy",
+            "sad": "Sad",
+            "sadness": "Sad",
+            "ang": "Angry",
+            "angry": "Angry",
+            "anger": "Angry"
+        },
+        "sample_rate": 16000,
+        "description": "HuBERT-based model with better accuracy"
+    },
+
+    # Ehcalabres Wav2Vec2 XLSR - 7 emotions
+    "ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition": {
+        "task": "audio-classification",
+        "emotions": ["Neutral", "Happy", "Sad", "Angry", "Fear", "Disgust", "Surprise"],
+        "label_mapping": {
+            "neu": "Neutral",
+            "neutral": "Neutral",
+            "hap": "Happy",
+            "happy": "Happy",
+            "happiness": "Happy",
+            "sad": "Sad",
+            "sadness": "Sad",
+            "ang": "Angry",
+            "angry": "Angry",
+            "anger": "Angry",
+            "fea": "Fear",
+            "fear": "Fear",
+            "dis": "Disgust",
+            "disgust": "Disgust",
+            "sur": "Surprise",
+            "surprise": "Surprise"
+        },
+        "sample_rate": 16000,
+        "description": "Multi-lingual model with 7 emotions"
+    },
+
+    # Harshit345 XLSR - Alternative model
+    "harshit345/xlsr-wav2vec-speech-emotion-recognition": {
+        "task": "automatic-speech-recognition",  # Different task type
+        "emotions": ["Neutral", "Happy", "Sad", "Angry", "Fear", "Disgust", "Surprise"],
+        "label_mapping": {
+            "neutral": "Neutral",
+            "calm": "Neutral",
+            "happy": "Happy",
+            "sad": "Sad",
+            "angry": "Angry",
+            "fearful": "Fear",
+            "fear": "Fear",
+            "disgust": "Disgust",
+            "surprised": "Surprise",
+            "surprise": "Surprise"
+        },
+        "sample_rate": 16000,
+        "description": "XLSR-based emotion recognition",
+        "special_handling": True  # Needs custom loading
+    },
+
+    # Amiriparian Wav2Vec2 - RAVDESS dataset
+    "amiriparian/wav2vec2-base-ravdess": {
+        "task": "audio-classification",
+        "emotions": ["Neutral", "Happy", "Sad", "Angry", "Fear", "Disgust", "Surprise", "Calm"],
+        "label_mapping": {
+            "01": "Neutral",
+            "02": "Calm",
+            "03": "Happy",
+            "04": "Sad",
+            "05": "Angry",
+            "06": "Fear",
+            "07": "Disgust",
+            "08": "Surprise",
+            "neutral": "Neutral",
+            "calm": "Calm",
+            "happy": "Happy",
+            "sad": "Sad",
+            "angry": "Angry",
+            "fearful": "Fear",
+            "fear": "Fear",
+            "disgust": "Disgust",
+            "surprised": "Surprise",
+            "surprise": "Surprise"
+        },
+        "sample_rate": 16000,
+        "description": "Trained on RAVDESS dataset with 8 emotions"
+    }
+}
+
+def get_model_config(model_name):
+    """
+    Get configuration for a specific model
+
+    Args:
+        model_name: Name of the model
+
+    Returns:
+        dict: Model configuration or default config
+    """
+    if model_name in MODELS_CONFIG:
+        return MODELS_CONFIG[model_name]
+
+    # Default configuration for unknown models
+    return {
+        "task": "audio-classification",
+        "emotions": ["Neutral", "Happy", "Sad", "Angry"],
+        "label_mapping": {},
+        "sample_rate": 16000,
+        "description": "Custom model",
+        "special_handling": False
+    }
+
+def get_available_models():
+    """Get list of all available configured models"""
+    return list(MODELS_CONFIG.keys())
+
+def get_model_info(model_name):
+    """Get human-readable info about a model"""
+    config = get_model_config(model_name)
+    return {
+        "name": model_name,
+        "emotions": config["emotions"],
+        "num_emotions": len(config["emotions"]),
+        "description": config["description"],
+        "sample_rate": config["sample_rate"]
+    }
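Labels are resolved in three stages:

1. `map_emotion_label` tries an exact, lower-cased lookup in the model's `label_mapping`.
2. It then takes the first substring hit in a built-in fallback table.
3. Failing both, it capitalises the raw label.

The new default, `superb/wav2vec2-large-superb-er` (in `config.py` and `.env.example`), has no entry in `MODELS_CONFIG`. `get_model_config` therefore returns the default config with an empty `label_mapping`, and every label from that model goes through stage 2.

The sketch below condenses that resolution order. The tables are trimmed copies, not the full ones, and it has no dependencies.

```python
MODELS_CONFIG = {
    "superb/wav2vec2-base-superb-er": {
        "label_mapping": {"neu": "Neutral", "hap": "Happy", "sad": "Sad", "ang": "Angry"},
    },
}
DEFAULT_CONFIG = {"label_mapping": {}}
# Trimmed fallback table; dict order matters because the first substring hit wins.
FALLBACK = {"hap": "Happy", "sad": "Sad", "ang": "Angry", "neu": "Neutral", "calm": "Neutral", "fear": "Fear"}


def get_model_config(name):
    return MODELS_CONFIG.get(name, DEFAULT_CONFIG)


def map_emotion_label(model_label, label_mapping):
    label = model_label.lower()
    if label in label_mapping:  # stage 1: exact model-specific match
        return label_mapping[label]
    for key, value in FALLBACK.items():  # stage 2: first substring match
        if key in label:
            return value
    return model_label.capitalize()  # stage 3: pass the label through


large = get_model_config("superb/wav2vec2-large-superb-er")
print(large["label_mapping"])  # {}
print(map_emotion_label("neu", large["label_mapping"]))  # Neutral (via the fallback)
print(map_emotion_label("fearful", {}))  # Fear
print(map_emotion_label("excited", {}))  # Excited
```

Adding an explicit `MODELS_CONFIG` entry for the large SUPERB model would make its labels go through stage 1 like the base model's.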
preload_model.py ADDED
@@ -0,0 +1,45 @@
+#!/usr/bin/env python3
+"""
+Standalone script to preload and cache the emotion detection model
+Run this before starting the Flask app to download the model in advance
+"""
+
+import os
+from audio_processor import get_processor
+from config import config
+
+def preload_model():
+    """Download and cache the model"""
+    print("=" * 70)
+    print("MODEL PRELOAD SCRIPT")
+    print("=" * 70)
+    print(f"Model: {config.MODEL_NAME}")
+    print(f"Cache location: ~/.cache/huggingface/")
+    print("-" * 70)
+
+    try:
+        print("\n📥 Downloading and loading model...")
+        processor = get_processor()
+        processor.load_model()
+
+        print("\n✅ SUCCESS!")
+        print("=" * 70)
+        print("Model has been downloaded and cached.")
+        print("You can now start the Flask app without waiting for download.")
+        print("=" * 70)
+
+    except Exception as e:
+        print("\n❌ FAILED!")
+        print("=" * 70)
+        print(f"Error: {e}")
+        print("\nTroubleshooting:")
+        print("1. Check your internet connection")
+        print("2. Verify model name in .env file")
+        print("3. Ensure you have enough disk space")
+        print("=" * 70)
+        return False
+
+    return True
+
+if __name__ == "__main__":
+    preload_model()
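`preload_model()` returns `True` or `False`, but the `__main__` block discards the result, so the script exits with status 0 even when the download fails. The README suggests a `RUN python preload_model.py` Dockerfile step. If that step should fail the build on error, one option is to map the boolean to an exit status; this is an assumption, not what the commit does.

The sketch below shows that change with an injected loader so it runs offline. `preload`, `main` and `broken_loader` are hypothetical names.

```python
import sys


def preload(load) -> bool:
    """Call the loader and report success as a boolean, like preload_model()."""
    try:
        load()
    except Exception as exc:
        print(f"Error: {exc}", file=sys.stderr)
        return False
    return True


def main(load) -> int:
    # Conventional process exit status: 0 on success, non-zero on failure.
    return 0 if preload(load) else 1


def broken_loader():
    raise OSError("simulated download failure")


if __name__ == "__main__":
    # The real script could end with: sys.exit(main(get_processor().load_model))
    print("exit status:", main(broken_loader))
```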
pyproject.toml CHANGED
@@ -7,9 +7,14 @@ requires-python = ">=3.10"
 dependencies = [
     "flask>=3.1.2",
     "flask-cors>=6.0.2",
+    "librosa>=0.11.0",
     "pandas>=2.3.3",
     "plotly>=6.5.2",
     "python-dotenv>=1.2.1",
     "requests>=2.32.5",
+    "soundfile>=0.13.1",
     "streamlit>=1.54.0",
+    "torch>=2.10.0",
+    "torchaudio>=2.10.0",
+    "transformers>=5.1.0",
 ]
requirements.txt CHANGED
@@ -7,3 +7,8 @@ requests
 pandas
 plotly
 python-dotenv
+librosa
+soundfile
+transformers
+torch
+torchaudio
streamlit_app.py CHANGED
@@ -63,7 +63,7 @@ with tab1:
 st.warning("⚠️ Example file not found in input/ folder")
 
 # Show analyze button
-analyze_btn = st.button("🔍 Analyze Audio", type="primary", use_container_width=True, disabled=(audio_file is None))
 
 # Initialize session state for results
 if 'analysis_results' not in st.session_state:
@@ -155,7 +155,7 @@ with tab1:
 break
 
 # Wait before next poll
-time.sleep(2)
 attempt += 1
 
 if attempt >= max_attempts:
@@ -179,12 +179,16 @@ with tab1:
 # Get results from session state
 results = st.session_state.analysis_results
 
-# Emotion emoji mapping
 emotion_emoji_map = {
 'Happy': '😊',
 'Sad': '😢',
 'Angry': '😡',
-'Neutral': '😐'
 }
 
 # Convert timeline to DataFrame
@@ -220,39 +224,45 @@ with tab1:
 with col1:
 st.subheader("⏱️ Emotion Timeline")
 
-# Bar chart with emojis
-fig_timeline = go.Figure()
-
 colors = {
 'Happy': '#FFD700',
 'Sad': '#4169E1',
 'Angry': '#DC143C',
-'Neutral': '#808080'
 }
 
-for emotion in sample_timeline['Emotion'].unique():
-emotion_data = sample_timeline[sample_timeline['Emotion'] == emotion]
-fig_timeline.add_trace(go.Bar(
-x=emotion_data['Time (s)'],
-y=emotion_data['Confidence'],
-name=f"{emotion_emoji_map[emotion]} {emotion}",
-marker_color=colors[emotion],
-text=[emotion_emoji_map[emotion]] * len(emotion_data),
-textposition='outside',
-textfont=dict(size=20)
-))
 
 fig_timeline.update_layout(
 xaxis_title="Time",
 yaxis_title="Confidence",
 yaxis_range=[0, 1.1],
-barmode='group',
 height=400,
-showlegend=True,
-hovermode='x unified'
 )
 
-st.plotly_chart(fig_timeline, use_container_width=True)
 
 with col2:
 st.subheader("📊 Distribution")
@@ -274,7 +284,7 @@ with tab1:
 showlegend=False
 )
 
-st.plotly_chart(fig_pie, use_container_width=True)
 
 # Detailed Timeline Table
 st.subheader("📋 Detailed Timeline")
@@ -282,7 +292,7 @@ with tab1:
 display_df['Confidence'] = display_df['Confidence'].apply(lambda x: f"{x:.2%}")
 st.dataframe(
 display_df,
-use_container_width=True,
 hide_index=True
 )
 
@@ -297,11 +307,11 @@ with tab2:
 col1, col2, col3 = st.columns(3)
 
 with col1:
-record_btn = st.button("🔴 Start Recording", type="primary", use_container_width=True)
 with col2:
-stop_btn = st.button("⏹️ Stop Recording", use_container_width=True)
 with col3:
-analyze_record_btn = st.button("🔍 Analyze Recording", use_container_width=True)
 
 # Recording status
 if record_btn:
@@ -346,12 +356,16 @@ with tab2:
 st.markdown("---")
 st.subheader("📊 Emotion Analysis Results")
 
-# Emotion emoji mapping
 emotion_emoji_map = {
 'Happy': '😊',
 'Sad': '😢',
 'Angry': '😡',
-'Neutral': '😐'
 }
 
 # Sample data for recorded audio
@@ -394,7 +408,11 @@ with tab2:
 'Happy': '#FFD700',
 'Sad': '#4169E1',
 'Angry': '#DC143C',
-'Neutral': '#808080'
 }
 
 for emotion in sample_data['Emotion'].unique():
@@ -419,7 +437,7 @@ with tab2:
 hovermode='x unified'
 )
421
 
422
- st.plotly_chart(fig_timeline, use_container_width=True)
423
 
424
  with col2:
425
  st.subheader("πŸ“Š Distribution")
@@ -441,7 +459,7 @@ with tab2:
441
  showlegend=False
442
  )
443
 
444
- st.plotly_chart(fig_pie, use_container_width=True)
445
 
446
  # Detailed Timeline Table
447
  st.subheader("πŸ“‹ Detailed Timeline")
@@ -449,7 +467,7 @@ with tab2:
449
  display_df['Confidence'] = display_df['Confidence'].apply(lambda x: f"{x:.2%}")
450
  st.dataframe(
451
  display_df,
452
- use_container_width=True,
453
  hide_index=True
454
  )
455
 
 
63
  st.warning("⚠️ Example file not found in input/ folder")
64
 
65
  # Show analyze button
66
+ analyze_btn = st.button("πŸ” Analyze Audio", type="primary", width="stretch", disabled=(audio_file is None))
67
 
68
  # Initialize session state for results
69
  if 'analysis_results' not in st.session_state:
 
155
  break
156
 
157
  # Wait before next poll
158
+ time.sleep(5)
159
  attempt += 1
160
 
161
  if attempt >= max_attempts:
 
179
  # Get results from session state
180
  results = st.session_state.analysis_results
181
 
182
+ # Emotion emoji mapping (supports all emotions)
183
  emotion_emoji_map = {
184
  'Happy': '😊',
185
  'Sad': '😒',
186
  'Angry': '😑',
187
+ 'Neutral': '😐',
188
+ 'Fear': '😨',
189
+ 'Surprise': '😲',
190
+ 'Disgust': '🀒',
191
+ 'Calm': '😌'
192
  }
193
 
194
  # Convert timeline to DataFrame
 
224
  with col1:
225
  st.subheader("⏱️ Emotion Timeline")
226
 
227
+ # Color mapping (supports all emotions)
 
 
228
  colors = {
229
  'Happy': '#FFD700',
230
  'Sad': '#4169E1',
231
  'Angry': '#DC143C',
232
+ 'Neutral': '#808080',
233
+ 'Fear': '#9370DB',
234
+ 'Surprise': '#FF8C00',
235
+ 'Disgust': '#32CD32',
236
+ 'Calm': '#87CEEB'
237
  }
238
 
239
+ # Create bar chart with individual bars (not grouped)
240
+ fig_timeline = go.Figure()
241
+
242
+ # Add all bars in sequence
243
+ bar_colors = [colors[emotion] for emotion in sample_timeline['Emotion']]
244
+ bar_text = [emotion_emoji_map[emotion] for emotion in sample_timeline['Emotion']]
245
+
246
+ fig_timeline.add_trace(go.Bar(
247
+ x=sample_timeline['Time (s)'],
248
+ y=sample_timeline['Confidence'],
249
+ marker_color=bar_colors,
250
+ text=bar_text,
251
+ textposition='outside',
252
+ textfont=dict(size=20),
253
+ hovertemplate='<b>%{x}</b><br>Confidence: %{y:.2%}<br><extra></extra>',
254
+ showlegend=False
255
+ ))
256
 
257
  fig_timeline.update_layout(
258
  xaxis_title="Time",
259
  yaxis_title="Confidence",
260
  yaxis_range=[0, 1.1],
 
261
  height=400,
262
+ hovermode='x'
 
263
  )
264
 
265
+ st.plotly_chart(fig_timeline, width="stretch")
266
 
267
  with col2:
268
  st.subheader("πŸ“Š Distribution")
 
284
  showlegend=False
285
  )
286
 
287
+ st.plotly_chart(fig_pie, width="stretch")
288
 
289
  # Detailed Timeline Table
290
  st.subheader("πŸ“‹ Detailed Timeline")
 
292
  display_df['Confidence'] = display_df['Confidence'].apply(lambda x: f"{x:.2%}")
293
  st.dataframe(
294
  display_df,
295
+ width="stretch",
296
  hide_index=True
297
  )
298
 
 
307
  col1, col2, col3 = st.columns(3)
308
 
309
  with col1:
310
+ record_btn = st.button("πŸ”΄ Start Recording", type="primary", width="stretch")
311
  with col2:
312
+ stop_btn = st.button("⏹️ Stop Recording", width="stretch")
313
  with col3:
314
+ analyze_record_btn = st.button("πŸ” Analyze Recording", width="stretch")
315
 
316
  # Recording status
317
  if record_btn:
 
356
  st.markdown("---")
357
  st.subheader("πŸ“Š Emotion Analysis Results")
358
 
359
+ # Emotion emoji mapping (supports all emotions)
360
  emotion_emoji_map = {
361
  'Happy': '😊',
362
  'Sad': '😒',
363
  'Angry': '😑',
364
+ 'Neutral': '😐',
365
+ 'Fear': '😨',
366
+ 'Surprise': '😲',
367
+ 'Disgust': '🀒',
368
+ 'Calm': '😌'
369
  }
370
 
371
  # Sample data for recorded audio
 
408
  'Happy': '#FFD700',
409
  'Sad': '#4169E1',
410
  'Angry': '#DC143C',
411
+ 'Neutral': '#808080',
412
+ 'Fear': '#9370DB',
413
+ 'Surprise': '#FF8C00',
414
+ 'Disgust': '#32CD32',
415
+ 'Calm': '#87CEEB'
416
  }
417
 
418
  for emotion in sample_data['Emotion'].unique():
 
437
  hovermode='x unified'
438
  )
439
 
440
+ st.plotly_chart(fig_timeline, width="stretch")
441
 
442
  with col2:
443
  st.subheader("πŸ“Š Distribution")
 
459
  showlegend=False
460
  )
461
 
462
+ st.plotly_chart(fig_pie, width="stretch")
463
 
464
  # Detailed Timeline Table
465
  st.subheader("πŸ“‹ Detailed Timeline")
 
467
  display_df['Confidence'] = display_df['Confidence'].apply(lambda x: f"{x:.2%}")
468
  st.dataframe(
469
  display_df,
470
+ width="stretch",
471
  hide_index=True
472
  )
473
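After this change, both tabs still carry their own copies of `emotion_emoji_map` and `colors`. The new single-trace bar chart builds its per-bar lists with `colors[emotion]` and `emotion_emoji_map[emotion]`, which would raise `KeyError` if the selected model emitted a label outside the eight keys. Below is a minimal sketch of one shared style table with a fallback. `EMOTION_STYLE`, `FALLBACK_STYLE`, and `bar_styles` are hypothetical names. The case-normalizing lookup assumes that model labels differ from the keys only by case or surrounding whitespace, and the grey `❓` fallback is an arbitrary choice.

```python
from collections.abc import Iterable

# Hypothetical shared table: emoji and bar colour per emotion, in one place
# instead of separate emotion_emoji_map / colors dicts in each tab.
EMOTION_STYLE: dict[str, tuple[str, str]] = {
    "Happy": ("😊", "#FFD700"),
    "Sad": ("😢", "#4169E1"),
    "Angry": ("😡", "#DC143C"),
    "Neutral": ("😐", "#808080"),
    "Fear": ("😨", "#9370DB"),
    "Surprise": ("😲", "#FF8C00"),
    "Disgust": ("🤢", "#32CD32"),
    "Calm": ("😌", "#87CEEB"),
}

# Used for any label not present in EMOTION_STYLE (arbitrary choice).
FALLBACK_STYLE: tuple[str, str] = ("❓", "#C0C0C0")


def bar_styles(emotions: Iterable[str]) -> tuple[list[str], list[str]]:
    """Return (marker colours, bar text) lists aligned one-to-one with the labels."""
    colors: list[str] = []
    texts: list[str] = []
    for label in emotions:
        # Normalize e.g. "happy" or " HAPPY " to "Happy" before the lookup.
        emoji, color = EMOTION_STYLE.get(str(label).strip().capitalize(), FALLBACK_STYLE)
        colors.append(color)
        texts.append(emoji)
    return colors, texts
```

In `streamlit_app.py`, this would replace the two list comprehensions with `bar_colors, bar_text = bar_styles(sample_timeline['Emotion'])`. Iterating a pandas Series yields its values, so the rest of the `go.Bar(...)` call stays unchanged.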