Spaces: temp12821/audioSentiment

temp12821 committed on Feb 9

Commit feaf7eb (1 parent: 8b4cd24)

working prototype of the audio processing module

Files changed (10)

.env.example +13 -3
README.md +81 -132
audio_processor.py +234 -0
config.py +1 -1
flask_app.py +35 -38
models_config.py +157 -0
preload_model.py +45 -0
pyproject.toml +5 -0
requirements.txt +5 -0
streamlit_app.py +52 -34

.env.example CHANGED

@@ -1,9 +1,19 @@
 # Audio Sentiment Analysis Configuration
 # Model Selection (choose one):
-# Option 1: superb/wav2vec2-base-superb-er (lightweight, 4 emotions)
-# Option 2: ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition (heavy, 7 emotions)
-MODEL_NAME=superb/wav2vec2-base-superb-er
 # Audio Processing Settings
 CHUNK_DURATION=3

 # Audio Sentiment Analysis Configuration
 # Model Selection (choose one):
+# Lightweight models (4 emotions: Happy, Sad, Angry, Neutral):
+#   - superb/wav2vec2-base-superb-er (recommended, fast)
+#   - superb/wav2vec2-large-superb-er (better accuracy, slower)
+#   - superb/hubert-large-superb-er (better accuracy, slower)
+#
+# Advanced models (7-8 emotions):
+#   - ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition
+#   - harshit345/xlsr-wav2vec-speech-emotion-recognition
+#   - amiriparian/wav2vec2-base-ravdess
+#
+# See models_config.py for full list and details
+MODEL_NAME=superb/wav2vec2-large-superb-er
 # Audio Processing Settings
 CHUNK_DURATION=3

README.md CHANGED

@@ -1,185 +1,134 @@
 ---
-title: Flask Streamlit Demo
-emoji: 🚀
-colorFrom: blue
-colorTo: green
 sdk: docker
 pinned: false
 license: mit
-short_description: Flask + Streamlit integration demo
 app_port: 7860
 ---
-# Flask + Streamlit Demo
-This Hugging Face Space demonstrates integration between Flask backend and Streamlit frontend.
-## Features:
-- Flask API with `/helloworld` endpoint
-- Streamlit app that calls the Flask API and displays the response
-- Runs on Hugging Face Spaces using Docker
-## How it works:
-1. Flask API runs in the background on port 5000
-2. Streamlit UI runs on port 7860 (Hugging Face default)
-3. Click the button in Streamlit to call the Flask endpoint
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
-# Flask + Streamlit Project Setup
-## Files Created:
-- `flask_app.py` - Flask backend with /helloworld endpoint
-- `streamlit_app.py` - Streamlit frontend that calls the Flask API
-## How to Run:
-### Step 1: Install Dependencies
 ```bash
 pip install -r requirements.txt
 ```
-### Step 2: Start Flask Server (Terminal 1)
 ```bash
-python flask_app.py
-```
-The Flask server will start on http://localhost:5000
-### Step 3: Start Streamlit App (Terminal 2)
-```bash
-streamlit run streamlit_app.py
 ```
-The Streamlit app will open in your browser (usually http://localhost:8501)
-### Step 4: Test the Integration
-1. Click the "Call Flask API" button in the Streamlit interface
-2. The app will call the Flask endpoint and display the response
-## Endpoints:
-- Flask API: `GET http://localhost:5000/helloworld`
-  - Returns: `{"message": "Hello World!", "status": "success"}`
-## Note:
-Make sure to run Flask first before using the Streamlit app!
-# Docker Setup Instructions
-## Option 1: Using Docker Compose (Recommended - Runs both apps together)
-### Build and Run:
 ```bash
-docker-compose up --build
-```
-This will start:
-- Flask API on http://localhost:5000
-- Streamlit App on http://localhost:8501
-### Stop:
-```bash
-docker-compose down
 ```
----
-## Option 2: Using Individual Docker Commands
-### Build the image:
 ```bash
-docker build -t flask-streamlit-app .
 ```
-### Run Flask only:
 ```bash
-docker run -p 5000:5000 flask-streamlit-app python flask_app.py
 ```
-### Run Streamlit only:
-```bash
-docker run -p 8501:8501 flask-streamlit-app streamlit run streamlit_app.py --server.address 0.0.0.0
-```
 ---
-## Accessing the Apps:
-- **Flask API**: http://localhost:5000/helloworld
-- **Streamlit App**: http://localhost:8501
----
-## Notes:
-- The `docker-compose.yml` sets up networking so Streamlit can communicate with Flask
-- Both services are in the same network (`app-network`)
-- Streamlit automatically uses the Flask service URL when running in Docker
-- For local development without Docker, use `python flask_app.py` and `streamlit run streamlit_app.py`
-# Deploying to Hugging Face Spaces
-## Prerequisites:
-- Hugging Face account
-- Git installed
-## Deployment Steps:
-### 1. Create a new Space on Hugging Face
-- Go to https://huggingface.co/new-space
-- Choose a name for your Space
-- Select **Docker** as the SDK
-- Choose your preferred visibility (public/private)
-### 2. Clone your Space repository
-```bash
-git clone https://huggingface.co/spaces/YOUR_USERNAME/YOUR_SPACE_NAME
-cd YOUR_SPACE_NAME
-```
-### 3. Copy files to the Space repository
-Copy these files to your Space repository:
-- `Dockerfile`
-- `requirements.txt`
-- `flask_app.py`
-- `streamlit_app.py`
-- `start.sh`
-- `README.md`
-- `.dockerignore`
-### 4. Push to Hugging Face
-```bash
-git add .
-git commit -m "Initial commit: Flask + Streamlit app"
-git push
 ```
-### 5. Wait for build
-- Hugging Face will automatically build your Docker container
-- This may take 5-10 minutes
-- Monitor the build logs in your Space settings
-### 6. Access your app
-- Once built, your app will be available at:
-  `https://YOUR_USERNAME-YOUR_SPACE_NAME.hf.space`
-## Important Notes:
-✅ **Port 7860** - Hugging Face Spaces uses port 7860 by default (already configured)
-✅ **Non-root user** - Dockerfile creates user with UID 1000 (Hugging Face requirement)
-✅ **Both apps run together** - Flask runs in background, Streamlit in foreground
-✅ **README.md header** - Contains Hugging Face Space configuration:
-```yaml
----
-sdk: docker
-app_port: 7860
 ---
-```
-## Troubleshooting:
-- Check build logs in Space settings if build fails
-- Make sure all files are pushed to the repository
-- Ensure `start.sh` has execute permissions (handled in Dockerfile)

 ---
+title: Audio Sentiment Analysis
+emoji: 🎤
+colorFrom: purple
+colorTo: blue
 sdk: docker
 pinned: false
 license: mit
+short_description: Analyze emotions from audio with timeline visualization
 app_port: 7860
 ---
+# Audio Sentiment Analysis - Setup Guide
+## Quick Start
+### 1. Install Dependencies
 ```bash
+uv sync
+# or
 pip install -r requirements.txt
 ```
+### 2. Configure Environment
 ```bash
+# Copy example config
+cp .env.example .env
+# Edit .env and set your preferred model
+# Default: superb/wav2vec2-large-superb-er
 ```
+### 3. Preload Model (Recommended)
 ```bash
+# Download model before starting the app
+uv run python preload_model.py
+# This downloads ~100MB-1.3GB depending on model
+# Cached in ~/.cache/huggingface/
 ```
+### 4. Start the Application
+**Terminal 1 - Flask API:**
 ```bash
+uv run python flask_app.py
 ```
+**Terminal 2 - Streamlit Dashboard:**
 ```bash
+uv run streamlit run streamlit_app.py
 ```
+### 5. Access the App
+- **Streamlit UI:** http://localhost:8501
+- **Flask API:** http://localhost:5000
 ---
+## Available Models
+| Model | Emotions | Size | Speed | Accuracy |
+|-------|----------|------|-------|----------|
+| `superb/wav2vec2-base-superb-er` | 4 | ~100MB | ⚡⚡⚡ | ⭐⭐ |
+| `superb/hubert-large-superb-er` | 4 | ~300MB | ⚡⚡ | ⭐⭐⭐ |
+| `ehcalabres/wav2vec2-lg-xlsr` | 7 | ~1.2GB | ⚡ | ⭐⭐⭐⭐ |
+**To change model:** Edit `MODEL_NAME` in `.env` file
+---
+## Configuration Files
+- **`.env`** - Your local configuration (not in git)
+- **`.env.example`** - Template with all options
+- **`config.py`** - Loads environment variables
+- **`models_config.py`** - Model-specific settings
+---
+## Deployment
+### Hugging Face Spaces
+1. Push to HF Spaces git repository
+2. Set environment variables in Space settings
+3. Docker will build automatically
+4. Model downloads on first run (or add to Dockerfile)
+### Adding Model to Docker Image
+Edit `Dockerfile` to preload model:
+```dockerfile
+RUN python preload_model.py
 ```
+This caches the model in the image so deployment is faster.
+---
+## Troubleshooting
+### Model Download Issues
+- Check internet connection
+- Verify model name in `.env`
+- Check disk space (~2GB free recommended)
+### "Model not found" errors
+- Run `python preload_model.py` first
+- Check HuggingFace Hub is accessible
+### Slow processing
+- Use smaller model (wav2vec2-base)
+- Reduce `CHUNK_DURATION` in `.env`
+- Consider GPU if available
 ---
+## File Structure
+```
+.
+├── flask_app.py           # Flask API backend
+├── streamlit_app.py       # Streamlit dashboard
+├── audio_processor.py     # Audio processing logic
+├── config.py              # Configuration loader
+├── models_config.py       # Model definitions
+├── preload_model.py       # Model download script
+├── .env                   # Your settings (gitignored)
+├── .env.example           # Settings template
+├── requirements.txt       # Python dependencies
+├── input/                 # Example audio files
+└── uploads/               # Temporary uploads (gitignored)
+```

audio_processor.py ADDED

@@ -0,0 +1,234 @@

+import librosa
+import numpy as np
+from transformers import pipeline
+from config import config
+from models_config import get_model_config
+import os
+class AudioEmotionProcessor:
+    """Process audio files and extract emotions using ML models"""
+    def __init__(self):
+        self.model = None
+        self.model_name = config.MODEL_NAME
+        self.chunk_duration = config.CHUNK_DURATION
+        self.sample_rate = config.SAMPLE_RATE
+        # Get model-specific configuration
+        self.model_config = get_model_config(self.model_name)
+        self.label_mapping = self.model_config.get("label_mapping", {})
+    def load_model(self):
+        """Load the emotion detection model"""
+        if self.model is None:
+            print(f"Loading model: {self.model_name}")
+            print(f"Model config: {self.model_config['description']}")
+            # Get task type from model config
+            task = self.model_config.get("task", "audio-classification")
+            try:
+                # Load model with configured task
+                self.model = pipeline(
+                    task=task,
+                    model=self.model_name
+                )
+                print("Model loaded successfully!")
+            except Exception as e:
+                print(f"Failed to load with task '{task}': {e}")
+                print("Retrying with task 'audio-classification'...")
+                try:
+                    # Fallback: retry with the generic audio-classification task
+                    self.model = pipeline(
+                        "audio-classification",
+                        model=self.model_name
+                    )
+                    print("Model loaded successfully with audio-classification!")
+                except Exception as e2:
+                    print(f"Error loading model: {e2}")
+                    raise
+        return self.model
+    def load_audio(self, filepath):
+        """Load audio file and resample to target sample rate"""
+        audio, sr = librosa.load(filepath, sr=self.sample_rate)
+        return audio, sr
+    def get_audio_duration(self, audio, sr):
+        """Get duration of audio in seconds"""
+        return librosa.get_duration(y=audio, sr=sr)
+    def split_into_chunks(self, audio, sr):
+        """Split audio into fixed-duration chunks"""
+        chunk_samples = int(self.chunk_duration * sr)
+        chunks = []
+        for i in range(0, len(audio), chunk_samples):
+            chunk = audio[i:i + chunk_samples]
+            # Pad last chunk if it's shorter
+            if len(chunk) < chunk_samples:
+                chunk = np.pad(chunk, (0, chunk_samples - len(chunk)), mode='constant')
+            chunks.append(chunk)
+        return chunks
+    def predict_emotion(self, audio_chunk):
+        """Predict emotion for a single audio chunk"""
+        if self.model is None:
+            self.load_model()
+        # Get predictions
+        predictions = self.model(audio_chunk)
+        # Get top prediction
+        top_prediction = predictions[0]
+        # Debug: Print raw model output
+        print(f"DEBUG - Raw prediction: {top_prediction}")
+        # Map model output to our emotion labels
+        emotion_label = self.map_emotion_label(top_prediction['label'])
+        confidence = top_prediction['score']
+        return emotion_label, confidence
+    def map_emotion_label(self, model_label):
+        """Map model output labels to standardized emotion names"""
+        # Different models may have different label formats
+        label_lower = model_label.lower()
+        # Use model-specific label mapping first
+        if label_lower in self.label_mapping:
+            return self.label_mapping[label_lower]
+        # Fallback to common variations
+        emotion_map = {
+            'hap': 'Happy',
+            'happy': 'Happy',
+            'happiness': 'Happy',
+            'sad': 'Sad',
+            'sadness': 'Sad',
+            'ang': 'Angry',
+            'angry': 'Angry',
+            'anger': 'Angry',
+            'neu': 'Neutral',
+            'neutral': 'Neutral',
+            'calm': 'Neutral',
+            'fear': 'Fear',
+            'fearful': 'Fear',
+            'surprise': 'Surprise',
+            'surprised': 'Surprise',
+            'disgust': 'Disgust'
+        }
+        # Try to find a match
+        for key, value in emotion_map.items():
+            if key in label_lower:
+                return value
+        # Default: capitalize first letter
+        return model_label.capitalize()
+    def format_time(self, seconds):
+        """Format seconds to MM:SS format"""
+        mins = int(seconds // 60)
+        secs = int(seconds % 60)
+        return f"{mins:02d}:{secs:02d}"
+    def process_audio_file(self, filepath, progress_callback=None):
+        """
+        Process entire audio file and return emotion timeline
+        Args:
+            filepath: Path to audio file
+            progress_callback: Optional callback function(progress, message)
+        Returns:
+            dict: Results containing timeline and metadata
+        """
+        try:
+            # Load model
+            if progress_callback:
+                progress_callback(10, "Loading model...")
+            self.load_model()
+            # Load audio
+            if progress_callback:
+                progress_callback(20, "Loading audio file...")
+            audio, sr = self.load_audio(filepath)
+            # Get duration
+            duration = self.get_audio_duration(audio, sr)
+            duration_formatted = self.format_time(duration)
+            # Split into chunks
+            if progress_callback:
+                progress_callback(30, "Splitting audio into segments...")
+            chunks = self.split_into_chunks(audio, sr)
+            # Process each chunk
+            timeline = []
+            total_chunks = len(chunks)
+            for i, chunk in enumerate(chunks):
+                # Calculate progress (30% to 90%)
+                progress = 30 + int((i / total_chunks) * 60)
+                if progress_callback:
+                    progress_callback(
+                        progress,
+                        f"Analyzing chunk {i+1}/{total_chunks}..."
+                    )
+                # Predict emotion
+                emotion, confidence = self.predict_emotion(chunk)
+                # Calculate timestamp
+                time_seconds = i * self.chunk_duration
+                time_formatted = self.format_time(time_seconds)
+                timeline.append({
+                    "time": time_formatted,
+                    "emotion": emotion,
+                    "confidence": float(confidence)
+                })
+            # Calculate statistics
+            if progress_callback:
+                progress_callback(95, "Calculating statistics...")
+            emotions_list = [item['emotion'] for item in timeline]
+            unique_emotions = len(set(emotions_list))
+            # Find dominant emotion
+            from collections import Counter
+            emotion_counts = Counter(emotions_list)
+            dominant_emotion = emotion_counts.most_common(1)[0][0]
+            # Build results
+            results = {
+                "duration": duration_formatted,
+                "total_chunks": total_chunks,
+                "emotions_detected": unique_emotions,
+                "dominant_emotion": dominant_emotion,
+                "timeline": timeline
+            }
+            if progress_callback:
+                progress_callback(100, "Analysis complete!")
+            return results
+        except Exception as e:
+            # Chain the original exception so the traceback is preserved
+            raise RuntimeError(f"Audio processing failed: {e}") from e
+# Global processor instance
+_processor = None
+def get_processor():
+    """Get or create global processor instance"""
+    global _processor
+    if _processor is None:
+        _processor = AudioEmotionProcessor()
+    return _processor
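
A note on the chunking in `split_into_chunks` and the timestamps in `process_audio_file`: the sketch below reproduces only the index arithmetic, without librosa or NumPy, to show how many chunks a file yields and how much zero padding the last chunk receives. `chunk_plan` is a hypothetical helper written for this illustration; it is not part of the commit.

```python
def format_time(seconds):
    """Format seconds as MM:SS, mirroring format_time in the diff."""
    mins = int(seconds // 60)
    secs = int(seconds % 60)
    return f"{mins:02d}:{secs:02d}"


def chunk_plan(num_samples, sample_rate, chunk_duration):
    """Hypothetical helper: describe each fixed-duration chunk.

    Returns one dict per chunk with its timeline label, the start/end
    sample indices, and the number of zero samples needed to pad it.
    """
    chunk_samples = int(chunk_duration * sample_rate)
    if chunk_samples <= 0:
        raise ValueError("chunk_duration * sample_rate must be positive")

    plan = []
    for index, start in enumerate(range(0, num_samples, chunk_samples)):
        end = min(start + chunk_samples, num_samples)
        plan.append({
            "time": format_time(index * chunk_duration),
            "start": start,
            "end": end,
            "padding": chunk_samples - (end - start),
        })
    return plan


# Assumed example input: 7 seconds of 16 kHz audio, 3-second chunks.
plan = chunk_plan(num_samples=7 * 16000, sample_rate=16000, chunk_duration=3)
for entry in plan:
    print(entry)
```

With these assumed inputs the final chunk holds one second of audio and two seconds of padding, so its prediction may be less reliable than the others. Whether that matters in practice depends on the model and is not something the commit measures.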

config.py CHANGED

@@ -8,7 +8,7 @@ class Config:
     """Application configuration loaded from environment variables"""
     # Model Settings
-    MODEL_NAME = os.getenv('MODEL_NAME', 'superb/wav2vec2-base-superb-er')
     # Audio Processing Settings
     CHUNK_DURATION = int(os.getenv('CHUNK_DURATION', 3))  # seconds

     """Application configuration loaded from environment variables"""
     # Model Settings
+    MODEL_NAME = os.getenv('MODEL_NAME', 'superb/wav2vec2-large-superb-er')
     # Audio Processing Settings
     CHUNK_DURATION = int(os.getenv('CHUNK_DURATION', 3))  # seconds
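
The `int(os.getenv('CHUNK_DURATION', 3))` pattern raises a bare `ValueError` at import time if `.env` contains a non-numeric value. A minimal sketch of a more explicit loader follows; `env_int` is a hypothetical helper, and the `Config` class here only mirrors the two settings visible in this hunk.

```python
import os


def env_int(name, default):
    """Hypothetical helper: read an integer setting from the environment.

    Falls back to `default` when the variable is unset or empty, and
    names the offending variable when the value is not an integer.
    """
    raw = os.getenv(name)
    if raw is None or raw.strip() == "":
        return default
    try:
        return int(raw)
    except ValueError:
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None


class Config:
    """Trimmed stand-in for the Config class in config.py."""
    MODEL_NAME = os.getenv("MODEL_NAME", "superb/wav2vec2-large-superb-er")
    CHUNK_DURATION = env_int("CHUNK_DURATION", 3)  # seconds


print(Config.MODEL_NAME, Config.CHUNK_DURATION)
```

The error message then points at the misconfigured variable instead of a generic conversion failure.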

flask_app.py CHANGED

@@ -6,6 +6,7 @@ from datetime import datetime
 from config import config
 from concurrent.futures import ThreadPoolExecutor
 import threading
 app = Flask(__name__)
 CORS(app)  # Enable CORS for Streamlit
@@ -17,6 +18,21 @@ executor = ThreadPoolExecutor(max_workers=4)
 jobs = {}
 jobs_lock = threading.Lock()
 # Upload folder for temporary audio files
 UPLOAD_FOLDER = 'uploads'
 os.makedirs(UPLOAD_FOLDER, exist_ok=True)
@@ -132,55 +148,35 @@ def process_audio(job_id, filepath):
     Process audio file and extract emotions
     This runs in a background thread
     """
-    import time  # For simulating processing time
     try:
-        # Update status to processing
-        with jobs_lock:
-            jobs[job_id]["status"] = "processing"
-            jobs[job_id]["progress"] = 10
-            jobs[job_id]["message"] = "Loading audio file..."
-        # Simulate some processing time
-        time.sleep(1)
-        with jobs_lock:
-            jobs[job_id]["progress"] = 30
-            jobs[job_id]["message"] = "Analyzing audio segments..."
-        # TODO: Actual audio processing logic will go here
-        # For now, return mock data
-        time.sleep(2)
         with jobs_lock:
-            jobs[job_id]["progress"] = 70
-            jobs[job_id]["message"] = "Extracting emotions..."
-        time.sleep(1)
-        # Mock results
-        results = {
-            "duration": "00:45",
-            "total_chunks": 15,
-            "emotions_detected": 4,
-            "dominant_emotion": "Happy",
-            "timeline": [
-                {"time": "00:00", "emotion": "Neutral", "confidence": 0.85},
-                {"time": "00:03", "emotion": "Happy", "confidence": 0.92},
-                {"time": "00:06", "emotion": "Happy", "confidence": 0.88},
-                {"time": "00:09", "emotion": "Sad", "confidence": 0.78},
-                {"time": "00:12", "emotion": "Neutral", "confidence": 0.90}
-            ]
-        }
         with jobs_lock:
             jobs[job_id]["progress"] = 100
             jobs[job_id]["status"] = "completed"
             jobs[job_id]["message"] = "Analysis complete!"
             jobs[job_id]["results"] = results
-        # Clean up uploaded file after processing (optional)
-        # os.remove(filepath)
     except Exception as e:
         with jobs_lock:
@@ -193,5 +189,6 @@ if __name__ == '__main__':
     app.run(
         debug=config.FLASK_DEBUG,
         host=config.FLASK_HOST,
-        port=config.FLASK_PORT
     )

 from config import config
 from concurrent.futures import ThreadPoolExecutor
 import threading
+from audio_processor import get_processor
 app = Flask(__name__)
 CORS(app)  # Enable CORS for Streamlit
 jobs = {}
 jobs_lock = threading.Lock()
+# Preload model on startup
+print("=" * 60)
+print("INITIALIZING APPLICATION...")
+print("=" * 60)
+try:
+    print("Preloading emotion detection model...")
+    processor = get_processor()
+    processor.load_model()
+    print("✅ Model preloaded successfully!")
+    print("=" * 60)
+except Exception as e:
+    print(f"⚠️ Warning: Failed to preload model: {e}")
+    print("Model will be loaded on first request.")
+    print("=" * 60)
 # Upload folder for temporary audio files
 UPLOAD_FOLDER = 'uploads'
 os.makedirs(UPLOAD_FOLDER, exist_ok=True)
     Process audio file and extract emotions
     This runs in a background thread
     """
     try:
+        # Get audio processor
+        processor = get_processor()
+        # Progress callback function
+        def update_progress(progress, message):
+            with jobs_lock:
+                jobs[job_id]["progress"] = progress
+                jobs[job_id]["message"] = message
+        # Update status to processing
         with jobs_lock:
+            jobs[job_id]["status"] = "processing"
+        # Process audio file with real ML model
+        results = processor.process_audio_file(filepath, progress_callback=update_progress)
+        # Mark as completed
         with jobs_lock:
             jobs[job_id]["progress"] = 100
             jobs[job_id]["status"] = "completed"
             jobs[job_id]["message"] = "Analysis complete!"
             jobs[job_id]["results"] = results
+        # Clean up uploaded file after processing
+        try:
+            os.remove(filepath)
+        except OSError:
+            # Best-effort cleanup: ignore a missing or locked file
+            pass
     except Exception as e:
         with jobs_lock:
     app.run(
         debug=config.FLASK_DEBUG,
         host=config.FLASK_HOST,
+        port=config.FLASK_PORT,
+        use_reloader=False  # Disable auto-reload to prevent socket errors
     )
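
The progress-callback wiring in `process_audio` can be exercised without Flask or the model. The sketch below keeps the `jobs` dict, `jobs_lock`, and `update_progress` closure from the diff, but `fake_process` is a hypothetical stand-in for `processor.process_audio_file`, and the `"failed"` status in the error branch is an assumption because that part of the handler is cut off in this view.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

jobs = {}
jobs_lock = threading.Lock()


def fake_process(filepath, progress_callback=None):
    """Hypothetical stand-in for processor.process_audio_file."""
    steps = [
        (10, "Loading model..."),
        (30, "Splitting audio into segments..."),
        (95, "Calculating statistics..."),
    ]
    for progress, message in steps:
        if progress_callback:
            progress_callback(progress, message)
    return {"dominant_emotion": "Neutral", "timeline": []}


def run_job(job_id, filepath):
    """Run one job in a worker thread, updating shared state under the lock."""
    def update_progress(progress, message):
        with jobs_lock:
            jobs[job_id]["progress"] = progress
            jobs[job_id]["message"] = message

    try:
        with jobs_lock:
            jobs[job_id]["status"] = "processing"
        results = fake_process(filepath, progress_callback=update_progress)
        with jobs_lock:
            jobs[job_id].update(
                status="completed",
                progress=100,
                message="Analysis complete!",
                results=results,
            )
    except Exception as exc:
        # Assumed error shape; the real handler is not visible in the diff.
        with jobs_lock:
            jobs[job_id].update(status="failed", message=str(exc))


jobs["job-1"] = {"status": "queued", "progress": 0, "message": ""}
with ThreadPoolExecutor(max_workers=4) as executor:
    executor.submit(run_job, "job-1", "uploads/example.wav").result()
print(jobs["job-1"]["status"], jobs["job-1"]["progress"])
```

Every read or write of `jobs` goes through `jobs_lock`, so a polling endpoint sees a consistent progress and message pair.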

models_config.py ADDED

@@ -0,0 +1,157 @@

+"""
+Configuration for different emotion detection models
+Add new models here with their specific settings
+"""
+MODELS_CONFIG = {
+    # SuperB Wav2Vec2 - Lightweight, 4 emotions
+    "superb/wav2vec2-base-superb-er": {
+        "task": "audio-classification",
+        "emotions": ["Neutral", "Happy", "Sad", "Angry"],
+        "label_mapping": {
+            "neu": "Neutral",
+            "neutral": "Neutral",
+            "hap": "Happy",
+            "happy": "Happy",
+            "sad": "Sad",
+            "sadness": "Sad",
+            "ang": "Angry",
+            "angry": "Angry",
+            "anger": "Angry"
+        },
+        "sample_rate": 16000,
+        "description": "Lightweight model with 4 basic emotions"
+    },
+    # SuperB HuBERT - Better accuracy, 4 emotions
+    "superb/hubert-large-superb-er": {
+        "task": "audio-classification",
+        "emotions": ["Neutral", "Happy", "Sad", "Angry"],
+        "label_mapping": {
+            "neu": "Neutral",
+            "neutral": "Neutral",
+            "hap": "Happy",
+            "happy": "Happy",
+            "sad": "Sad",
+            "sadness": "Sad",
+            "ang": "Angry",
+            "angry": "Angry",
+            "anger": "Angry"
+        },
+        "sample_rate": 16000,
+        "description": "HuBERT-based model with better accuracy"
+    },
+    # Ehcalabres Wav2Vec2 XLSR - 7 emotions
+    "ehcalabres/wav2vec2-lg-xlsr-en-speech-emotion-recognition": {
+        "task": "audio-classification",
+        "emotions": ["Neutral", "Happy", "Sad", "Angry", "Fear", "Disgust", "Surprise"],
+        "label_mapping": {
+            "neu": "Neutral",
+            "neutral": "Neutral",
+            "hap": "Happy",
+            "happy": "Happy",
+            "happiness": "Happy",
+            "sad": "Sad",
+            "sadness": "Sad",
+            "ang": "Angry",
+            "angry": "Angry",
+            "anger": "Angry",
+            "fea": "Fear",
+            "fear": "Fear",
+            "dis": "Disgust",
+            "disgust": "Disgust",
+            "sur": "Surprise",
+            "surprise": "Surprise"
+        },
+        "sample_rate": 16000,
+        "description": "Multi-lingual model with 7 emotions"
+    },
+    # Harshit345 XLSR - Alternative model
+    "harshit345/xlsr-wav2vec-speech-emotion-recognition": {
+        "task": "automatic-speech-recognition",  # Different task type
+        "emotions": ["Neutral", "Happy", "Sad", "Angry", "Fear", "Disgust", "Surprise"],
+        "label_mapping": {
+            "neutral": "Neutral",
+            "calm": "Neutral",
+            "happy": "Happy",
+            "sad": "Sad",
+            "angry": "Angry",
+            "fearful": "Fear",
+            "fear": "Fear",
+            "disgust": "Disgust",
+            "surprised": "Surprise",
+            "surprise": "Surprise"
+        },
+        "sample_rate": 16000,
+        "description": "XLSR-based emotion recognition",
+        "special_handling": True  # Needs custom loading
+    },
+    # Amiriparian Wav2Vec2 - RAVDESS dataset
+    "amiriparian/wav2vec2-base-ravdess": {
+        "task": "audio-classification",
+        "emotions": ["Neutral", "Happy", "Sad", "Angry", "Fear", "Disgust", "Surprise", "Calm"],
+        "label_mapping": {
+            "01": "Neutral",
+            "02": "Calm",
+            "03": "Happy",
+            "04": "Sad",
+            "05": "Angry",
+            "06": "Fear",
+            "07": "Disgust",
+            "08": "Surprise",
+            "neutral": "Neutral",
+            "calm": "Calm",
+            "happy": "Happy",
+            "sad": "Sad",
+            "angry": "Angry",
+            "fearful": "Fear",
+            "fear": "Fear",
+            "disgust": "Disgust",
+            "surprised": "Surprise",
+            "surprise": "Surprise"
+        },
+        "sample_rate": 16000,
+        "description": "Trained on RAVDESS dataset with 8 emotions"
+    }
+}
+def get_model_config(model_name):
+    """
+    Get configuration for a specific model
+    Args:
+        model_name: Name of the model
+    Returns:
+        dict: Model configuration or default config
+    """
+    if model_name in MODELS_CONFIG:
+        return MODELS_CONFIG[model_name]
+    # Default configuration for unknown models
+    return {
+        "task": "audio-classification",
+        "emotions": ["Neutral", "Happy", "Sad", "Angry"],
+        "label_mapping": {},
+        "sample_rate": 16000,
+        "description": "Custom model",
+        "special_handling": False
+    }
+def get_available_models():
+    """Get list of all available configured models"""
+    return list(MODELS_CONFIG.keys())
+def get_model_info(model_name):
+    """Get human-readable info about a model"""
+    config = get_model_config(model_name)
+    return {
+        "name": model_name,
+        "emotions": config["emotions"],
+        "num_emotions": len(config["emotions"]),
+        "description": config["description"],
+        "sample_rate": config["sample_rate"]
+    }
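
One consequence of `get_model_config` worth noting: the new default, `superb/wav2vec2-large-superb-er`, has no entry in `MODELS_CONFIG` as shown in this diff, so it resolves to the "Custom model" defaults with an empty `label_mapping`. Labels still resolve because `map_emotion_label` in `audio_processor.py` falls back to substring matching. The sketch below illustrates that lookup order with a trimmed, hypothetical copy of the config; `map_label` and `FALLBACK_MAP` are illustrative names, not the commit's.

```python
# Hypothetical, trimmed-down stand-in for MODELS_CONFIG in the diff.
MODELS_CONFIG = {
    "superb/wav2vec2-base-superb-er": {
        "label_mapping": {
            "neu": "Neutral", "hap": "Happy", "sad": "Sad", "ang": "Angry",
        },
        "description": "Lightweight model with 4 basic emotions",
    },
}

# Defaults returned for any model name without an explicit entry.
DEFAULT_CONFIG = {"label_mapping": {}, "description": "Custom model"}

# Shortened version of the fallback table in map_emotion_label.
FALLBACK_MAP = {"hap": "Happy", "sad": "Sad", "ang": "Angry", "neu": "Neutral"}


def get_model_config(model_name):
    """Return the model's config, or the defaults for unknown models."""
    return MODELS_CONFIG.get(model_name, DEFAULT_CONFIG)


def map_label(model_name, raw_label):
    """Exact model-specific match first, then substring fallback."""
    label = raw_label.lower()
    mapping = get_model_config(model_name)["label_mapping"]
    if label in mapping:
        return mapping[label]
    for key, value in FALLBACK_MAP.items():
        if key in label:
            return value
    # Unknown label: pass it through, capitalized.
    return raw_label.capitalize()


for name in ("superb/wav2vec2-base-superb-er", "superb/wav2vec2-large-superb-er"):
    print(name, "->", get_model_config(name)["description"], "|", map_label(name, "ang"))
```

Adding an explicit entry for the large model would make the startup log describe it accurately, though the predictions themselves should map the same way under the assumptions above.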

preload_model.py ADDED

@@ -0,0 +1,45 @@

+#!/usr/bin/env python3
+"""
+Standalone script to preload and cache the emotion detection model
+Run this before starting the Flask app to download the model in advance
+"""
+import os
+from audio_processor import get_processor
+from config import config
+def preload_model():
+    """Download and cache the model"""
+    print("=" * 70)
+    print("MODEL PRELOAD SCRIPT")
+    print("=" * 70)
+    print(f"Model: {config.MODEL_NAME}")
+    print("Cache location: ~/.cache/huggingface/")
+    print("-" * 70)
+    try:
+        print("\n📥 Downloading and loading model...")
+        processor = get_processor()
+        processor.load_model()
+        print("\n✅ SUCCESS!")
+        print("=" * 70)
+        print("Model has been downloaded and cached.")
+        print("You can now start the Flask app without waiting for download.")
+        print("=" * 70)
+    except Exception as e:
+        print("\n❌ FAILED!")
+        print("=" * 70)
+        print(f"Error: {e}")
+        print("\nTroubleshooting:")
+        print("1. Check your internet connection")
+        print("2. Verify model name in .env file")
+        print("3. Ensure you have enough disk space")
+        print("=" * 70)
+        return False
+    return True
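+if __name__ == "__main__":
+    # Exit non-zero on failure so callers (e.g. a Docker RUN step) notice
+    raise SystemExit(0 if preload_model() else 1)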

pyproject.toml CHANGED

@@ -7,9 +7,14 @@ requires-python = ">=3.10"
 dependencies = [
     "flask>=3.1.2",
     "flask-cors>=6.0.2",
     "pandas>=2.3.3",
     "plotly>=6.5.2",
     "python-dotenv>=1.2.1",
     "requests>=2.32.5",
     "streamlit>=1.54.0",
 ]

 dependencies = [
     "flask>=3.1.2",
     "flask-cors>=6.0.2",
+    "librosa>=0.11.0",
     "pandas>=2.3.3",
     "plotly>=6.5.2",
     "python-dotenv>=1.2.1",
     "requests>=2.32.5",
+    "soundfile>=0.13.1",
     "streamlit>=1.54.0",
+    "torch>=2.10.0",
+    "torchaudio>=2.10.0",
+    "transformers>=5.1.0",
 ]

requirements.txt CHANGED

@@ -7,3 +7,8 @@ requests
 pandas
 plotly
 python-dotenv

 pandas
 plotly
 python-dotenv
+librosa
+soundfile
+transformers
+torch
+torchaudio

streamlit_app.py CHANGED

@@ -63,7 +63,7 @@ with tab1:
             st.warning("⚠️ Example file not found in input/ folder")
     # Show analyze button
-    analyze_btn = st.button("🔍 Analyze Audio", type="primary", use_container_width=True, disabled=(audio_file is None))
     # Initialize session state for results
     if 'analysis_results' not in st.session_state:
@@ -155,7 +155,7 @@ with tab1:
                             break
                     # Wait before next poll
-                    time.sleep(2)
                     attempt += 1
                 if attempt >= max_attempts:
@@ -179,12 +179,16 @@ with tab1:
         # Get results from session state
         results = st.session_state.analysis_results
-        # Emotion emoji mapping
         emotion_emoji_map = {
             'Happy': '😊',
             'Sad': '😢',
             'Angry': '😡',
-            'Neutral': '😐'
         }
         # Convert timeline to DataFrame
@@ -220,39 +224,45 @@ with tab1:
         with col1:
             st.subheader("⏱️ Emotion Timeline")
-            # Bar chart with emojis
-            fig_timeline = go.Figure()
             colors = {
                 'Happy': '#FFD700',
                 'Sad': '#4169E1',
                 'Angry': '#DC143C',
-                'Neutral': '#808080'
             }
-            for emotion in sample_timeline['Emotion'].unique():
-                emotion_data = sample_timeline[sample_timeline['Emotion'] == emotion]
-                fig_timeline.add_trace(go.Bar(
-                    x=emotion_data['Time (s)'],
-                    y=emotion_data['Confidence'],
-                    name=f"{emotion_emoji_map[emotion]} {emotion}",
-                    marker_color=colors[emotion],
-                    text=[emotion_emoji_map[emotion]] * len(emotion_data),
-                    textposition='outside',
-                    textfont=dict(size=20)
-                ))
             fig_timeline.update_layout(
                 xaxis_title="Time",
                 yaxis_title="Confidence",
                 yaxis_range=[0, 1.1],
-                barmode='group',
                 height=400,
-                showlegend=True,
-                hovermode='x unified'
             )
-            st.plotly_chart(fig_timeline, use_container_width=True)
         with col2:
             st.subheader("📊 Distribution")
@@ -274,7 +284,7 @@ with tab1:
                 showlegend=False
             )
-            st.plotly_chart(fig_pie, use_container_width=True)
         # Detailed Timeline Table
         st.subheader("📋 Detailed Timeline")
@@ -282,7 +292,7 @@ with tab1:
         display_df['Confidence'] = display_df['Confidence'].apply(lambda x: f"{x:.2%}")
         st.dataframe(
             display_df,
-            use_container_width=True,
             hide_index=True
         )
@@ -297,11 +307,11 @@ with tab2:
     col1, col2, col3 = st.columns(3)
     with col1:
-        record_btn = st.button("🔴 Start Recording", type="primary", use_container_width=True)
     with col2:
-        stop_btn = st.button("⏹️ Stop Recording", use_container_width=True)
     with col3:
-        analyze_record_btn = st.button("🔍 Analyze Recording", use_container_width=True)
     # Recording status
     if record_btn:
@@ -346,12 +356,16 @@ with tab2:
         st.markdown("---")
         st.subheader("📊 Emotion Analysis Results")
-        # Emotion emoji mapping
         emotion_emoji_map = {
             'Happy': '😊',
             'Sad': '😢',
             'Angry': '😡',
-            'Neutral': '😐'
         }
         # Sample data for recorded audio
@@ -394,7 +408,11 @@ with tab2:
                 'Happy': '#FFD700',
                 'Sad': '#4169E1',
                 'Angry': '#DC143C',
-                'Neutral': '#808080'
             }
             for emotion in sample_data['Emotion'].unique():
@@ -419,7 +437,7 @@ with tab2:
                 hovermode='x unified'
             )
-            st.plotly_chart(fig_timeline, use_container_width=True)
         with col2:
             st.subheader("📊 Distribution")
@@ -441,7 +459,7 @@ with tab2:
                 showlegend=False
             )
-            st.plotly_chart(fig_pie, use_container_width=True)
         # Detailed Timeline Table
         st.subheader("📋 Detailed Timeline")
@@ -449,7 +467,7 @@ with tab2:
         display_df['Confidence'] = display_df['Confidence'].apply(lambda x: f"{x:.2%}")
         st.dataframe(
             display_df,
-            use_container_width=True,
             hide_index=True
         )

             st.warning("⚠️ Example file not found in input/ folder")
     # Show analyze button
+    analyze_btn = st.button("🔍 Analyze Audio", type="primary", width="stretch", disabled=(audio_file is None))
     # Initialize session state for results
     if 'analysis_results' not in st.session_state:
                             break
                     # Wait before next poll
+                    time.sleep(5)
                     attempt += 1
                 if attempt >= max_attempts:
         # Get results from session state
         results = st.session_state.analysis_results
+        # Emotion emoji mapping (supports all emotions)
         emotion_emoji_map = {
             'Happy': '😊',
             'Sad': '😢',
             'Angry': '😡',
+            'Neutral': '😐',
+            'Fear': '😨',
+            'Surprise': '😲',
+            'Disgust': '🤢',
+            'Calm': '😌'
         }
         # Convert timeline to DataFrame
         with col1:
             st.subheader("⏱️ Emotion Timeline")
+            # Color mapping (supports all emotions)
             colors = {
                 'Happy': '#FFD700',
                 'Sad': '#4169E1',
                 'Angry': '#DC143C',
+                'Neutral': '#808080',
+                'Fear': '#9370DB',
+                'Surprise': '#FF8C00',
+                'Disgust': '#32CD32',
+                'Calm': '#87CEEB'
             }
+            # Create bar chart with individual bars (not grouped)
+            fig_timeline = go.Figure()
+            # Add all bars in sequence
+            bar_colors = [colors[emotion] for emotion in sample_timeline['Emotion']]
+            bar_text = [emotion_emoji_map[emotion] for emotion in sample_timeline['Emotion']]
+            fig_timeline.add_trace(go.Bar(
+                x=sample_timeline['Time (s)'],
+                y=sample_timeline['Confidence'],
+                marker_color=bar_colors,
+                text=bar_text,
+                textposition='outside',
+                textfont=dict(size=20),
+                hovertemplate='<b>%{x}</b><br>Confidence: %{y:.2%}<br><extra></extra>',
+                showlegend=False
+            ))
             fig_timeline.update_layout(
                 xaxis_title="Time",
                 yaxis_title="Confidence",
                 yaxis_range=[0, 1.1],
                 height=400,
+                hovermode='x'
             )
+            st.plotly_chart(fig_timeline, width="stretch")
         with col2:
             st.subheader("📊 Distribution")
                 showlegend=False
             )
+            st.plotly_chart(fig_pie, width="stretch")
         # Detailed Timeline Table
         st.subheader("📋 Detailed Timeline")
         display_df['Confidence'] = display_df['Confidence'].apply(lambda x: f"{x:.2%}")
         st.dataframe(
             display_df,
+            width="stretch",
             hide_index=True
         )
     col1, col2, col3 = st.columns(3)
     with col1:
+        record_btn = st.button("🔴 Start Recording", type="primary", width="stretch")
     with col2:
+        stop_btn = st.button("⏹️ Stop Recording", width="stretch")
     with col3:
+        analyze_record_btn = st.button("🔍 Analyze Recording", width="stretch")
     # Recording status
     if record_btn:
         st.markdown("---")
         st.subheader("📊 Emotion Analysis Results")
+        # Emotion emoji mapping (supports all emotions)
         emotion_emoji_map = {
             'Happy': '😊',
             'Sad': '😢',
             'Angry': '😡',
+            'Neutral': '😐',
+            'Fear': '😨',
+            'Surprise': '😲',
+            'Disgust': '🤢',
+            'Calm': '😌'
         }
         # Sample data for recorded audio
                 'Happy': '#FFD700',
                 'Sad': '#4169E1',
                 'Angry': '#DC143C',
+                'Neutral': '#808080',
+                'Fear': '#9370DB',
+                'Surprise': '#FF8C00',
+                'Disgust': '#32CD32',
+                'Calm': '#87CEEB'
             }
             for emotion in sample_data['Emotion'].unique():
                 hovermode='x unified'
             )
+            st.plotly_chart(fig_timeline, width="stretch")
         with col2:
             st.subheader("📊 Distribution")
                 showlegend=False
             )
+            st.plotly_chart(fig_pie, width="stretch")
         # Detailed Timeline Table
         st.subheader("📋 Detailed Timeline")
         display_df['Confidence'] = display_df['Confidence'].apply(lambda x: f"{x:.2%}")
         st.dataframe(
             display_df,
+            width="stretch",
             hide_index=True
         )
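
Review note: the new `bar_colors` and `bar_text` comprehensions index `colors[emotion]` and `emotion_emoji_map[emotion]` directly, so a label outside the eight mapped emotions would raise a `KeyError`. The two dictionaries are also repeated in both tabs. A minimal sketch of one way to handle both follows, assuming the mappings move to module level. The names `EMOTION_EMOJI`, `EMOTION_COLORS`, `DEFAULT_EMOJI`, `DEFAULT_COLOR`, and `bar_styles` are hypothetical and do not appear in this commit, and the label normalization is an assumption about how labels might vary.

```python
# Hypothetical module-level tables shared by both tabs (not part of the commit).
EMOTION_EMOJI = {
    'Happy': '😊',
    'Sad': '😢',
    'Angry': '😡',
    'Neutral': '😐',
    'Fear': '😨',
    'Surprise': '😲',
    'Disgust': '🤢',
    'Calm': '😌',
}

EMOTION_COLORS = {
    'Happy': '#FFD700',
    'Sad': '#4169E1',
    'Angry': '#DC143C',
    'Neutral': '#808080',
    'Fear': '#9370DB',
    'Surprise': '#FF8C00',
    'Disgust': '#32CD32',
    'Calm': '#87CEEB',
}

# Fallbacks for labels missing from the tables above (illustrative values).
DEFAULT_EMOJI = '❓'
DEFAULT_COLOR = '#A9A9A9'


def bar_styles(emotions):
    """Return (colors, emoji) lists for a sequence of emotion labels.

    Labels are stripped and capitalized before lookup (an assumption about
    label casing); anything still unmapped gets the default color and emoji
    instead of raising KeyError.
    """
    labels = [str(e).strip().capitalize() for e in emotions]
    colors = [EMOTION_COLORS.get(label, DEFAULT_COLOR) for label in labels]
    emoji = [EMOTION_EMOJI.get(label, DEFAULT_EMOJI) for label in labels]
    return colors, emoji
```

With a helper like this, the timeline code could call `bar_colors, bar_text = bar_styles(sample_timeline['Emotion'])` and pass the results to `go.Bar` as before.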