Faham
UPDATE: readme
93e56c4
|
raw
history blame
8.14 kB
metadata
title: Multimodal Sentiment Analysis
emoji: 🧠
colorFrom: blue
colorTo: purple
sdk: streamlit
sdk_version: 1.48.1
app_file: app.py
pinned: false

Multimodal Sentiment Analysis

A comprehensive Streamlit application that combines three different sentiment analysis models: text, audio, and vision-based sentiment analysis. The project demonstrates how to integrate multiple AI models for comprehensive sentiment understanding across different modalities.

What is it?

This project implements a fused sentiment analysis system that combines predictions from three independent models:

1. Text Sentiment Analysis

  • Model: TextBlob NLP library
  • Capability: Analyzes text input for positive, negative, or neutral sentiment
  • Status: βœ… Fully integrated and ready to use

2. Audio Sentiment Analysis

  • Model: Fine-tuned Wav2Vec2-base model
  • Training Data: RAVDESS + CREMA-D emotional speech datasets
  • Capability: Analyzes audio files and microphone recordings for sentiment
  • Features:
    • File upload support (WAV, MP3, M4A, FLAC)
    • Direct microphone recording (max 5 seconds)
    • Automatic preprocessing (16kHz sampling, 5s max duration)
  • Status: βœ… Fully integrated and ready to use

3. Vision Sentiment Analysis

  • Model: Fine-tuned ResNet-50 model
  • Training Data: FER2013 facial expression dataset
  • Capability: Analyzes images for facial expression-based sentiment
  • Features:
    • File upload support (PNG, JPG, JPEG, BMP, TIFF)
    • Camera capture functionality
    • Automatic face detection and preprocessing
    • Grayscale conversion and 224x224 resize
  • Status: βœ… Fully integrated and ready to use

4. Fused Model

  • Approach: Combines predictions from all three models
  • Capability: Provides comprehensive sentiment analysis across modalities
  • Status: βœ… Fully integrated and ready to use

Project Structure

sentiment-fused/
β”œβ”€β”€ app.py                          # Main Streamlit application
β”œβ”€β”€ simple_model_manager.py         # Model management and Google Drive integration
β”œβ”€β”€ requirements.txt                # Python dependencies
β”œβ”€β”€ pyproject.toml                 # Project configuration
β”œβ”€β”€ Dockerfile                     # Container deployment
β”œβ”€β”€ notebooks/                     # Development notebooks
β”‚   β”œβ”€β”€ audio_sentiment_analysis.ipynb    # Audio model development
β”‚   └── vision_sentiment_analysis.ipynb   # Vision model development
└── models/                        # Model storage directory

Key Features

  • Real-time Analysis: Instant sentiment predictions with confidence scores
  • Smart Preprocessing: Automatic file format handling and preprocessing
  • Multi-Page Interface: Clean navigation between different sentiment analysis modes
  • Model Management: Automatic model downloading from Google Drive
  • File Support: Multiple audio and image format support
  • Camera & Microphone: Direct input capture capabilities

Prerequisites

  • Python 3.9 or higher
  • 4GB+ RAM (for model loading)
  • Internet connection (for initial model download)

Installation

  1. Clone the repository:

    git clone <your-repo-url>
    cd sentiment-fused
    
  2. Create a virtual environment (recommended):

    python -m venv venv
    
    # On Windows
    venv\Scripts\activate
    
    # On macOS/Linux
    source venv/bin/activate
    
  3. Install dependencies:

    pip install -r requirements.txt
    
  4. Set up environment variables: Create a .env file in the project root with:

    VISION_MODEL_DRIVE_ID=your_google_drive_vision_model_file_id_here
    AUDIO_MODEL_DRIVE_ID=your_google_drive_audio_model_file_id_here
    VISION_MODEL_FILENAME=resnet50_model.pth
    AUDIO_MODEL_FILENAME=wav2vec2_model.pth
    

Running Locally

  1. Start the Streamlit application:

    streamlit run app.py
    
  2. Open your browser and navigate to the URL shown in the terminal (usually http://localhost:8501)

  3. Navigate between pages using the sidebar:

    • 🏠 Home: Overview and welcome page
    • πŸ“ Text Sentiment: Analyze text with TextBlob
    • 🎡 Audio Sentiment: Analyze audio files or record with microphone
    • πŸ–ΌοΈ Vision Sentiment: Analyze images or capture with camera
    • πŸ”— Fused Model: Combine all three models

Model Development

The project includes Jupyter notebooks that document the development process:

Audio Model (notebooks/audio_sentiment_analysis.ipynb)

  • Wav2Vec2-base fine-tuning on RAVDESS + CREMA-D datasets
  • Emotion-to-sentiment mapping (happy/surprised β†’ positive, sad/angry/fearful/disgust β†’ negative, neutral/calm β†’ neutral)
  • Audio preprocessing pipeline (16kHz sampling, 5s max duration)

Vision Model (notebooks/vision_sentiment_analysis.ipynb)

  • ResNet-50 fine-tuning on FER2013 dataset
  • Emotion-to-sentiment mapping (happy/surprise β†’ positive, angry/disgust/fear/sad β†’ negative, neutral β†’ neutral)
  • Image preprocessing pipeline (face detection, grayscale conversion, 224x224 resize)

Technical Implementation

Model Management

  • SimpleModelManager class handles model downloading from Google Drive
  • Automatic model caching and version management
  • Environment variable configuration for model URLs

Preprocessing Pipelines

  • Audio: Automatic resampling, duration limiting, feature extraction
  • Vision: Face detection, cropping, grayscale conversion, normalization
  • Text: Direct TextBlob processing

Streamlit Integration

  • Multi-page application with sidebar navigation
  • File upload widgets with format validation
  • Real-time camera and microphone input
  • Custom CSS styling for modern UI

Deployment

Docker Deployment

# Build the container
docker build -t sentiment-fused .

# Run the container
docker run -p 7860:7860 sentiment-fused

The application will be available at http://localhost:7860

Local Development

# Run with custom port
streamlit run app.py --server.port 8502

# Run with custom address
streamlit run app.py --server.address 0.0.0.0

Troubleshooting

Common Issues

  1. Model Loading Errors:

    • Ensure environment variables are set correctly
    • Check internet connection for model downloads
    • Verify sufficient RAM (4GB+ recommended)
  2. Dependency Issues:

    • Use virtual environment to avoid conflicts
    • Install PyTorch with CUDA support if using GPU
    • Ensure OpenCV is properly installed for face detection
  3. Performance Issues:

    • Large audio/image files may cause memory issues
    • Consider file size limits for better performance
    • GPU acceleration available for PyTorch models

Model Testing

# Test vision model
python -c "from simple_model_manager import SimpleModelManager; m = SimpleModelManager(); print('Vision model:', m.load_vision_model()[0] is not None)"

# Test audio model
python -c "from simple_model_manager import SimpleModelManager; m = SimpleModelManager(); print('Audio model:', m.load_audio_model()[0] is not None)"

Dependencies

Key libraries used:

  • Streamlit: Web application framework
  • PyTorch: Deep learning framework
  • Transformers: Hugging Face model library
  • OpenCV: Computer vision and face detection
  • Librosa: Audio processing
  • TextBlob: Natural language processing
  • Gdown: Google Drive file downloader

What This Project Demonstrates

  1. Multimodal AI Integration: Combining text, audio, and vision models
  2. Model Management: Automated downloading and caching of pre-trained models
  3. Real-time Processing: Live audio recording and camera capture
  4. Smart Preprocessing: Automatic format conversion and optimization
  5. Modern Web UI: Professional Streamlit application with custom styling
  6. Production Ready: Docker containerization and deployment

This project serves as a comprehensive example of building production-ready multimodal AI applications with modern Python tools and frameworks.