Spaces: abedir/hubert_emotions

Runtime error


abedir committed on Feb 5

Commit 5c2da15 (verified) · 1 parent: cd1b383

Upload 4 files

Files changed (4)

Dockerfile +35 -0
README.md +250 -9
app.py +299 -0
requirements.txt +11 -0

Dockerfile ADDED

	@@ -0,0 +1,35 @@

+FROM python:3.10-slim
+# Set working directory
+WORKDIR /app
+# Install system dependencies
+RUN apt-get update && apt-get install -y \
+    build-essential \
+    libsndfile1 \
+    ffmpeg \
+    git \
+    && rm -rf /var/lib/apt/lists/*
+# Copy requirements first for better caching
+COPY requirements.txt .
+# Install Python dependencies
+RUN pip install --no-cache-dir -r requirements.txt
+# Copy application code
+COPY app.py .
+# Copy the trained model; app.py loads it from ./model at startup
+COPY model/ ./model/
+# Expose port
+EXPOSE 7860
+# Set environment variables
+ENV PYTHONUNBUFFERED=1
+# Run the application
+CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
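The image only starts if `./model` holds a complete `save_pretrained()` output, because `app.py` loads both the feature extractor and the classifier from that path at startup. Below is a minimal pre-build check, assuming the file names the README lists (`config.json`, `preprocessor_config.json`, and either `pytorch_model.bin` or `model.safetensors`). The `check_model_dir` helper is hypothetical and not part of this commit; sharded checkpoints are not handled.

```python
from pathlib import Path

# Files the README says save_pretrained() should have produced.
# Assumption: weights are a single pytorch_model.bin or model.safetensors file.
REQUIRED_FILES = ("config.json", "preprocessor_config.json")
WEIGHT_FILES = ("pytorch_model.bin", "model.safetensors")


def check_model_dir(model_dir: str = "model") -> list[str]:
    """Return a list of problems with model_dir; an empty list means it looks complete."""
    path = Path(model_dir)
    if not path.is_dir():
        return [f"{model_dir}/ does not exist"]
    # Report each required config file that is absent
    problems = [
        f"missing {name}" for name in REQUIRED_FILES if not (path / name).is_file()
    ]
    # Accept either weight format, but require at least one
    if not any((path / name).is_file() for name in WEIGHT_FILES):
        problems.append("missing weights (" + " or ".join(WEIGHT_FILES) + ")")
    return problems
```

Calling `check_model_dir("model")` before `docker build` and aborting on a non-empty result surfaces a missing or incomplete model at build time instead of as a startup failure in the Space.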

README.md CHANGED

@@ -1,11 +1,252 @@
----
-title: Hubert Emotions
-emoji: 📚
-colorFrom: blue
-colorTo: pink
-sdk: docker
-pinned: false
-license: other
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# 🎭 Emotion Recognition API
+A FastAPI-based emotion recognition system using HuBERT (Hidden-Unit BERT) for audio emotion classification.
+## 📋 Features
+- **Emotion Detection**: Classify the emotion expressed in an uploaded audio clip
+- **Multiple Format Support**: WAV, MP3, FLAC, OGG, M4A
+- **Batch Processing**: Process up to 10 audio files in one request
+- **RESTful API**: Easy integration with any application
+- **Fine-tuned Model**: HuBERT fine-tuned for emotion classification
+## 🎯 Supported Emotions
+The API returns one of the labels defined in `EMOTION_LABELS` in `app.py`:
+- Angry/Fearful
+- Happy/Laugh
+- Neutral/Calm
+- Sad/Cry
+- Surprised/Amazed
+## 🚀 Quick Start
+### Using the API
+1. **Single Prediction**
+```bash
+curl -X POST "http://your-space-url/predict" \
+  -F "file=@your_audio.wav"
+```
+2. **Batch Prediction**
+```bash
+curl -X POST "http://your-space-url/predict_batch" \
+  -F "files=@audio1.wav" \
+  -F "files=@audio2.wav"
+```
+3. **Get Available Labels**
+```bash
+curl "http://your-space-url/labels"
+```
+4. **Health Check**
+```bash
+curl "http://your-space-url/health"
+```
+## 📖 API Documentation
+Once deployed, visit `/docs` for interactive API documentation (Swagger UI).
+### Endpoints
+#### `POST /predict`
+Upload a single audio file for emotion prediction.
+**Request:**
+- Form data with `file` parameter (audio file)
+**Response:**
+```json
+{
+  "success": true,
+  "predicted_emotion": "Happy/Laugh",
+  "confidence": 0.8542,
+  "all_probabilities": {
+    "Angry/Fearful": 0.0234,
+    "Happy/Laugh": 0.8542,
+    "Neutral/Calm": 0.0891,
+    "Sad/Cry": 0.0221,
+    "Surprised/Amazed": 0.0112
+  },
+  "filename": "sample.wav"
+}
+```
+#### `POST /predict_batch`
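+Upload up to 10 audio files for batch prediction. A file that fails to process gets an `error` field in its result instead of a prediction; the other files are still processed.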
+**Request:**
+- Form data with multiple `files` parameters
+**Response:**
+```json
+{
+  "success": true,
+  "results": [
+    {
+      "filename": "audio1.wav",
+      "predicted_emotion": "Happy/Laugh",
+      "confidence": 0.8542
+    },
+    {
+      "filename": "audio2.wav",
+      "predicted_emotion": "Sad/Cry",
+      "confidence": 0.7231
+    }
+  ],
+  "total_files": 2
+}
+```
+#### `GET /labels`
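+Get all available emotion labels. The response contains `labels` (a mapping from class index to label name) and `count`.
+#### `GET /health`
+Check API health status. The response reports `status`, whether the model and processor are loaded (`model_loaded`, `processor_loaded`), and the `device` the model runs on.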
+## 🔧 Setup Instructions
+### Prerequisites
+- Python 3.10+
+- Your trained HuBERT model files
+### Local Development
+1. **Clone the repository**
+```bash
+git clone <your-repo>
+cd <repo-name>
+```
+2. **Install dependencies**
+```bash
+pip install -r requirements.txt
+```
+3. **Add your model**
+Place your trained model files in the `model/` directory:
+```
+model/
+├── config.json
+├── preprocessor_config.json
+├── model.safetensors (or pytorch_model.bin)
+└── (other model files)
+```
+4. **Run the server**
+```bash
+uvicorn app:app --host 0.0.0.0 --port 7860
+```
+5. **Test the API**
+Visit `http://localhost:7860/docs` for interactive documentation.
+### Deploying to Hugging Face Spaces
+1. **Create a new Space**
+   - Go to [Hugging Face Spaces](https://huggingface.co/spaces)
+   - Click "Create new Space"
+   - Choose "Docker" as the SDK
+   - Name your Space
+2. **Upload files**
+   Upload the following files to your Space:
+   - `app.py`
+   - `requirements.txt`
+   - `Dockerfile`
+   - `README.md`
+   - Your `model/` directory with all model files
+3. **Configure Space**
+   - The Space will automatically build using the Dockerfile
+   - Once built, your API will be available at `https://your-username-space-name.hf.space`
+## 📦 Model Files Required
+Make sure your `model/` directory contains:
+- `config.json` - Model configuration
+- `preprocessor_config.json` - Feature extractor configuration
+- `model.safetensors` or `pytorch_model.bin` - Model weights
+- Any other files saved by `save_pretrained()`
+## 🐍 Python Client Example
+```python
+import requests
+# Predict emotion from an audio file (the filename is sent with the upload)
+url = "http://your-space-url/predict"
+with open("audio.wav", "rb") as f:
+    response = requests.post(url, files={"file": f})
+response.raise_for_status()
+result = response.json()
+print(f"Emotion: {result['predicted_emotion']}")
+print(f"Confidence: {result['confidence']}")
+print(f"All probabilities: {result['all_probabilities']}")
+```
+## 🔍 JavaScript/TypeScript Example
+```javascript
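+// audioFile: a File (e.g. from <input type="file">); its name must have a supported extension
+const formData = new FormData();
+formData.append('file', audioFile);
+const response = await fetch('http://your-space-url/predict', {
+  method: 'POST',
+  body: formData
+});
+if (!response.ok) throw new Error(`Request failed: ${response.status}`);
+const result = await response.json();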
+console.log('Emotion:', result.predicted_emotion);
+console.log('Confidence:', result.confidence);
+```
+## ⚙️ Configuration
+You can modify the following in `app.py`:
+- **EMOTION_LABELS**: Update the index-to-label mapping to match your trained model
+- **max_duration**: Change the audio duration limit, an argument of `preprocess_audio()` (default: 3 seconds)
+- **Batch size limit**: Change the maximum number of files per `/predict_batch` request (default: 10)
+## 📊 Performance
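+- **Inference Time (CPU)**: ~100-300 ms per audio file
+- **Inference Time (GPU)**: ~50-100 ms per audio file
+- **Analyzed Audio Length**: The first 3 seconds (configurable); longer clips are truncated and shorter ones are zero-padded
+- **Concurrent Requests**: Accepted, but inference runs synchronously inside the `async` endpoints, so each server process handles one prediction at a time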
+## 🛠️ Troubleshooting
+### Common Issues
+1. **Model not loading**
+   - Ensure all model files are in the `model/` directory
+   - Check that `model_path` in `load_model()` in `app.py` points to that directory (default: `./model`)
+2. **Audio processing errors**
+   - Verify audio file format is supported
+   - Check that librosa and soundfile are installed correctly
+3. **Out of memory**
+   - Reduce batch size
+   - Use smaller audio files
+   - Force CPU inference if GPU memory is limited, e.g. set `CUDA_VISIBLE_DEVICES=""` so `app.py` falls back to the CPU
+## 📝 License
+This project is licensed under the MIT License.
+## 🙏 Acknowledgments
+- HuBERT model by Facebook AI Research
+- Transformers library by Hugging Face
+- FastAPI framework
+## 📧 Contact
+For questions or issues, please open an issue on GitHub or contact [your-email].
 ---
+**Note**: Make sure to replace `your-space-url`, `your-username`, and other placeholders with your actual information.
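The README's Python client depends on `requests`. Where that is unavailable, the same multipart upload can be built with the standard library alone. This is a sketch: `encode_multipart` and `predict` are illustrative names, the per-part `Content-Type` is assumed to be `application/octet-stream` (the server only checks the filename extension), and `http://your-space-url` remains the README's placeholder.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def encode_multipart(field: str, files: list[tuple[str, bytes]]) -> tuple[bytes, str]:
    """Encode (filename, data) pairs as multipart/form-data parts named `field`.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    body = bytearray()
    for filename, data in files:
        body += f"--{boundary}\r\n".encode()
        body += (
            f'Content-Disposition: form-data; name="{field}"; '
            f'filename="{filename}"\r\n'
        ).encode()
        body += b"Content-Type: application/octet-stream\r\n\r\n"
        body += data + b"\r\n"
    # Closing boundary marks the end of the form
    body += f"--{boundary}--\r\n".encode()
    return bytes(body), f"multipart/form-data; boundary={boundary}"


def predict(base_url: str, paths: list[str]) -> dict:
    """POST one file to /predict or several to /predict_batch and return the JSON reply."""
    files = [(Path(p).name, Path(p).read_bytes()) for p in paths]
    if len(files) == 1:
        endpoint, field = "/predict", "file"
    else:
        endpoint, field = "/predict_batch", "files"
    body, content_type = encode_multipart(field, files)
    request = urllib.request.Request(
        base_url.rstrip("/") + endpoint,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx/5xx replies (e.g. unsupported format)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `predict("http://your-space-url", ["audio.wav"])` returns the single-prediction JSON, while passing several paths uses the batch endpoint and its `files` field.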

app.py ADDED

	@@ -0,0 +1,299 @@

+from fastapi import FastAPI, File, UploadFile, HTTPException
+from fastapi.responses import JSONResponse
+from fastapi.middleware.cors import CORSMiddleware
+import torch
+import librosa
+import numpy as np
+from transformers import AutoFeatureExtractor, AutoModelForAudioClassification
+import io
+import tempfile
+import os
+from typing import Dict
+import logging
+# Configure logging
+logging.basicConfig(level=logging.INFO)
+logger = logging.getLogger(__name__)
+# Initialize FastAPI app
+app = FastAPI(
+    title="Emotion Recognition API",
+    description="Audio emotion recognition using HuBERT model",
+    version="1.0.0"
+)
+# Add CORS middleware
+app.add_middleware(
+    CORSMiddleware,
+    allow_origins=["*"],
+    allow_credentials=True,
+    allow_methods=["*"],
+    allow_headers=["*"],
+)
+# Global variables for model and processor
+model = None
+processor = None
+label_map = None
+inverse_label_map = None
+# Emotion labels (update based on your training)
+EMOTION_LABELS = {
+    0: "Angry/Fearful",
+    1: "Happy/Laugh",
+    2: "Neutral/Calm",
+    3: "Sad/Cry",
+    4: "Surprised/Amazed"
+}
+def load_model():
+    """Load the model and processor on startup"""
+    global model, processor, label_map, inverse_label_map
+    try:
+        logger.info("Loading model and processor...")
+        # Load processor and model from the saved directory
+        model_path = "./model"
+        processor = AutoFeatureExtractor.from_pretrained(model_path)
+        model = AutoModelForAudioClassification.from_pretrained(model_path)
+        # Set model to evaluation mode
+        model.eval()
+        # Move to GPU if available
+        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+        model.to(device)
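+        # Create label mappings; fall back to the model's own id2label
+        # when EMOTION_LABELS does not match the classifier head size
+        if len(EMOTION_LABELS) == model.config.num_labels:
+            label_map = EMOTION_LABELS
+        else:
+            logger.warning("EMOTION_LABELS size differs from model.config.num_labels; using model.config.id2label")
+            label_map = {int(k): v for k, v in model.config.id2label.items()}
+        inverse_label_map = {v: k for k, v in label_map.items()}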
+        logger.info(f"Model loaded successfully on {device}")
+        logger.info(f"Labels: {label_map}")
+    except Exception as e:
+        logger.error(f"Error loading model: {str(e)}")
+        raise
+@app.on_event("startup")
+async def startup_event():
+    """Load model when the application starts"""
+    load_model()
+@app.get("/")
+async def root():
+    """Root endpoint with API information"""
+    return {
+        "message": "Emotion Recognition API",
+        "status": "running",
+        "model": "HuBERT",
+        "endpoints": {
+            "/predict": "POST - Upload audio file for emotion prediction",
+            "/health": "GET - Health check",
+            "/labels": "GET - Get available emotion labels"
+        }
+    }
+@app.get("/health")
+async def health_check():
+    """Health check endpoint"""
+    return {
+        "status": "healthy",
+        "model_loaded": model is not None,
+        "processor_loaded": processor is not None,
+        "device": str(next(model.parameters()).device) if model else "not loaded"
+    }
+@app.get("/labels")
+async def get_labels():
+    """Get available emotion labels"""
+    return {
+        "labels": label_map,
+        "count": len(label_map)
+    }
+def preprocess_audio(audio_bytes: bytes, max_duration: float = 3.0) -> np.ndarray:
+    """
+    Preprocess audio file for model inference
+    Args:
+        audio_bytes: Raw audio file bytes
+        max_duration: Maximum duration in seconds
+    Returns:
+        Preprocessed audio array
+    """
+    temp_path = None
+    try:
+        # Save bytes to a temporary file so librosa can decode it
+        with tempfile.NamedTemporaryFile(delete=False, suffix='.wav') as temp_file:
+            temp_file.write(audio_bytes)
+            temp_path = temp_file.name
+        # Load audio with librosa, resampled to the feature extractor's sampling rate
+        speech, sr = librosa.load(temp_path, sr=processor.sampling_rate)
+        # Calculate max length in samples
+        max_length = int(max_duration * processor.sampling_rate)
+        # Truncate long clips and zero-pad short ones to exactly max_length samples
+        if len(speech) > max_length:
+            speech = speech[:max_length]
+        else:
+            speech = np.pad(speech, (0, max_length - len(speech)))
+        return speech
+    except Exception as e:
+        logger.error(f"Error preprocessing audio: {str(e)}")
+        raise HTTPException(status_code=400, detail=f"Error processing audio file: {str(e)}")
+    finally:
+        # Always remove the temporary file, even if decoding failed
+        if temp_path is not None and os.path.exists(temp_path):
+            os.unlink(temp_path)
+@app.post("/predict")
+async def predict_emotion(file: UploadFile = File(...)):
+    """
+    Predict emotion from uploaded audio file
+    Args:
+        file: Audio file (WAV format recommended)
+    Returns:
+        JSON with predicted emotion and confidence scores
+    """
+    try:
+        # Validate file type
+        if not (file.filename or "").lower().endswith(('.wav', '.mp3', '.flac', '.ogg', '.m4a')):
+            raise HTTPException(
+                status_code=400,
+                detail="Invalid file format. Please upload audio file (WAV, MP3, FLAC, OGG, M4A)"
+            )
+        # Read file content
+        audio_bytes = await file.read()
+        # Preprocess audio
+        speech = preprocess_audio(audio_bytes)
+        # Process with feature extractor
+        inputs = processor(
+            speech,
+            sampling_rate=processor.sampling_rate,
+            return_tensors="pt",
+            padding=True
+        )
+        # Move inputs to same device as model
+        device = next(model.parameters()).device
+        inputs = {k: v.to(device) for k, v in inputs.items()}
+        # Perform inference
+        with torch.no_grad():
+            outputs = model(**inputs)
+            logits = outputs.logits
+            # Get probabilities
+            probs = torch.nn.functional.softmax(logits, dim=-1)
+            # Get prediction
+            predicted_class = torch.argmax(probs, dim=-1).item()
+            confidence = probs[0][predicted_class].item()
+            # Get all probabilities
+            all_probs = {
+                label_map[i]: float(probs[0][i].item())
+                for i in range(len(label_map))
+            }
+        # Prepare response
+        response = {
+            "success": True,
+            "predicted_emotion": label_map[predicted_class],
+            "confidence": round(confidence, 4),
+            "all_probabilities": {k: round(v, 4) for k, v in all_probs.items()},
+            "filename": file.filename
+        }
+        logger.info(f"Prediction: {label_map[predicted_class]} (confidence: {confidence:.4f})")
+        return JSONResponse(content=response)
+    except HTTPException:
+        raise
+    except Exception as e:
+        logger.error(f"Error during prediction: {str(e)}")
+        raise HTTPException(status_code=500, detail=f"Internal server error: {str(e)}")
+@app.post("/predict_batch")
+async def predict_batch(files: list[UploadFile] = File(...)):
+    """
+    Predict emotions for multiple audio files
+    Args:
+        files: List of audio files
+    Returns:
+        JSON with predictions for all files
+    """
+    if len(files) > 10:
+        raise HTTPException(
+            status_code=400,
+            detail="Maximum 10 files allowed per batch request"
+        )
+    results = []
+    for file in files:
+        try:
+            # Process each file
+            audio_bytes = await file.read()
+            speech = preprocess_audio(audio_bytes)
+            inputs = processor(
+                speech,
+                sampling_rate=processor.sampling_rate,
+                return_tensors="pt",
+                padding=True
+            )
+            device = next(model.parameters()).device
+            inputs = {k: v.to(device) for k, v in inputs.items()}
+            with torch.no_grad():
+                outputs = model(**inputs)
+                logits = outputs.logits
+                probs = torch.nn.functional.softmax(logits, dim=-1)
+                predicted_class = torch.argmax(probs, dim=-1).item()
+                confidence = probs[0][predicted_class].item()
+            results.append({
+                "filename": file.filename,
+                "predicted_emotion": label_map[predicted_class],
+                "confidence": round(confidence, 4)
+            })
+        except Exception as e:
+            results.append({
+                "filename": file.filename,
+                "error": str(e)
+            })
+    return JSONResponse(content={
+        "success": True,
+        "results": results,
+        "total_files": len(files)
+    })
+if __name__ == "__main__":
+    import uvicorn
+    uvicorn.run(app, host="0.0.0.0", port=7860)
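Both prediction endpoints share the same steps around the model call: fix the clip to `max_duration * sampling_rate` samples, apply a softmax over the logits, take the argmax, and round to four decimals. Below is a dependency-free sketch of those steps, useful for unit-testing the response format without loading HuBERT. `fix_length` and `build_response` are hypothetical helpers, and plain lists stand in for the NumPy arrays and tensors used in `app.py`.

```python
import math


def fix_length(samples: list[float], sampling_rate: int, max_duration: float = 3.0) -> list[float]:
    """Truncate or zero-pad samples to exactly max_duration seconds, like preprocess_audio."""
    max_length = int(max_duration * sampling_rate)
    if len(samples) > max_length:
        return samples[:max_length]
    return samples + [0.0] * (max_length - len(samples))


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtracting the max logit avoids overflow in exp."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_response(logits: list[float], label_map: dict[int, str], filename: str) -> dict:
    """Build a /predict-style JSON body from one clip's logits."""
    probs = softmax(logits)
    predicted = max(range(len(probs)), key=probs.__getitem__)  # argmax (first index on ties)
    return {
        "success": True,
        "predicted_emotion": label_map[predicted],
        "confidence": round(probs[predicted], 4),
        "all_probabilities": {label_map[i]: round(p, 4) for i, p in enumerate(probs)},
        "filename": filename,
    }
```

Keeping this logic in plain functions would also let `/predict` and `/predict_batch` share one code path instead of duplicating the inference block.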

requirements.txt ADDED

	@@ -0,0 +1,11 @@

+fastapi==0.104.1
+uvicorn[standard]==0.24.0
+python-multipart==0.0.6
+transformers==4.35.2
+torch==2.1.0
+torchaudio==2.1.0
+librosa==0.10.1
+numpy==1.24.3
+soundfile==0.12.1
+scipy==1.11.3
+numba==0.58.1
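The pins above only help if the running environment actually matches them, for example when the app is run locally outside Docker. Below is a small sketch that compares installed versions against `requirements.txt`-style `name==version` lines using `importlib.metadata`. `parse_pins` and `check_pins` are hypothetical helpers; extras such as `[standard]` are stripped before lookup, and a local version suffix (e.g. a CUDA tag on `torch`) counts as a mismatch.

```python
import re
from importlib import metadata
from typing import Callable, Optional

# name, optional [extras], then an exact '==' pin
PIN_RE = re.compile(r"([A-Za-z0-9_.\-]+)(?:\[[^\]]*\])?==(\S+)")


def parse_pins(text: str) -> dict[str, str]:
    """Parse exact `name==version` pins, ignoring blank lines and comments."""
    pins = {}
    for line in text.splitlines():
        match = PIN_RE.fullmatch(line.strip())
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def check_pins(
    pins: dict[str, str],
    lookup: Callable[[str], str] = metadata.version,
) -> dict[str, Optional[str]]:
    """Return {package: installed_version} for each pin that is missing (None) or different."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = lookup(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches
```

Running `check_pins(parse_pins(Path("requirements.txt").read_text()))` at startup (with `pathlib.Path` imported) and logging any non-empty result would make version drift visible in the Space logs.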