abedir committed on
Commit 5c2da15 (verified) · 1 Parent(s): cd1b383

Upload 4 files

Files changed (4)
  1. Dockerfile +35 -0
  2. README.md +250 -9
  3. app.py +299 -0
  4. requirements.txt +11 -0
Dockerfile ADDED
@@ -0,0 +1,35 @@
+ FROM python:3.10-slim
+
+ # Set working directory
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     build-essential \
+     libsndfile1 \
+     ffmpeg \
+     git \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy requirements first for better caching
+ COPY requirements.txt .
+
+ # Install Python dependencies
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy application code
+ COPY app.py .
+
+ # Create model directory
+ RUN mkdir -p /app/model
+
+ # Expose port
+ EXPOSE 7860
+
+ # Set environment variables
+ ENV PYTHONUNBUFFERED=1
+ ENV GRADIO_SERVER_NAME=0.0.0.0
+ ENV GRADIO_SERVER_PORT=7860
+
+ # Run the application
+ CMD ["uvicorn", "app:app", "--host", "0.0.0.0", "--port", "7860"]
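The container starts uvicorn on port 7860, and `app.py` loads the model during startup, so a client may want to poll `/health` before sending audio. The following is a minimal sketch of such a readiness check, not part of the commit. `fetch_health` is a hypothetical callable supplied by the caller that returns the decoded `/health` JSON (or raises `OSError` while the server is still unreachable); the attempt count and delay are assumed defaults.

```python
import time
from typing import Callable, Optional


def is_ready(payload: dict) -> bool:
    """True when /health reports both the model and the processor as loaded."""
    return bool(payload.get("model_loaded")) and bool(payload.get("processor_loaded"))


def wait_until_ready(
    fetch_health: Callable[[], Optional[dict]],
    attempts: int = 30,
    delay_seconds: float = 2.0,
) -> bool:
    """Poll fetch_health until the service is ready or the attempts run out."""
    for attempt in range(attempts):
        try:
            payload = fetch_health()
        except OSError:
            # Connection refused or similar while the container is still starting
            payload = None
        if payload and is_ready(payload):
            return True
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

One way to supply `fetch_health` is a small function that opens `http://localhost:7860/health` with `urllib.request.urlopen` and decodes the body with `json.load`; connection errors from `urllib` are subclasses of `OSError`, so the loop above would retry on them.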
README.md CHANGED
@@ -1,11 +1,252 @@
- ---
- title: Hubert Emotions
- emoji: 📚
- colorFrom: blue
- colorTo: pink
- sdk: docker
- pinned: false
- license: other
+ # 🎭 Emotion Recognition API
+
+ A FastAPI-based emotion recognition system using HuBERT (Hidden-Unit BERT) for audio emotion classification.
+
+ ## 📋 Features
+
+ - **Real-time Emotion Detection**: Analyze audio files and detect emotions
+ - **Multiple Format Support**: WAV, MP3, FLAC, OGG, M4A
+ - **Batch Processing**: Process up to 10 audio files per request
+ - **RESTful API**: Easy integration with any application
+ - **High Accuracy**: Fine-tuned HuBERT model for emotion classification
+
+ ## 🎯 Supported Emotions
+
+ - Angry/Disgust
+ - Happy/Surprised
+ - Neutral/Calm
+ - Sad/Fearful
+
+ ## 🚀 Quick Start
+
+ ### Using the API
+
+ 1. **Single Prediction**
+    ```bash
+    curl -X POST "http://your-space-url/predict" \
+      -F "file=@your_audio.wav"
+    ```
+
+ 2. **Batch Prediction**
+    ```bash
+    curl -X POST "http://your-space-url/predict_batch" \
+      -F "files=@audio1.wav" \
+      -F "files=@audio2.wav"
+    ```
+
+ 3. **Get Available Labels**
+    ```bash
+    curl "http://your-space-url/labels"
+    ```
+
+ 4. **Health Check**
+    ```bash
+    curl "http://your-space-url/health"
+    ```
+
+ ## 📖 API Documentation
+
+ Once deployed, visit `/docs` for interactive API documentation (Swagger UI).
+
+ ### Endpoints
+
+ #### `POST /predict`
+ Upload a single audio file for emotion prediction.
+
+ **Request:**
+ - Form data with `file` parameter (audio file)
+
+ **Response:**
+ ```json
+ {
+   "success": true,
+   "predicted_emotion": "Happy/Surprised",
+   "confidence": 0.8542,
+   "all_probabilities": {
+     "Angry/Disgust": 0.0234,
+     "Happy/Surprised": 0.8542,
+     "Neutral/Calm": 0.0891,
+     "Sad/Fearful": 0.0333
+   },
+   "filename": "sample.wav"
+ }
+ ```
+
+ #### `POST /predict_batch`
+ Upload multiple audio files (max 10) for batch prediction. A file that fails to process is returned with an `error` field instead of a prediction.
+
+ **Request:**
+ - Form data with multiple `files` parameters
+
+ **Response:**
+ ```json
+ {
+   "success": true,
+   "results": [
+     {
+       "filename": "audio1.wav",
+       "predicted_emotion": "Happy/Surprised",
+       "confidence": 0.8542
+     },
+     {
+       "filename": "audio2.wav",
+       "predicted_emotion": "Sad/Fearful",
+       "confidence": 0.7231
+     }
+   ],
+   "total_files": 2
+ }
+ ```
+
+ #### `GET /labels`
+ Get all available emotion labels.
+
+ #### `GET /health`
+ Check API health status.
+
+ ## 🔧 Setup Instructions
+
+ ### Prerequisites
+ - Python 3.10+
+ - Your trained HuBERT model files
+
+ ### Local Development
+
+ 1. **Clone the repository**
+    ```bash
+    git clone <your-repo>
+    cd <repo-name>
+    ```
+
+ 2. **Install dependencies**
+    ```bash
+    pip install -r requirements.txt
+    ```
+
+ 3. **Add your model**
+    Place your trained model files in the `model/` directory:
+    ```
+    model/
+    ├── config.json
+    ├── preprocessor_config.json
+    ├── pytorch_model.bin
+    └── (other model files)
+    ```
+
+ 4. **Run the server**
+    ```bash
+    uvicorn app:app --host 0.0.0.0 --port 7860
+    ```
+
+ 5. **Test the API**
+    Visit `http://localhost:7860/docs` for interactive documentation.
+
+ ### Deploying to Hugging Face Spaces
+
+ 1. **Create a new Space**
+    - Go to [Hugging Face Spaces](https://huggingface.co/spaces)
+    - Click "Create new Space"
+    - Choose "Docker" as the SDK
+    - Name your Space
+
+ 2. **Upload files**
+    Upload the following files to your Space:
+    - `app.py`
+    - `requirements.txt`
+    - `Dockerfile`
+    - `README.md`
+    - Your `model/` directory with all model files
+
+ 3. **Configure Space**
+    - The Space will automatically build using the Dockerfile
+    - Once built, your API will be available at `https://your-username-space-name.hf.space`
+
+ ## 📦 Model Files Required
+
+ Make sure your `model/` directory contains:
+ - `config.json` - Model configuration
+ - `preprocessor_config.json` - Feature extractor configuration
+ - `pytorch_model.bin` - Model weights
+ - Any other files saved by `save_pretrained()`
+
+ ## 🐍 Python Client Example
+
+ ```python
+ import requests
+
+ # Predict emotion from audio file
+ url = "http://your-space-url/predict"
+ with open("audio.wav", "rb") as audio:
+     response = requests.post(url, files={"file": audio})
+ result = response.json()
+
+ print(f"Emotion: {result['predicted_emotion']}")
+ print(f"Confidence: {result['confidence']}")
+ print(f"All probabilities: {result['all_probabilities']}")
+ ```
+
+ ## 🔍 JavaScript/TypeScript Example
+
+ ```javascript
+ const formData = new FormData();
+ formData.append('file', audioFile); // audioFile: a File or Blob, e.g. from a file input
+
+ const response = await fetch('http://your-space-url/predict', {
+   method: 'POST',
+   body: formData
+ });
+
+ const result = await response.json();
+ console.log('Emotion:', result.predicted_emotion);
+ console.log('Confidence:', result.confidence);
+ ```
+
+ ## ⚙️ Configuration
+
+ You can modify the following in `app.py`:
+
+ - **EMOTION_LABELS**: Update emotion label mappings
+ - **max_duration**: Change the fixed clip length in seconds (default: 3); longer audio is truncated and shorter audio is zero-padded
+ - **Batch size limit**: Modify maximum files per batch request
+
+ ## 📊 Performance
+
+ - **Inference Time (CPU)**: ~100-300 ms per audio file
+ - **Inference Time (GPU)**: ~50-100 ms per audio file
+ - **Supported Audio Length**: Only the first 3 seconds of each file are analyzed (configurable)
+ - **Concurrent Requests**: Supports multiple simultaneous requests
+
+ ## 🛠️ Troubleshooting
+
+ ### Common Issues
+
+ 1. **Model not loading**
+    - Ensure all model files are in the `model/` directory
+    - Check that file paths in `app.py` match your structure
+
+ 2. **Audio processing errors**
+    - Verify audio file format is supported
+    - Check that librosa and soundfile are installed correctly
+
+ 3. **Out of memory**
+    - Reduce batch size
+    - Use smaller audio files
+    - Enable CPU-only mode if GPU memory is limited
+
+ ## 📝 License
+
+ This project is licensed under the MIT License.
+
+ ## 🙏 Acknowledgments
+
+ - HuBERT model by Facebook AI Research
+ - Transformers library by Hugging Face
+ - FastAPI framework
+
+ ## 📧 Contact
+
+ For questions or issues, please open an issue on GitHub or contact [your-email].
+
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ **Note**: Make sure to replace `your-space-url`, `your-username`, and other placeholders with your actual information.
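The README documents a Python client for `/predict` only. For `/predict_batch`, `app.py` in this commit appends an entry with an `error` field for any file that fails, while the top-level `"success"` stays `true`, so a client probably needs to inspect each result. The sketch below shows one way to do that on the decoded JSON; `summarize_batch` and the keys of its return value are hypothetical names introduced here, and the sketch assumes filenames are unique within a batch.

```python
from typing import Any


def summarize_batch(payload: dict[str, Any]) -> dict[str, Any]:
    """Split a /predict_batch response into predictions and per-file failures."""
    predictions = {}
    failures = {}
    for item in payload.get("results", []):
        name = item.get("filename")
        if "error" in item:
            # app.py reports a failed file as {"filename": ..., "error": ...}
            failures[name] = item["error"]
        else:
            predictions[name] = (item["predicted_emotion"], item["confidence"])
    return {
        "predictions": predictions,
        "failures": failures,
        # True only if every submitted file produced a prediction
        "all_ok": not failures and len(predictions) == payload.get("total_files"),
    }
```

With `requests`, the payload would typically come from `requests.post(url, files=[("files", f1), ("files", f2)]).json()`, where repeating the `files` field name mirrors the `-F "files=@..."` flags in the curl example.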
app.py ADDED
@@ -0,0 +1,299 @@
+ from fastapi import FastAPI, File, UploadFile, HTTPException
+ from fastapi.responses import JSONResponse
+ from fastapi.middleware.cors import CORSMiddleware
+ import torch
+ import librosa
+ import numpy as np
+ from transformers import AutoFeatureExtractor, AutoModelForAudioClassification
+ import io
+ import tempfile
+ import os
+ from typing import Dict
+ import logging
+
+ # Configure logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ # Initialize FastAPI app
+ app = FastAPI(
+     title="Emotion Recognition API",
+     description="Audio emotion recognition using HuBERT model",
+     version="1.0.0"
+ )
+
+ # Add CORS middleware
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],
+     allow_credentials=True,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+ # Global variables for model and processor
+ model = None
+ processor = None
+ label_map = None
+ inverse_label_map = None
+
+ # Emotion labels (update based on your training)
+ EMOTION_LABELS = {
+     0: "Angry/Fearful",
+     1: "Happy/Laugh",
+     2: "Neutral/Calm",
+     3: "Sad/Cry",
+     4: "Surprised/Amazed"
+ }
+
+
+ def load_model():
+     """Load the model and processor on startup"""
+     global model, processor, label_map, inverse_label_map
+
+     try:
+         logger.info("Loading model and processor...")
+
+         # Load processor and model from the saved directory
+         model_path = "./model"
+         processor = AutoFeatureExtractor.from_pretrained(model_path)
+         model = AutoModelForAudioClassification.from_pretrained(model_path)
+
+         # Set model to evaluation mode
+         model.eval()
+
+         # Move to GPU if available
+         device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+         model.to(device)
+
+         # Create label mappings
+         label_map = EMOTION_LABELS
+         inverse_label_map = {v: k for k, v in label_map.items()}
+
+         logger.info(f"Model loaded successfully on {device}")
+         logger.info(f"Labels: {label_map}")
+
+     except Exception as e:
+         logger.error(f"Error loading model: {str(e)}")
+         raise
+
+
+ @app.on_event("startup")
+ async def startup_event():
+     """Load model when the application starts"""
+     load_model()
+
+
+ @app.get("/")
+ async def root():
+     """Root endpoint with API information"""
+     return {
+         "message": "Emotion Recognition API",
+         "status": "running",
+         "model": "HuBERT",
+         "endpoints": {
+             "/predict": "POST - Upload audio file for emotion prediction",
+             "/health": "GET - Health check",
+             "/labels": "GET - Get available emotion labels"
+         }
+     }
+
+
+ @app.get("/health")
+ async def health_check():
+     """Health check endpoint"""
+     return {
+         "status": "healthy",
+         "model_loaded": model is not None,
+         "processor_loaded": processor is not None,
+         "device": str(next(model.parameters()).device) if model is not None else "not loaded"
+     }
+
+
+ @app.get("/labels")
+ async def get_labels():
+     """Get available emotion labels"""
+     return {
+         "labels": label_map,
+         "count": len(label_map)
+     }
+
+
+ def preprocess_audio(audio_bytes: bytes, max_duration: float = 3.0) -> np.ndarray:
+     """
+     Preprocess audio file for model inference
+
+     Args:
+         audio_bytes: Raw audio file bytes
+         max_duration: Maximum duration in seconds
+
+     Returns:
+         Preprocessed audio array
+     """
+     try:
+         # Save bytes to a temporary file so librosa can decode it
+         with tempfile.NamedTemporaryFile(delete=False, suffix='.wav') as temp_file:
+             temp_file.write(audio_bytes)
+             temp_path = temp_file.name
+
+         # Load audio with librosa, always removing the temporary file
+         try:
+             speech, sr = librosa.load(temp_path, sr=processor.sampling_rate)
+         finally:
+             os.unlink(temp_path)
+
+         # Calculate max length
+         max_length = int(max_duration * processor.sampling_rate)
+
+         # Normalize duration
+         if len(speech) > max_length:
+             speech = speech[:max_length]
+         else:
+             speech = np.pad(speech, (0, max_length - len(speech)))
+
+         return speech
+
+     except Exception as e:
+         logger.error(f"Error preprocessing audio: {str(e)}")
+         raise HTTPException(status_code=400, detail=f"Error processing audio file: {str(e)}")
+
+
+ @app.post("/predict")
+ async def predict_emotion(file: UploadFile = File(...)):
+     """
+     Predict emotion from uploaded audio file
+
+     Args:
+         file: Audio file (WAV format recommended)
+
+     Returns:
+         JSON with predicted emotion and confidence scores
+     """
+     try:
+         # Validate file type (filename may be missing on some uploads)
+         if not (file.filename or "").lower().endswith(('.wav', '.mp3', '.flac', '.ogg', '.m4a')):
+             raise HTTPException(
+                 status_code=400,
+                 detail="Invalid file format. Please upload audio file (WAV, MP3, FLAC, OGG, M4A)"
+             )
+
+         # Read file content
+         audio_bytes = await file.read()
+
+         # Preprocess audio
+         speech = preprocess_audio(audio_bytes)
+
+         # Process with feature extractor
+         inputs = processor(
+             speech,
+             sampling_rate=processor.sampling_rate,
+             return_tensors="pt",
+             padding=True
+         )
+
+         # Move inputs to same device as model
+         device = next(model.parameters()).device
+         inputs = {k: v.to(device) for k, v in inputs.items()}
+
+         # Perform inference
+         with torch.no_grad():
+             outputs = model(**inputs)
+             logits = outputs.logits
+
+         # Get probabilities
+         probs = torch.nn.functional.softmax(logits, dim=-1)
+
+         # Get prediction
+         predicted_class = torch.argmax(probs, dim=-1).item()
+         confidence = probs[0][predicted_class].item()
+
+         # Get all probabilities
+         all_probs = {
+             label_map[i]: float(probs[0][i].item())
+             for i in range(len(label_map))
+         }
+
+         # Prepare response
+         response = {
+             "success": True,
+             "predicted_emotion": label_map[predicted_class],
+             "confidence": round(confidence, 4),
+             "all_probabilities": {k: round(v, 4) for k, v in all_probs.items()},
+             "filename": file.filename
+         }
+
+         logger.info(f"Prediction: {label_map[predicted_class]} (confidence: {confidence:.4f})")
+
+         return JSONResponse(content=response)
+
+     except HTTPException:
+         raise
+     except Exception as e:
+         logger.error(f"Error during prediction: {str(e)}")
+         raise HTTPException(status_code=500, detail=f"Internal server error: {str(e)}")
+
+
+ @app.post("/predict_batch")
+ async def predict_batch(files: list[UploadFile] = File(...)):
+     """
+     Predict emotions for multiple audio files
+
+     Args:
+         files: List of audio files
+
+     Returns:
+         JSON with predictions for all files
+     """
+     if len(files) > 10:
+         raise HTTPException(
+             status_code=400,
+             detail="Maximum 10 files allowed per batch request"
+         )
+
+     results = []
+
+     for file in files:
+         try:
+             # Process each file
+             audio_bytes = await file.read()
+             speech = preprocess_audio(audio_bytes)
+
+             inputs = processor(
+                 speech,
+                 sampling_rate=processor.sampling_rate,
+                 return_tensors="pt",
+                 padding=True
+             )
+
+             device = next(model.parameters()).device
+             inputs = {k: v.to(device) for k, v in inputs.items()}
+
+             with torch.no_grad():
+                 outputs = model(**inputs)
+                 logits = outputs.logits
+                 probs = torch.nn.functional.softmax(logits, dim=-1)
+                 predicted_class = torch.argmax(probs, dim=-1).item()
+                 confidence = probs[0][predicted_class].item()
+
+             results.append({
+                 "filename": file.filename,
+                 "predicted_emotion": label_map[predicted_class],
+                 "confidence": round(confidence, 4)
+             })
+
+         except Exception as e:
+             results.append({
+                 "filename": file.filename,
+                 "error": str(e)
+             })
+
+     return JSONResponse(content={
+         "success": True,
+         "results": results,
+         "total_files": len(files)
+     })
+
+
+ if __name__ == "__main__":
+     import uvicorn
+     uvicorn.run(app, host="0.0.0.0", port=7860)
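The `/predict` handler turns model logits into a label, a confidence, and a per-label probability map via softmax and argmax. The following standalone sketch reproduces that step with the standard library only, so it can be run without torch or a model. The logit values used in the usage note are made up for illustration, and `build_response` is a hypothetical helper name; the label table is copied from `EMOTION_LABELS` in `app.py`.

```python
import math

# Copied from EMOTION_LABELS in app.py
EMOTION_LABELS = {
    0: "Angry/Fearful",
    1: "Happy/Laugh",
    2: "Neutral/Calm",
    3: "Sad/Cry",
    4: "Surprised/Amazed",
}


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_response(logits: list[float], filename: str) -> dict:
    """Mirror the response shape of /predict for one vector of logits."""
    probs = softmax(logits)
    predicted = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return {
        "success": True,
        "predicted_emotion": EMOTION_LABELS[predicted],
        "confidence": round(probs[predicted], 4),
        "all_probabilities": {
            EMOTION_LABELS[i]: round(p, 4) for i, p in enumerate(probs)
        },
        "filename": filename,
    }
```

For example, `build_response([0.1, 2.0, 0.3, -1.0, 0.0], "sample.wav")` selects index 1, so `predicted_emotion` is `"Happy/Laugh"` and `confidence` equals that label's entry in `all_probabilities`.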
requirements.txt ADDED
@@ -0,0 +1,11 @@
+ fastapi==0.104.1
+ uvicorn[standard]==0.24.0
+ python-multipart==0.0.6
+ transformers==4.35.2
+ torch==2.1.0
+ torchaudio==2.1.0
+ librosa==0.10.1
+ numpy==1.24.3
+ soundfile==0.12.1
+ scipy==1.11.3
+ numba==0.58.1
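Every dependency above is pinned with `==`, so an environment that drifts from these versions may behave differently from the Space. As a rough sketch (not part of the commit), the pins can be compared against the installed distributions using `importlib.metadata` from the standard library. `parse_pins` and `find_mismatches` are hypothetical helper names, and the parser only handles the simple `name[extras]==version` form used in this file.

```python
from importlib import metadata
from typing import Optional


def parse_pins(text: str) -> dict[str, str]:
    """Parse 'name[extras]==version' lines into {name: version}."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or "==" not in line:
            continue
        name, version = line.split("==", 1)
        name = name.split("[", 1)[0].strip()  # uvicorn[standard] -> uvicorn
        pins[name] = version.strip()
    return pins


def find_mismatches(pins: dict[str, str]) -> dict[str, tuple[str, Optional[str]]]:
    """Return {name: (pinned, installed)} for packages that differ or are missing."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # not installed in this environment
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

A typical use would be `find_mismatches(parse_pins(open("requirements.txt").read()))`, which returns an empty dict when the environment matches the pins.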