abedir committed on
Commit ff2b4bd · verified · 1 Parent(s): 23098cf

Update README.md

  1. README.md +27 -242
README.md CHANGED
@@ -1,252 +1,37 @@
1
- # 🎭 Emotion Recognition API
2
-
3
- A FastAPI-based emotion recognition system using HuBERT (Hidden-Unit BERT) for audio emotion classification.
4
-
5
- ## 📋 Features
6
-
7
- - **Real-time Emotion Detection**: Analyze audio files and detect emotions
8
- - **Multiple Format Support**: WAV, MP3, FLAC, OGG, M4A
9
- - **Batch Processing**: Process multiple audio files at once
10
- - **RESTful API**: Easy integration with any application
11
- - **High Accuracy**: Fine-tuned HuBERT model for emotion classification
12
-
13
- ## 🎯 Supported Emotions
14
-
15
- - Angry/Disgust
16
- - Happy/Surprised
17
- - Neutral/Calm
18
- - Sad/Fearful
19
-
20
- ## 🚀 Quick Start
21
-
22
- ### Using the API
23
-
24
- 1. **Single Prediction**
25
- ```bash
26
- curl -X POST "http://your-space-url/predict" \
27
- -F "file=@your_audio.wav"
28
- ```
29
-
30
- 2. **Batch Prediction**
31
- ```bash
32
- curl -X POST "http://your-space-url/predict_batch" \
33
- -F "files=@audio1.wav" \
34
- -F "files=@audio2.wav"
35
- ```
36
-
37
- 3. **Get Available Labels**
38
- ```bash
39
- curl "http://your-space-url/labels"
40
- ```
41
 
42
- 4. **Health Check**
43
- ```bash
44
- curl "http://your-space-url/health"
45
- ```
46
 
47
- ## 📖 API Documentation
48
 
49
- Once deployed, visit `/docs` for interactive API documentation (Swagger UI).
50
 
51
- ### Endpoints
52
 
53
- #### `POST /predict`
54
- Upload a single audio file for emotion prediction.
55
 
56
- **Request:**
57
- - Form data with `file` parameter (audio file)
58
 
59
- **Response:**
60
  ```json
61
  {
62
- "success": true,
63
- "predicted_emotion": "Happy/Surprised",
64
- "confidence": 0.8542,
65
- "all_probabilities": {
66
- "Angry/Disgust": 0.0234,
67
- "Happy/Surprised": 0.8542,
68
- "Neutral/Calm": 0.0891,
69
- "Sad/Fearful": 0.0333
70
- },
71
- "filename": "sample.wav"
72
  }
73
- ```
74
-
75
- #### `POST /predict_batch`
76
- Upload multiple audio files (max 10) for batch prediction.
77
-
78
- **Request:**
79
- - Form data with multiple `files` parameters
80
-
81
- **Response:**
82
- ```json
83
- {
84
- "success": true,
85
- "results": [
86
- {
87
- "filename": "audio1.wav",
88
- "predicted_emotion": "Happy/Surprised",
89
- "confidence": 0.8542
90
- },
91
- {
92
- "filename": "audio2.wav",
93
- "predicted_emotion": "Sad/Fearful",
94
- "confidence": 0.7231
95
- }
96
- ],
97
- "total_files": 2
98
- }
99
- ```
100
-
101
- #### `GET /labels`
102
- Get all available emotion labels.
103
-
104
- #### `GET /health`
105
- Check API health status.
106
-
107
- ## 🔧 Setup Instructions
108
-
109
- ### Prerequisites
110
- - Python 3.10+
111
- - Your trained HuBERT model files
112
-
113
- ### Local Development
114
-
115
- 1. **Clone the repository**
116
- ```bash
117
- git clone <your-repo>
118
- cd <repo-name>
119
- ```
120
-
121
- 2. **Install dependencies**
122
- ```bash
123
- pip install -r requirements.txt
124
- ```
125
-
126
- 3. **Add your model**
127
- Place your trained model files in the `model/` directory:
128
- ```
129
- model/
130
- ├── config.json
131
- ├── preprocessor_config.json
132
- ├── pytorch_model.bin
133
- └── (other model files)
134
- ```
135
-
136
- 4. **Run the server**
137
- ```bash
138
- uvicorn app:app --host 0.0.0.0 --port 7860
139
- ```
140
-
141
- 5. **Test the API**
142
- Visit `http://localhost:7860/docs` for interactive documentation.
143
-
144
- ### Deploying to Hugging Face Spaces
145
-
146
- 1. **Create a new Space**
147
- - Go to [Hugging Face Spaces](https://huggingface.co/spaces)
148
- - Click "Create new Space"
149
- - Choose "Docker" as the SDK
150
- - Name your Space
151
-
152
- 2. **Upload files**
153
- Upload the following files to your Space:
154
- - `app.py`
155
- - `requirements.txt`
156
- - `Dockerfile`
157
- - `README.md`
158
- - Your `model/` directory with all model files
159
-
160
- 3. **Configure Space**
161
- - The Space will automatically build using the Dockerfile
162
- - Once built, your API will be available at `https://your-username-space-name.hf.space`
163
-
164
- ## 📦 Model Files Required
165
-
166
- Make sure your `model/` directory contains:
167
- - `config.json` - Model configuration
168
- - `preprocessor_config.json` - Feature extractor configuration
169
- - `pytorch_model.bin` - Model weights
170
- - Any other files saved by `save_pretrained()`
171
-
172
- ## 🐍 Python Client Example
173
-
174
- ```python
175
- import requests
176
-
177
- # Predict emotion from audio file
178
- url = "http://your-space-url/predict"
179
- files = {"file": open("audio.wav", "rb")}
180
- response = requests.post(url, files=files)
181
- result = response.json()
182
-
183
- print(f"Emotion: {result['predicted_emotion']}")
184
- print(f"Confidence: {result['confidence']}")
185
- print(f"All probabilities: {result['all_probabilities']}")
186
- ```
187
-
188
- ## 🔍 JavaScript/TypeScript Example
189
-
190
- ```javascript
191
- const formData = new FormData();
192
- formData.append('file', audioFile);
193
-
194
- const response = await fetch('http://your-space-url/predict', {
195
- method: 'POST',
196
- body: formData
197
- });
198
-
199
- const result = await response.json();
200
- console.log('Emotion:', result.predicted_emotion);
201
- console.log('Confidence:', result.confidence);
202
- ```
203
-
204
- ## ⚙️ Configuration
205
-
206
- You can modify the following in `app.py`:
207
-
208
- - **EMOTION_LABELS**: Update emotion label mappings
209
- - **max_duration**: Change audio duration limit (default: 3 seconds)
210
- - **Batch size limit**: Modify maximum files per batch request
211
-
212
- ## 📊 Performance
213
-
214
- - **Inference Time**: ~100-300ms per audio file (CPU)
215
- - **Inference Time**: ~50-100ms per audio file (GPU)
216
- - **Supported Audio Length**: Up to 3 seconds (configurable)
217
- - **Concurrent Requests**: Supports multiple simultaneous requests
218
-
219
- ## 🛠️ Troubleshooting
220
-
221
- ### Common Issues
222
-
223
- 1. **Model not loading**
224
- - Ensure all model files are in the `model/` directory
225
- - Check that file paths in `app.py` match your structure
226
-
227
- 2. **Audio processing errors**
228
- - Verify audio file format is supported
229
- - Check that librosa and soundfile are installed correctly
230
-
231
- 3. **Out of memory**
232
- - Reduce batch size
233
- - Use smaller audio files
234
- - Enable CPU-only mode if GPU memory is limited
235
-
236
- ## 📝 License
237
-
238
- This project is licensed under the MIT License.
239
-
240
- ## 🙏 Acknowledgments
241
-
242
- - HuBERT model by Facebook AI Research
243
- - Transformers library by Hugging Face
244
- - FastAPI framework
245
-
246
- ## 📧 Contact
247
-
248
- For questions or issues, please open an issue on GitHub or contact [your-email].
249
-
250
- ---
251
-
252
- **Note**: Make sure to replace `your-space-url`, `your-username`, and other placeholders with your actual information.
 
1
+ ---
2
+ title: HuBERT Emotion Recognition
3
+ emoji: 🎧
4
+ colorFrom: blue
5
+ colorTo: purple
6
+ sdk: docker
7
+ app_port: 7860
8
+ ---
9
 
10
+ ## 🎧 HuBERT Emotion Recognition API
11
 
12
+ This Space provides an emotion recognition API for speech audio using **HuBERT**.
13
 
14
+ ### 🎯 Supported emotions
15
+ - Neutral / Calm
16
+ - Happy / Surprised
17
+ - Angry / Disgust
18
+ - Sad / Fearful
19
 
20
+ ### 🚀 API Endpoint
21
 
22
+ **POST** `/predict`
23
 
24
+ Upload a `.wav` file.
25
 
26
+ ### 📦 Response
27
  ```json
28
  {
29
+ "emotion": "Happy/Surprised",
30
+ "confidence": 0.87,
31
+ "probabilities": {
32
+ "Happy/Surprised": 0.87,
33
+ "Neutral/Calm": 0.05,
34
+ "Angry/Disgust": 0.04,
35
+ "Sad/Fearful": 0.04
36
+ }
37
  }
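
The new README renames the response fields (`predicted_emotion` becomes `emotion`, `all_probabilities` becomes `probabilities`) and drops the client examples, so a short Python sketch of a client for the updated `/predict` response may help. It is illustrative only. The names `Prediction`, `parse_prediction`, and `predict_emotion` are hypothetical. The multipart field name `file` is carried over from the removed README and is not confirmed by the new one. `requests` is a third-party dependency, imported lazily so the parsing part runs without it.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """One /predict result, mirroring the JSON keys in the new README."""
    emotion: str
    confidence: float
    probabilities: dict[str, float]


def parse_prediction(payload: dict) -> Prediction:
    """Validate a response shaped like the README example and wrap it."""
    probabilities = {str(k): float(v) for k, v in payload["probabilities"].items()}
    emotion = str(payload["emotion"])
    if emotion not in probabilities:
        raise ValueError(f"'{emotion}' is missing from 'probabilities'")
    return Prediction(emotion, float(payload["confidence"]), probabilities)


def predict_emotion(base_url: str, wav_path: str, timeout: float = 30.0) -> Prediction:
    """POST a .wav file to /predict.

    Assumption: the upload field is named 'file', as in the removed README.
    """
    import requests  # third-party; imported here so parsing works without it

    with open(wav_path, "rb") as audio:
        response = requests.post(
            f"{base_url.rstrip('/')}/predict",
            files={"file": audio},
            timeout=timeout,
        )
    response.raise_for_status()  # surface HTTP errors instead of parsing them
    return parse_prediction(response.json())


# The response body from the README, parsed offline (no network needed).
EXAMPLE = json.loads("""
{
  "emotion": "Happy/Surprised",
  "confidence": 0.87,
  "probabilities": {
    "Happy/Surprised": 0.87,
    "Neutral/Calm": 0.05,
    "Angry/Disgust": 0.04,
    "Sad/Fearful": 0.04
  }
}
""")
example = parse_prediction(EXAMPLE)
```

Against a running Space, the call would look like `predict_emotion("https://your-username-space-name.hf.space", "sample.wav")`, with the placeholder URL replaced by the real one.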