---
license: mit
tags:
- pytorch
- audio
- emotion-recognition
- audio-classification
- customer-service
---
# Mantis
## Model Description
Mantis is an audio-based emotion recognition model designed for customer service intelligence. It classifies emotional states from speech audio using a HuBERT + CNN hybrid architecture, enabling real-time sentiment monitoring in call center environments.
## Model Architecture
- **Architecture**: HuBERT (feature extractor) + CNN (classifier head)
- **Framework**: PyTorch
- **Task**: Audio Emotion Classification
- **Input**: Raw audio waveforms / mel spectrograms
- **Output**: Emotion class (e.g., neutral, happy, angry, sad, frustrated)
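To make the feature-extractor-plus-head split concrete, here is a minimal sketch of a CNN classifier head that consumes frame-level features. It is not the released Mantis architecture. The class name `CNNHead`, the layer sizes, the 768-dimensional feature width (the hidden size of HuBERT base), and the five-label `EMOTIONS` list (copied from the examples in the Output bullet) are all assumptions, and a random tensor stands in for real HuBERT output.

```python
import torch
import torch.nn as nn

# Hypothetical label set, taken from the examples in the Output bullet.
EMOTIONS = ["neutral", "happy", "angry", "sad", "frustrated"]


class CNNHead(nn.Module):
    """Illustrative 1-D CNN classifier over a sequence of frame-level features."""

    def __init__(self, feature_dim: int = 768, num_classes: int = len(EMOTIONS)):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(feature_dim, 256, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(256, 128, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),  # average over time: one vector per clip
        )
        self.fc = nn.Linear(128, num_classes)

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # features: (batch, frames, feature_dim), the layout a HuBERT encoder emits
        x = features.transpose(1, 2)   # Conv1d expects (batch, channels, frames)
        x = self.conv(x).squeeze(-1)   # (batch, 128)
        return self.fc(x)              # (batch, num_classes) logits


head = CNNHead().eval()
dummy_features = torch.randn(2, 49, 768)  # random stand-in for HuBERT output
with torch.no_grad():
    logits = head(dummy_features)
predicted = [EMOTIONS[i] for i in logits.argmax(dim=-1).tolist()]
```

Pooling over the time axis before the linear layer lets the same head handle clips of different lengths, which is convenient when call segments vary in duration.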
## Training Details
- **Dataset**: Trained on emotion speech datasets (e.g., RAVDESS, IEMOCAP, or proprietary customer service audio)
- **Approach**: HuBERT pre-trained representations fed into a custom CNN classifier
- **Fine-tuning**: End-to-end fine-tuning for customer service emotion categories
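As a rough illustration of what end-to-end fine-tuning means here, the sketch below runs a single optimization step in which gradients reach both the encoder and the classifier head. `TinyEncoder` and `TinyHead` are small hypothetical stand-ins, not HuBERT or the Mantis head. The two learning rates, the AdamW optimizer, and the random batch are likewise assumptions, chosen only to show the mechanics.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class TinyEncoder(nn.Module):
    """Stand-in for HuBERT: raw audio (batch, samples) -> (batch, frames, 32)."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(1, 32, kernel_size=400, stride=320)

    def forward(self, wav):
        return self.conv(wav.unsqueeze(1)).transpose(1, 2)


class TinyHead(nn.Module):
    """Stand-in CNN classifier over frame-level features."""

    def __init__(self, num_classes=5):
        super().__init__()
        self.conv = nn.Conv1d(32, 16, kernel_size=3, padding=1)
        self.fc = nn.Linear(16, num_classes)

    def forward(self, feats):
        x = torch.relu(self.conv(feats.transpose(1, 2))).mean(dim=-1)
        return self.fc(x)


encoder, head = TinyEncoder(), TinyHead()

# Assumed setup: a small learning rate for the pre-trained encoder and a
# larger one for the freshly initialised head.
optimizer = torch.optim.AdamW([
    {"params": encoder.parameters(), "lr": 1e-5},
    {"params": head.parameters(), "lr": 1e-3},
])
loss_fn = nn.CrossEntropyLoss()

waveforms = torch.randn(4, 16000)    # fake batch: 4 one-second clips at 16 kHz
labels = torch.tensor([0, 2, 4, 1])  # fake emotion class indices

encoder_before = encoder.conv.weight.detach().clone()

encoder.train()
head.train()
optimizer.zero_grad()
loss = loss_fn(head(encoder(waveforms)), labels)
loss.backward()   # gradients flow through the head and the encoder
optimizer.step()

# True when the encoder weights moved, i.e. training was end-to-end
encoder_updated = not torch.equal(encoder_before, encoder.conv.weight)
```

Freezing the encoder instead (by leaving its parameters out of the optimizer) would turn this into feature extraction plus a trained head, which is the alternative that end-to-end fine-tuning is usually contrasted with.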
## Performance
Evaluated on held-out emotion speech samples. Accuracy is described as strong across the emotion classes most relevant to customer service; numeric metrics are not reported here.
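For readers who want to quantify accuracy across classes on their own held-out data, a minimal sketch of per-class and overall accuracy follows. The helper name `per_class_accuracy` is hypothetical, and the labels are made-up toy values used only to exercise the function. They are not Mantis evaluation results.

```python
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Fraction of samples of each true class that were predicted correctly."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in totals}


# Toy labels for illustration only (not model output)
y_true = ["angry", "angry", "neutral", "sad", "neutral", "frustrated"]
y_pred = ["angry", "frustrated", "neutral", "sad", "neutral", "angry"]

scores = per_class_accuracy(y_true, y_pred)
overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Reporting the per-class figures alongside the overall number matters for emotion data, where classes such as neutral often dominate and can mask poor performance on rarer ones.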
## Files
| File | Description |
|------|-------------|
| `emotion_model.pth` | Final trained HuBERT-CNN emotion recognition model |
## Usage
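```python
import torch
from huggingface_hub import hf_hub_download
# Download the checkpoint from the Hugging Face Hub
model_path = hf_hub_download(repo_id='devanshty/Mantis', filename='emotion_model.pth')
# Load the checkpoint (adjust to your model class).
# weights_only=False lets torch.load unpickle a full model object; use it only for
# files you trust, and make sure the model's class definition is importable.
checkpoint = torch.load(model_path, map_location='cpu', weights_only=False)
if isinstance(checkpoint, torch.nn.Module):
    model = checkpoint
else:
    # The file holds a state_dict rather than a pickled model: instantiate your
    # model class here and call model.load_state_dict(checkpoint) instead.
    raise TypeError('checkpoint is not an nn.Module; load it with load_state_dict')
model.eval()
# Run inference on audio features
# (preprocess audio to match training pipeline)
```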
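The preprocessing and decoding steps around inference might look like the sketch below. It assumes the audio is already sampled at 16 kHz (the rate HuBERT models are trained on), that clips are padded or truncated to three seconds, and that the output order matches the hypothetical `EMOTIONS` list. None of this is confirmed by the model card, and the helper names `prepare_waveform` and `decode` are illustrative, so match your own training pipeline before relying on it.

```python
import torch

# Hypothetical label order; the real order depends on how the model was trained.
EMOTIONS = ["neutral", "happy", "angry", "sad", "frustrated"]


def prepare_waveform(wav: torch.Tensor, num_samples: int = 16000 * 3) -> torch.Tensor:
    """Turn a (channels, samples) float tensor into a (1, num_samples) model input.

    Assumes the audio is already at the target sample rate.
    """
    mono = wav.mean(dim=0)                             # mix channels down to mono
    mono = (mono - mono.mean()) / (mono.std() + 1e-7)  # zero mean, unit variance
    if mono.numel() < num_samples:                     # right-pad short clips with zeros
        mono = torch.nn.functional.pad(mono, (0, num_samples - mono.numel()))
    return mono[:num_samples].unsqueeze(0)             # truncate long clips, add batch dim


def decode(logits: torch.Tensor):
    """Map (batch, num_classes) logits to (label, confidence) pairs."""
    probs = torch.softmax(logits, dim=-1)
    conf, idx = probs.max(dim=-1)
    return [(EMOTIONS[i], float(c)) for i, c in zip(idx.tolist(), conf.tolist())]


stereo = torch.randn(2, 16000)                            # fake one-second stereo clip
batch = prepare_waveform(stereo)                          # shape (1, 48000)
result = decode(torch.tensor([[0.1, 0.2, 3.0, 0.0, 0.5]]))  # made-up logits
```

With a loaded model, `decode(model(batch))` would replace the made-up logits, provided the model accepts raw waveforms of this shape.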