MrlolDev/voxtral-emotion-speech
Dataset: MrlolDev/voxtral-emotion-speech
Model: MrlolDev/voxtral-emotion-speech
| Model | UA% | WA% | F1% | WF1% | Data |
|---|---|---|---|---|---|
| Ours (Frozen + MLP) | 16.3 | 25.4 | 14.2 | 21.9 | 500 synthetic clips (ElevenLabs) |
| Ours (LoRA + MLP) | - | - | - | - | end-to-end |
| SenseVoice-S | 70.5 | 65.7 | 67.9 | 67.8 | zero-shot |
| emotion2vec+ large | ~80 | ~80 | - | - | IEMOCAP |
Tested on 477/1004 IEMOCAP test samples (4-class: neutral, happy, sad, angry). Our model was trained only on synthetic ElevenLabs clips, so its low scores are expected. Models marked † were fine-tuned on IEMOCAP.
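The column headers in the table are not defined in this README. The sketch below assumes the definitions commonly used in speech emotion recognition: UA is the unweighted mean of per-class recall, WA is overall accuracy, F1 is macro-averaged F1, and WF1 is F1 weighted by class support. The function name `emotion_metrics` and the toy labels are hypothetical and are not taken from benchmark.py.

```python
from collections import Counter

def emotion_metrics(y_true, y_pred):
    """Return UA, WA, macro F1 and weighted F1 as fractions in [0, 1]."""
    classes = sorted(set(y_true))
    support = Counter(y_true)  # number of true samples per class
    recalls, f1s = [], []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        recall = tp / support[c]
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        f1s.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)
    n = len(y_true)
    return {
        "UA": sum(recalls) / len(classes),                          # mean per-class recall
        "WA": sum(t == p for t, p in zip(y_true, y_pred)) / n,      # overall accuracy
        "F1": sum(f1s) / len(classes),                              # macro F1
        "WF1": sum(f * support[c] for f, c in zip(f1s, classes)) / n,  # support-weighted F1
    }

# Toy example with the four IEMOCAP classes used in the benchmark (assumed labels).
y_true = ["neutral", "neutral", "neutral", "happy", "sad", "angry"]
y_pred = ["neutral", "neutral", "happy", "happy", "sad", "neutral"]
scores = emotion_metrics(y_true, y_pred)
```

Under these definitions, UA and WA diverge when classes are imbalanced, which is one reason both are usually reported side by side.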
audio_tower() + mean pooling

End-to-end LoRA finetuning on Voxtral encoder + emotion head:
python finetune_lora.py
Output: emotion_head_lora_best.pt + LoRA adapter in lora_adapter/
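For readers unfamiliar with LoRA, the idea behind the adapter can be sketched as follows: the pretrained weight stays frozen and only a low-rank update is trained. This illustrates the general technique in plain PyTorch and is not the contents of finetune_lora.py; the class name `LoRALinear`, the rank `r=8`, and `alpha=16` are hypothetical values chosen for the example.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen nn.Linear and adds a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pretrained weights stay frozen
        # A is small random, B is zero, so the wrapped layer starts out identical to the base layer.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

base = nn.Linear(1280, 1280)  # stand-in for one encoder projection (assumed size)
layer = LoRALinear(base)
x = torch.randn(4, 1280)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
```

In this toy setup only the two low-rank matrices are trainable, a small fraction of the 1280 x 1280 base weight, which is why the adapter can be saved separately from the encoder.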
Compare frozen vs LoRA encoder on IEMOCAP:
python benchmark_lora.py
Installs dependencies using uv and logs in to Hugging Face.
bash setup.sh
python extract_features.py
Output: features.pkl - list of records with keys:
- features: numpy array (1280,)
- label: int (0-5)
- emotion: string
- split: "train"/"validation"/"test"
- sensevoice_score: float

python train.py
Outputs:
- emotion_head_best.pt - Best model weights
- confusion_matrix.png - Test confusion matrix
- training_curve.png - Loss curves

Benchmarks the trained model:
Bench 1: Emotion F1 vs SenseVoice
Bench 2: Transcription WER
python benchmark.py
Output: benchmark_results.json
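Bench 2 reports word error rate. As a rough sketch of how WER is conventionally computed (word-level edit distance divided by the reference length), assuming lowercasing and whitespace tokenization; benchmark.py may normalize text differently, and the function name `word_error_rate` is hypothetical:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1] / len(ref)

# One substitution ("sat" -> "sit") and one deletion ("the") over six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions are counted, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.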
# 1. Setup
bash setup.sh
# 2. Extract features (~20 min)
python extract_features.py
# 3. Train (~10 min)
python train.py
# 4. Benchmark (~20 min)
python benchmark.py
# 5. Download results
tar -czf results.tar.gz emotion_head_best.pt features.pkl \
confusion_matrix.png training_curve.png benchmark_results.json
Then download results.tar.gz from the RunPod Files tab.
Voxtral Encoder (frozen)
↓
Mean Pooling (1280 dims)
↓
EmotionHead MLP
- Linear(1280, 512) + BatchNorm + ReLU + Dropout(0.3)
- Linear(512, 256) + BatchNorm + ReLU + Dropout(0.3)
- Linear(256, 6)
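The diagram above can be written as a small PyTorch module. This is a sketch under assumptions, not the code from train.py: the class layout, the use of `BatchNorm1d`, and the `(batch, frames, 1280)` shape of the encoder output are inferred from the layer list, and the random tensor stands in for real Voxtral encoder states.

```python
import torch
import torch.nn as nn

class EmotionHead(nn.Module):
    """MLP over mean-pooled encoder features: 1280 -> 512 -> 256 -> 6."""

    def __init__(self, in_dim: int = 1280, num_classes: int = 6, dropout: float = 0.3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(512, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(dropout),
            nn.Linear(256, num_classes),
        )

    def forward(self, pooled: torch.Tensor) -> torch.Tensor:
        return self.net(pooled)  # unnormalized logits, shape (batch, num_classes)

def mean_pool(hidden: torch.Tensor) -> torch.Tensor:
    """Average encoder states over the time axis: (batch, frames, dim) -> (batch, dim)."""
    return hidden.mean(dim=1)

head = EmotionHead().eval()  # eval mode: BatchNorm uses running stats, Dropout is disabled
fake_encoder_out = torch.randn(2, 50, 1280)  # placeholder for frozen encoder output
with torch.no_grad():
    logits = head(mean_pool(fake_encoder_out))
```

The plain mean shown here averages over every frame; with padded batches a masked mean would be needed so that padding does not dilute the pooled vector.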
Base model
mistralai/Ministral-3-3B-Base-2512