## core components
- **base model:** Ethostral (fine-tuned Mistral).
- **tracking and evaluation:** weights and biases.
- **platform:** hugging face for model and adapter hosting.
## architecture
1. **process:** audio is streamed to the fine-tuned Mistral Voxtral endpoint, which performs automatic speech recognition and emotion classification in a single pass.
2. **output format:** transcription output uses interleaved text and emotional metadata tags.
3. **frontend:** Next.js application utilizing shadcn UI (Maia style) and Phosphor icons for the interactive dashboard.
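
The interleaved output format in step 2 can be illustrated with a small parser. This is only a sketch: the source does not specify the tag syntax, so the `[emotion:<label>]` form, the `Segment` type, and the `parse_tagged_transcript` helper below are all hypothetical and would need to be adapted to whatever the endpoint actually emits.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# ASSUMPTION: tags look like [emotion:happy] and apply to the text that
# follows them. The real tag syntax is not documented in this file.
TAG = re.compile(r"\[emotion:([a-z_]+)\]")


@dataclass
class Segment:
    emotion: Optional[str]  # None for text that precedes the first tag
    text: str


def parse_tagged_transcript(raw: str) -> List[Segment]:
    """Split an interleaved transcript into (emotion, text) segments."""
    segments: List[Segment] = []
    emotion: Optional[str] = None
    pos = 0
    for match in TAG.finditer(raw):
        # Text between the previous tag and this one belongs to the
        # previously active emotion.
        text = raw[pos:match.start()].strip()
        if text:
            segments.append(Segment(emotion, text))
        emotion = match.group(1)
        pos = match.end()
    # Remaining text after the last tag.
    tail = raw[pos:].strip()
    if tail:
        segments.append(Segment(emotion, tail))
    return segments
```

A dashboard could then render each `Segment` with a colour or icon keyed on `emotion`, while still showing the plain transcription when the tags are stripped.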
## integration points
- **weights and biases weave:** used for tracing the recognition pipeline.
- **hugging face hub:** serves as the repository for fine-tuned weights and dataset storage.
- **shadcn ui:** component library with maia theme.
- **phosphor icons:** primary iconography set.
## performance metrics
- word error rate for transcription quality.
- f1 score for emotion classification quality.
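
Both metrics above have standard definitions, sketched here in plain Python for reference. Word error rate is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The source does not say which F1 averaging is used, so the macro average below is an assumption, and the function names are illustrative rather than part of the project.

```python
from collections import Counter
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 (ASSUMPTION: macro averaging)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("no labels to score")
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1
            fn[truth] += 1
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Note that word error rate can exceed 1.0 when the hypothesis contains many insertions, and that macro F1 weights rare emotion classes as heavily as common ones.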
## benchmarking and evals
- **IEMOCAP:** Evaluation of categorical and dimensional (Valence/Arousal/Dominance) accuracy.
- **RAVDESS:** Benchmarking of prosodic feature mapping and speech rate accuracy.
- **SUSAS:** Evaluation of stress detection reliability under varied acoustic conditions.
- **MDPE:** Assessment of deception-related emotional leakage detection.
- **Weights & Biases Weave:** Used for tracking eval traces and scoring pipeline performance.