Spaces:
Running
Running
| ## core components | |
| - **base model:** Ethostral (fine-tuned Mistral). | |
| - **tracking and evaluation:** weights and biases. | |
| - **platform:** hugging face for model and adapter hosting. | |
| ## architecture | |
| 1. **process:** audio is streamed to the fine-tuned mistral voxtral endpoint for simultaneous automatic speech recognition and emotion classification. | |
| 2. **output format:** transcription output uses interleaved text and emotional metadata tags. | |
| 3. **frontend:** Next.js application utilizing shadcn UI (Maia style) and Phosphor icons for the interactive dashboard. | |
| ## integration points | |
| - **weights and biases weave:** used for tracing the recognition pipeline. | |
| - **hugging face hub:** serves as the repository for fine-tuned weights and dataset storage. | |
| - **shadcn ui:** component library with maia theme. | |
| - **phosphor icons:** primary iconography set. | |
| ## performance metrics | |
| - word error rate for transcription quality. | |
| - f1 score for emotion detection accuracy. | |
| ## benchmarking and evals | |
| - **IEMOCAP:** Evaluation of categorical and dimensional (Valence/Arousal/Dominance) accuracy. | |
| - **RAVDESS:** Benchmarking of prosodic feature mapping and speech rate accuracy. | |
| - **SUSAS:** Evaluation of stress detection reliability under varied acoustic conditions. | |
| - **MDPE:** Assessment of deception-related emotional leakage detection. | |
| - **Weights & Biases Weave:** Used for tracking eval traces and scoring pipeline performance. |