core components
- base model: Ethostral (fine-tuned Mistral).
- tracking and evaluation: Weights & Biases.
- platform: Hugging Face for model and adapter hosting.
architecture
- process: audio is streamed to the fine-tuned Mistral Voxtral endpoint for simultaneous automatic speech recognition (ASR) and emotion classification.
- output format: the transcription output interleaves text with emotional metadata tags.
- frontend: Next.js application utilizing shadcn UI (Maia style) and Phosphor icons for the interactive dashboard.
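The interleaved text-plus-emotion output described above can be split back into labeled segments on the frontend. A minimal sketch, assuming a hypothetical `<emotion:label>` tag syntax (the actual format used by the endpoint may differ):

```python
import re

# Hypothetical tag syntax; the real interleaved format is an assumption here.
# Example output: "<emotion:neutral> good morning <emotion:excited> we won"
TAG_RE = re.compile(r"<emotion:(\w+)>")

def parse_interleaved(output: str) -> list[dict]:
    """Split an interleaved transcript into segments tagged with emotion labels."""
    parts = TAG_RE.split(output)
    # parts alternates: [leading_text, label1, text1, label2, text2, ...]
    segments = []
    for label, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:
            segments.append({"emotion": label, "text": text})
    return segments
```

For example, `parse_interleaved("<emotion:neutral> good morning <emotion:excited> we won")` yields two segments, one labeled `neutral` and one labeled `excited`, which the dashboard can render independently.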
integration points
- Weights & Biases Weave: used for tracing the recognition pipeline.
- Hugging Face Hub: serves as the repository for fine-tuned weights and dataset storage.
- shadcn/ui: component library with the Maia theme.
- Phosphor Icons: primary iconography set.
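Conceptually, Weave traces the pipeline by wrapping each stage so its inputs, outputs, and latency are recorded per call (in the real integration this is `weave.init(...)` plus the `@weave.op` decorator). A dependency-free sketch of that pattern, with a placeholder stage function:

```python
import functools
import time

def traced(fn):
    """Record inputs, output, and latency for each call to a pipeline stage —
    conceptually what Weights & Biases Weave's @weave.op decorator provides."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        wrapper.trace.append({
            "op": fn.__name__,
            "inputs": args,
            "output": result,
            "latency_s": time.perf_counter() - start,
        })
        return result
    wrapper.trace = []
    return wrapper

@traced
def classify_emotion(text: str) -> str:
    # Placeholder stage: the real pipeline calls the Voxtral endpoint.
    return "neutral"
```

Each traced call appends one record to `classify_emotion.trace`, giving a per-stage audit trail of the recognition pipeline.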
performance metrics
- word error rate (WER) for transcription quality.
- F1 score for emotion detection accuracy.
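Both metrics are straightforward to compute; a dependency-free sketch (libraries such as `jiwer` or `scikit-learn` would normally be used instead):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words via dynamic programming.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution
        prev = curr
    return prev[-1] / len(ref)

def f1(tp: int, fp: int, fn: int) -> float:
    """Per-class F1 from true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0
```

For multi-class emotion detection, per-class F1 scores are typically averaged (macro-F1) so rare emotions count equally.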
benchmarking and evals
- IEMOCAP: Evaluation of categorical and dimensional (Valence/Arousal/Dominance) accuracy.
- RAVDESS: Benchmarking of prosodic feature mapping and speech rate accuracy.
- SUSAS: Evaluation of stress detection reliability under varied acoustic conditions.
- MDPE: Assessment of deception-related emotional leakage detection.
- Weights & Biases Weave: Used for tracking eval traces and scoring pipeline performance.
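Dimensional (valence/arousal/dominance) predictions on benchmarks like IEMOCAP are commonly scored with the concordance correlation coefficient (CCC); whether this project uses CCC specifically is an assumption, but a minimal sketch of the metric looks like:

```python
def ccc(x: list[float], y: list[float]) -> float:
    """Concordance correlation coefficient between predicted and reference
    dimensional scores (e.g. valence). Ranges from -1 to 1; 1 is perfect."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    vx = sum((a - mx) ** 2 for a in x) / n
    vy = sum((b - my) ** 2 for b in y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    # Penalizes both poor correlation and mean/scale shift.
    return 2 * cov / (vx + vy + (mx - my) ** 2)
```

A scorer like this can be registered in a Weave evaluation so each eval trace carries per-dimension CCC alongside categorical accuracy.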