## core components
- **base model:** Ethostral (fine-tuned Mistral).
- **tracking and evaluation:** Weights & Biases.
- **platform:** Hugging Face for model and adapter hosting.

## architecture
1. **process:** audio is streamed to the fine-tuned Mistral Voxtral endpoint, which performs automatic speech recognition and emotion classification simultaneously.
2. **output format:** the transcription interleaves text with emotional metadata tags.
3. **frontend:** Next.js application using shadcn/ui (Maia style) and Phosphor icons for the interactive dashboard.
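
The interleaved output format described above could be consumed along these lines. This is a minimal sketch only: the tag syntax `[emotion=<label>]`, the `Segment` type, and the `parse_tagged_transcript` function are hypothetical, since the actual tag format is not specified here. Each tag is assumed to apply to the text that follows it until the next tag.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# ASSUMPTION: tags look like "[emotion=happy]". The real tag syntax
# emitted by the model is not documented here.
TAG_RE = re.compile(r"\[emotion=([a-z_]+)\]")


@dataclass
class Segment:
    text: str
    emotion: Optional[str]  # None for text that precedes the first tag


def parse_tagged_transcript(raw: str) -> List[Segment]:
    """Split an interleaved transcript into (text, emotion) segments."""
    segments: List[Segment] = []
    emotion: Optional[str] = None
    pos = 0
    for match in TAG_RE.finditer(raw):
        # Text before this tag belongs to the previously active emotion.
        text = raw[pos:match.start()].strip()
        if text:
            segments.append(Segment(text, emotion))
        emotion = match.group(1)
        pos = match.end()
    # Remaining text after the last tag.
    tail = raw[pos:].strip()
    if tail:
        segments.append(Segment(tail, emotion))
    return segments
```

A dashboard could then render each `Segment` with styling keyed on `emotion`, falling back to a neutral style when it is `None`.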

## integration points
- **Weights & Biases Weave:** traces the recognition pipeline.
- **Hugging Face Hub:** repository for fine-tuned weights and dataset storage.
- **shadcn/ui:** component library with the Maia theme.
- **Phosphor icons:** primary iconography set.

## performance metrics
- word error rate (WER) for transcription quality.
- F1 score for emotion detection accuracy.
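
Both metrics can be computed without external dependencies. The sketch below is illustrative rather than the project's actual scoring code: it assumes whitespace tokenization for word error rate and macro averaging for F1 (the averaging mode is not stated here), and the function names are hypothetical.

```python
from typing import List, Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Rolling-row Levenshtein distance over words.
    prev: List[int] = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 (ASSUMPTION: macro averaging)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        raise ValueError("inputs must not be empty")
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" gives one substitution and one deletion over six reference words, so a word error rate of 2/6.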

## benchmarking and evals
- **IEMOCAP:** Evaluation of categorical and dimensional (Valence/Arousal/Dominance) accuracy.
- **RAVDESS:** Benchmarking of prosodic feature mapping and speech rate accuracy.
- **SUSAS:** Evaluation of stress detection reliability under varied acoustic conditions.
- **MDPE:** Assessment of deception-related emotional leakage detection.
- **Weights & Biases Weave:** Used for tracking eval traces and scoring pipeline performance.