ethos/docs/context/technical_spec.md

core components

  • Base model: Ethostral (fine-tuned from Mistral's Voxtral).
  • Tracking and evaluation: Weights & Biases.
  • Platform: Hugging Face for model and adapter hosting.

architecture

  1. Process: audio is streamed to the fine-tuned Mistral Voxtral endpoint, which performs automatic speech recognition and emotion classification simultaneously.
  2. Output format: the transcription interleaves text with emotional metadata tags.
  3. Frontend: a Next.js application using shadcn/ui (Maia style) and Phosphor icons for the interactive dashboard.
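The interleaved output in step 2 can be sketched as follows. The bracketed `[emotion:…]` tag syntax, the sample transcript, and the `parse_interleaved` helper are all illustrative assumptions, not the confirmed wire format:

```python
import re

# Hypothetical interleaved format (an assumption, not the spec's actual
# tag syntax): emotion tags are inlined ahead of each utterance span.
SAMPLE = (
    "[emotion:neutral] Hello, thanks for calling. "
    "[emotion:frustrated] I've been on hold for an hour."
)

# Each match captures one emotion label plus the text up to the next tag.
TAG_RE = re.compile(r"\[emotion:(?P<label>\w+)\]\s*(?P<text>[^\[]*)")

def parse_interleaved(transcript: str) -> list[tuple[str, str]]:
    """Split an interleaved transcript into (emotion, utterance) pairs."""
    return [(m["label"], m["text"].strip()) for m in TAG_RE.finditer(transcript)]

segments = parse_interleaved(SAMPLE)
```

A dashboard frontend would consume such pairs to render each utterance with its emotion badge.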

integration points

  • Weights & Biases Weave: traces the recognition pipeline.
  • Hugging Face Hub: repository for fine-tuned weights and dataset storage.
  • shadcn/ui: component library with the Maia theme.
  • Phosphor Icons: primary iconography set.

performance metrics

  • Word error rate (WER) for transcription quality.
  • F1 score for emotion detection accuracy.
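A minimal sketch of both metrics, hand-rolled without external libraries (the function names are illustrative; production evals would typically use a library such as jiwer or scikit-learn):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edits needed to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,      # deletion
                           dp[i][j - 1] + 1,      # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[len(ref)][len(hyp)] / len(ref)

def f1(y_true: list[str], y_pred: list[str], positive: str) -> float:
    """Binary F1 with one emotion label treated as the positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)
```

For multi-class emotion detection, per-label F1 scores would typically be averaged (macro F1).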

benchmarking and evals

  • IEMOCAP: Evaluation of categorical and dimensional (Valence/Arousal/Dominance) accuracy.
  • RAVDESS: Benchmarking of prosodic feature mapping and speech rate accuracy.
  • SUSAS: Evaluation of stress detection reliability under varied acoustic conditions.
  • MDPE: Assessment of deception-related emotional leakage detection.
  • Weights & Biases Weave: Used for tracking eval traces and scoring pipeline performance.
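A minimal scoring sketch for the categorical and dimensional (Valence/Arousal/Dominance) evals above, assuming labels and VAD triples have already been extracted from model output (all names here are illustrative, not part of the eval harness):

```python
def categorical_accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of utterances whose predicted emotion label matches gold."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def vad_mae(true_vad: list[tuple], pred_vad: list[tuple]) -> tuple:
    """Per-dimension mean absolute error over (valence, arousal, dominance)."""
    n = len(true_vad)
    return tuple(
        sum(abs(t[d] - p[d]) for t, p in zip(true_vad, pred_vad)) / n
        for d in range(3)
    )
```

In practice each score would be logged per-dataset (IEMOCAP, RAVDESS, SUSAS, MDPE) so Weave traces can compare runs on the same axes.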