	## core components
	- base model: Ethostral (fine-tuned Mistral).
	- tracking and evaluation: Weights & Biases.
	- platform: Hugging Face for model and adapter hosting.

	## architecture
	1. process: audio is streamed to the fine-tuned Mistral Voxtral endpoint for simultaneous automatic speech recognition (ASR) and emotion classification.
	2. output format: the transcription interleaves text with emotional metadata tags.
	3. frontend: Next.js application using shadcn/ui (Maia style) and Phosphor icons for the interactive dashboard.
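
The interleaved output format in step 2 could be consumed along the following lines. This is a minimal sketch only: the spec does not define the tag syntax, so the `[emotion:<label>]` form, the `Segment` type, and the function names below are all hypothetical and chosen purely for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# ASSUMPTION: tags look like "[emotion:joy]" and apply to the text that
# follows them. The real Ethostral tag syntax is not given in this spec.
TAG_RE = re.compile(r"\[emotion:([a-z_]+)\]")


@dataclass
class Segment:
    emotion: Optional[str]  # None for text that precedes the first tag
    text: str


def parse_tagged_transcript(raw: str) -> List[Segment]:
    """Split an interleaved transcript into (emotion, text) segments."""
    segments: List[Segment] = []
    emotion: Optional[str] = None
    pos = 0
    for match in TAG_RE.finditer(raw):
        text = raw[pos:match.start()].strip()
        if text:
            segments.append(Segment(emotion, text))
        emotion = match.group(1)  # label applies to the text after the tag
        pos = match.end()
    tail = raw[pos:].strip()
    if tail:
        segments.append(Segment(emotion, tail))
    return segments


def plain_text(segments: List[Segment]) -> str:
    """Recover the transcript without tags, e.g. for scoring transcription."""
    return " ".join(segment.text for segment in segments)
```

Keeping the tag-free text recoverable means transcription quality and emotion labels can be scored separately from the same model output.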

	## integration points
	- Weights & Biases Weave: traces the recognition pipeline.
	- Hugging Face Hub: repository for fine-tuned weights and dataset storage.
	- shadcn/ui: component library with the Maia theme.
	- Phosphor icons: primary iconography set.

	## performance metrics
	- word error rate (WER) for transcription quality.
	- F1 score for emotion detection accuracy.
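
Both metrics can be computed with a few lines of plain Python. The sketch below is illustrative, not the project's evaluation code: it assumes whitespace tokenization with no text normalization for WER, and macro averaging for F1, since the spec does not say which averaging scheme is used. The function names are hypothetical.

```python
from typing import Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words seen so
    # far and the first j hypothesis words (word-level Levenshtein distance).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 (ASSUMPTION: macro averaging)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        # F1 = 2TP / (2TP + FP + FN); the denominator is never zero because
        # every label appears at least once in y_true or y_pred.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)
```

Macro averaging weights every emotion class equally, which matters when emotion corpora are class-imbalanced; a micro or weighted average would be an equally valid choice if the project prefers it.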

	## benchmarking and evals
	- IEMOCAP: Evaluation of categorical and dimensional (Valence/Arousal/Dominance) accuracy.
	- RAVDESS: Benchmarking of prosodic feature mapping and speech rate accuracy.
	- SUSAS: Evaluation of stress detection reliability under varied acoustic conditions.
	- MDPE: Assessment of deception-related emotional leakage detection.
	- Weights & Biases Weave: Used for tracking eval traces and scoring pipeline performance.
