Spaces:

mistral-hackaton-2026
/

ethos

Running

App Files Files Community

ethos / web /README.md

Lior-0618

feat: replace frontend with redesigned Ethos Studio UI + update READMEs

c4c4f17 7 days ago

preview code

raw

history blame contribute delete

4.21 kB

	# Frontend (Next.js — port 3030)

	Ethos Studio UI. Upload audio or video files, view transcription results with per-segment emotion badges and facial emotion (FER) badges, and explore the waveform timeline in the Studio editor.

	---

	## Architecture

	```
	Browser (port 3030)
	→ Proxy layer (Node, port 3000) POST /api/speech-to-text, POST /api/transcribe-diarize, GET /health
	→ API layer (Python, port 8000) POST /transcribe, POST /transcribe-diarize, GET /health
	```

	- Frontend (`web/`): Upload page + Studio editor. Calls the Proxy layer.
	- Proxy layer (`proxy/`): Forwards browser requests to the API layer. See [proxy/README.md](../proxy/README.md) for API details.
	- API layer (`api/`): Local Voxtral inference + VAD segmentation + emotion + FER. See [api/README.md](../api/README.md) for API details.

	---

	## Startup

	### 1. API layer (Python, port 8000)

	Requires Python 3.11+ and ffmpeg.

	```bash
	cd api
	python -m venv .venv
	source .venv/bin/activate # Windows: .venv\Scripts\activate
	pip install -r requirements.txt
	uvicorn main:app --host 0.0.0.0 --port 8000 --reload
	```

	On first run the Voxtral model (~8 GB) is downloaded from HuggingFace.

	### 2. Proxy layer (Node, port 3000)

	```bash
	cd proxy
	npm install
	npm run dev
	```

	### 3. Frontend (Next.js, port 3030)

	```bash
	cd web
	npm install
	npm run dev
	```

	Open [http://localhost:3030](http://localhost:3030).

	- Home page: Click Transcribe files, drag-drop an audio or video file, then Upload. The file is sent to `/api/transcribe-diarize` and results open in the Studio.
	- Studio page (`/studio`): Three-column layout — transcript segments (speaker + emotion badges + FER badges for video) on the left, waveform in the center, audio player on the right.

	### 4. Quick check (API only)

	```bash
	curl -s http://localhost:3000/health
	curl -X POST http://localhost:3000/api/speech-to-text -F "audio=@audio.m4a"
	curl -X POST http://localhost:3000/api/transcribe-diarize -F "audio=@audio.m4a"
	curl -X POST http://localhost:3000/api/transcribe-diarize -F "audio=@video.mov"
	```

	---

	## API (Proxy layer)

	Clients should call the Proxy layer only. The API layer is internal.

	### POST /api/speech-to-text

	Simple transcription without diarization.

	\| \| \|
	\|--\|--\|
	\| Content-Type \| `multipart/form-data` \|
	\| Body \| `audio` — audio file (wav, mp3, flac, ogg, m4a, webm) \|
	\| Limits \| ≤ 100 MB; timeout 30 min \|

	Response (200)

	```json
	{
	"text": "transcribed text",
	"words": [],
	"languageCode": "en"
	}
	```

	---

	### POST /api/transcribe-diarize

	Full pipeline: transcription + VAD sentence segmentation + per-segment emotion analysis. For video inputs, also returns `face_emotion` per segment.

	\| \| \|
	\|--\|--\|
	\| Content-Type \| `multipart/form-data` \|
	\| Body \| `audio` — audio or video file (wav, mp3, flac, ogg, m4a, webm, mp4, mov, mkv) \|
	\| Limits \| ≤ 100 MB; timeout 60 min \|

	Response (200)

	```json
	{
	"segments": [
	{
	"id": 1,
	"speaker": "SPEAKER_00",
	"start": 0.0,
	"end": 4.2,
	"text": "Hello, how are you?",
	"emotion": "Happy",
	"valence": 0.7,
	"arousal": 0.6,
	"face_emotion": "Happy"
	}
	],
	"duration": 42.3,
	"text": "full transcript text",
	"filename": "recording.mov",
	"diarization_method": "vad",
	"has_video": true
	}
	```

	`face_emotion` appears only on video uploads when FER is enabled. `diarization_method` is always `"vad"`.

	---

	### GET /health

	```json
	{
	"ok": true,
	"server": "ser-server",
	"model": {
	"status": "ok",
	"model": "mistralai/Voxtral-Mini-3B-2507 + YongkangZOU/evoxtral-lora (local)",
	"model_loaded": true,
	"ffmpeg": true,
	"fer_enabled": true,
	"device": "cpu",
	"max_upload_mb": 100
	}
	}
	```

	---

	## Environment variables

	Create `web/.env.local`:

	\| Variable \| Default \| Description \|
	\|----------\|---------\|-------------\|
	\| `NEXT_PUBLIC_API_URL` \| `http://localhost:3000` \| Proxy layer URL used by the browser \|

	Create `proxy/.env` or export:

	\| Variable \| Default \| Description \|
	\|----------\|---------\|-------------\|
	\| `PORT` \| `3000` \| Proxy layer port \|
	\| `MODEL_URL` \| `http://127.0.0.1:8000` \| API layer URL \|