# Proxy Layer (Node/Express, port 3000)
API gateway. Accepts multipart file uploads from the browser, forwards them to the **API layer** (Python FastAPI on port 8000), and returns JSON responses.
- **Port**: `3000` (override with `PORT`)
- **API layer URL**: `http://127.0.0.1:8000` (override with `MODEL_URL`)
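
A minimal sketch of how these two settings might be resolved, assuming the proxy reads them from environment variables at startup. The function name `resolveConfig` is hypothetical and is not taken from the proxy source; only the variable names and defaults come from the list above.

```javascript
// Hypothetical helper: resolve proxy settings from an environment-like object.
// The defaults mirror the values documented above.
function resolveConfig(env) {
  const port = Number(env.PORT ?? 3000);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  // Strip trailing slashes so paths can be appended as `${modelUrl}/health`.
  const modelUrl = (env.MODEL_URL ?? "http://127.0.0.1:8000").replace(/\/+$/, "");
  return { port, modelUrl };
}

// Example: override only the API layer URL and keep the default port.
const exampleConfig = resolveConfig({ MODEL_URL: "http://10.0.0.5:8000/" });
```

In a real process the argument would be `process.env`; passing a plain object keeps the helper easy to test.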
---
## Startup
```bash
cd proxy
npm install
npm run dev # dev with --watch
# or
npm start
```
Requires **Node.js 22+**.
---
## API
### POST /api/speech-to-text
Simple transcription. Forwarded to API layer `POST /transcribe`. Timeout: **30 min** (CPU inference is slow).
| | |
|--|--|
| **Content-Type** | `multipart/form-data` |
| **Body** | `audio`: audio file (wav, mp3, flac, ogg, m4a, webm) |
| **Limits** | ≤ 100 MB |
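
A client can check a file against these constraints before uploading. The sketch below is illustrative only: `checkUpload` is a hypothetical helper, it assumes "100 MB" means 100 × 1024 × 1024 bytes, and it assumes the format is judged by file extension. The proxy itself may validate differently.

```javascript
// Extensions listed for /api/speech-to-text in the table above.
const AUDIO_EXTENSIONS = ["wav", "mp3", "flac", "ogg", "m4a", "webm"];
// Assumption: "100 MB" is 100 * 1024 * 1024 bytes; the proxy may count differently.
const MAX_UPLOAD_BYTES = 100 * 1024 * 1024;

// Hypothetical pre-flight check: returns null when the file looks acceptable,
// otherwise a short message describing the problem.
function checkUpload(filename, sizeBytes, allowed = AUDIO_EXTENSIONS) {
  const dot = filename.lastIndexOf(".");
  const ext = dot === -1 ? "" : filename.slice(dot + 1).toLowerCase();
  if (!allowed.includes(ext)) return `Unsupported file type: .${ext}`;
  if (sizeBytes > MAX_UPLOAD_BYTES) return "File exceeds 100 MB";
  return null;
}
```

For `/api/transcribe-diarize`, the same helper could be called with a third argument that also includes the video extensions that endpoint accepts.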
**Response (200)**
```json
{
  "text": "transcribed text",
  "words": [],
  "languageCode": "en"
}
```
**Errors**
| Status | Body |
|--------|------|
| 400 | `{"error": "Upload an audio file (form field: audio)"}` |
| 502 | API layer error or unreachable |
| 504 | `{"error": "Request timeout (>30 min); try shorter audio"}` |
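
The 502 and 504 rows could be produced by a mapping along these lines. This is a sketch, not the proxy's actual code: it assumes the upstream call uses `fetch` with `AbortSignal.timeout`, which rejects with an error named `TimeoutError` when the deadline passes. The name `toProxyError` and the shape of the 502 body are hypothetical.

```javascript
// Hypothetical mapping from an upstream failure to the proxy's error response.
function toProxyError(err, timeoutMinutes) {
  if (err && err.name === "TimeoutError") {
    return {
      status: 504,
      body: { error: `Request timeout (>${timeoutMinutes} min); try shorter audio` },
    };
  }
  // Anything else (connection refused, DNS failure, ...) is treated as
  // "API layer error or unreachable". The body shape here is an assumption.
  return { status: 502, body: { error: String(err?.message ?? err) } };
}
```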
---
### POST /api/transcribe-diarize
Full pipeline: transcription + VAD sentence segmentation + emotion analysis. For video inputs, also returns `face_emotion` per segment. Forwarded to API layer `POST /transcribe-diarize`. Timeout: **60 min**.
| | |
|--|--|
| **Content-Type** | `multipart/form-data` |
| **Body** | `audio`: audio or video file (wav, mp3, flac, ogg, m4a, webm, mp4, mov, mkv) |
| **Limits** | ≤ 100 MB |
**Response (200)**
```json
{
  "segments": [
    {
      "id": 1,
      "speaker": "SPEAKER_00",
      "start": 0.0,
      "end": 4.2,
      "text": "Hello, how are you?",
      "emotion": "Happy",
      "valence": 0.7,
      "arousal": 0.6,
      "face_emotion": "Happy"
    }
  ],
  "duration": 42.3,
  "text": "full transcript",
  "filename": "recording.mov",
  "diarization_method": "vad",
  "has_video": true
}
```
`face_emotion` is present only when a video file is uploaded and FER is enabled. `has_video` indicates whether facial emotion recognition ran.
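
Because `face_emotion` is optional, client code should not assume it exists on every segment. The sketch below shows one way a consumer might summarize a response. `summarizeSegments` is a hypothetical helper, and the field names are taken from the example response above.

```javascript
// Count segments per speech emotion and list segment ids where the facial
// emotion disagrees with the speech emotion.
function summarizeSegments(result) {
  const counts = new Map();
  const mismatches = [];
  for (const seg of result.segments) {
    counts.set(seg.emotion, (counts.get(seg.emotion) ?? 0) + 1);
    // face_emotion may be absent (audio-only input or FER disabled).
    if (seg.face_emotion !== undefined && seg.face_emotion !== seg.emotion) {
      mismatches.push(seg.id);
    }
  }
  return { counts, mismatches, hasVideo: result.has_video === true };
}
```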
**Errors**
| Status | Body |
|--------|------|
| 400 | `{"error": "Upload an audio file (form field: audio)"}` |
| 502 | API layer error or unreachable |
| 504 | `{"error": "Request timeout (>60 min); try shorter audio"}` |
---
### GET /health
Proxies `GET {MODEL_URL}/health` and wraps the API layer's response in the `model` field.
**Response (200)**
```json
{
  "ok": true,
  "server": "ser-server",
  "model": {
    "status": "ok",
    "model": "mistralai/Voxtral-Mini-3B-2507 + YongkangZOU/evoxtral-lora (local)",
    "model_loaded": true,
    "ffmpeg": true,
    "fer_enabled": true,
    "device": "cpu",
    "max_upload_mb": 100
  }
}
```
**Response (502)** when the API layer is unreachable:
```json
{"ok": false, "error": "Cannot reach Model layer; start model/voxtral-server first", "url": "http://127.0.0.1:8000"}
```
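
A caller might turn the health payload into a readiness check before submitting long jobs. The sketch below is an assumption-laden illustration: `healthProblems` is a hypothetical helper, and the meaning of `model_loaded`, `ffmpeg`, and `fer_enabled` is inferred from the field names in the example responses, not from the API layer's documentation.

```javascript
// Hypothetical readiness check over the /health response body.
// Returns an array of problems; an empty array means everything looks ready.
function healthProblems(health) {
  if (!health.ok) return [health.error ?? "proxy reported ok: false"];
  const problems = [];
  const model = health.model ?? {};
  if (model.model_loaded !== true) problems.push("model not loaded");
  if (model.ffmpeg !== true) problems.push("ffmpeg unavailable");
  if (model.fer_enabled !== true) problems.push("face emotion recognition disabled");
  return problems;
}
```

A disabled FER flag only matters for video uploads, so a caller handling audio alone could ignore that entry.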
---
### GET /api/debug-inference
Proxies `GET {MODEL_URL}/debug-inference`, which smoke-tests the local Voxtral model with a short silence clip.
---
## Usage examples
```bash
# Health
curl -s http://localhost:3000/health
# Transcribe (audio)
curl -X POST http://localhost:3000/api/speech-to-text -F "audio=@./recording.m4a"
# Transcribe + segment + emotion (audio or video)
curl -X POST http://localhost:3000/api/transcribe-diarize -F "audio=@./recording.m4a"
curl -X POST http://localhost:3000/api/transcribe-diarize -F "audio=@./video.mov"
```
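
The same upload can be issued from Node.js 22 with the built-in `fetch`, `FormData`, and `Blob` globals. The sketch below is illustrative: `buildAudioForm` and `transcribeDiarize` are hypothetical names, and the base URL and client-side timeout are assumptions chosen to match the values documented above.

```javascript
// Build the multipart body with the single form field the proxy expects: "audio".
function buildAudioForm(bytes, filename) {
  const form = new FormData();
  form.append("audio", new Blob([bytes]), filename);
  return form;
}

// Hypothetical client call; baseUrl and the timeout value are assumptions.
async function transcribeDiarize(bytes, filename, baseUrl = "http://localhost:3000") {
  const res = await fetch(`${baseUrl}/api/transcribe-diarize`, {
    method: "POST",
    body: buildAudioForm(bytes, filename),
    signal: AbortSignal.timeout(60 * 60 * 1000), // mirror the 60 min proxy timeout
  });
  // Error bodies may not always be JSON, so parse defensively.
  const text = await res.text();
  let body;
  try {
    body = JSON.parse(text);
  } catch {
    body = { error: text };
  }
  if (!res.ok) throw new Error(body.error ?? `HTTP ${res.status}`);
  return body;
}
```

Node's built-in `fetch` may apply its own default timeouts that are shorter than 60 minutes, so very long jobs might need a differently configured HTTP client.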