# Proxy Layer (Node/Express, port 3000)
API gateway. Accepts multipart file uploads from the browser, forwards them to the **API layer** (Python FastAPI on port 8000), and returns JSON responses.
- **Port**: `3000` (override with `PORT`)
- **API layer URL**: `http://127.0.0.1:8000` (override with `MODEL_URL`)
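
A minimal sketch of how these two settings might be resolved at startup, written in TypeScript to match the Node proxy. Only `PORT`, `MODEL_URL` and their defaults come from this README. The function name `resolveConfig`, the port validation and the trailing-slash trimming are illustrative assumptions, not the proxy's actual code.

```typescript
// Sketch (assumption): resolve the proxy's two settings from an env object.
// Only PORT, MODEL_URL and their defaults come from the README.
interface ProxyConfig {
  port: number;
  modelUrl: string;
}

function resolveConfig(env: Record<string, string | undefined>): ProxyConfig {
  const rawPort = env["PORT"] ?? "3000";
  const port = Number(rawPort);
  // Illustrative validation: reject non-integers and out-of-range ports.
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`Invalid PORT: ${rawPort}`);
  }
  // Drop trailing slashes so `${modelUrl}/health` never contains "//".
  const modelUrl = (env["MODEL_URL"] ?? "http://127.0.0.1:8000").replace(/\/+$/, "");
  return { port, modelUrl };
}

// Usage in Node: const { port, modelUrl } = resolveConfig(process.env);
```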
---
## Startup
```bash
cd proxy
npm install
npm run dev # dev with --watch
# or
npm start
```
Requires **Node.js 22+**.

---
## API
### POST /api/speech-to-text
Plain transcription, without segmentation or emotion analysis. Forwarded to the API layer's `POST /transcribe`. Timeout: **30 min**, since CPU inference is slow.
| Field | Value |
|-------|-------|
| **Content-Type** | `multipart/form-data` |
| **Body** | `audio`: audio file (wav, mp3, flac, ogg, m4a, webm) |
| **Limits** | ≤ 100 MB |

**Response (200)**
```json
{
"text": "transcribed text",
"words": [],
"languageCode": "en"
}
```
**Errors**
| Status | Body |
|--------|------|
| 400 | `{"error": "Upload an audio file (form field: audio)"}` |
| 502 | API layer error or unreachable |
| 504 | `{"error": "Request timeout (>30 min); try shorter audio"}` |
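
The 502/504 split above can be pictured as a small mapping from a failed upstream call to a proxy response. This sketch assumes the proxy calls the API layer with `fetch()` and an `AbortSignal.timeout()` deadline. `toProxyFailure` is a hypothetical name, and the 502 body shape is an assumption because the table does not specify it.

```typescript
// Sketch (assumption): map a failed upstream call to the status codes in the
// error tables. toProxyFailure is a hypothetical name; the 502 body is assumed.
type ProxyFailure = { status: 502 | 504; body: { error: string } };

function isTimeout(err: unknown): boolean {
  // fetch() with AbortSignal.timeout() rejects with a DOMException named "TimeoutError".
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { name?: unknown }).name === "TimeoutError"
  );
}

function toProxyFailure(err: unknown, timeoutMin: number): ProxyFailure {
  if (isTimeout(err)) {
    return {
      status: 504,
      body: { error: `Request timeout (>${timeoutMin} min); try shorter audio` },
    };
  }
  // Anything else: connection refused, DNS failure, or a non-2xx upstream reply.
  const message = err instanceof Error ? err.message : String(err);
  return { status: 502, body: { error: message } };
}

// Possible use inside an Express handler (not executed here):
//   try {
//     const upstream = await fetch(`${modelUrl}/transcribe`, {
//       method: "POST",
//       body: form,
//       signal: AbortSignal.timeout(30 * 60 * 1000), // 30 min
//     });
//     if (!upstream.ok) throw new Error(`API layer returned ${upstream.status}`);
//     res.json(await upstream.json());
//   } catch (err) {
//     const { status, body } = toProxyFailure(err, 30);
//     res.status(status).json(body);
//   }
// Caveat: Node's built-in fetch (undici) also applies its own default headers and
// body timeouts (300 s each), so calls this long may need a custom dispatcher.
```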
---
### POST /api/transcribe-diarize
Full pipeline: transcription, voice activity detection (VAD) based sentence segmentation, and emotion analysis. For video inputs, each segment also carries a `face_emotion` field. Forwarded to the API layer's `POST /transcribe-diarize`. Timeout: **60 min**.
| Field | Value |
|-------|-------|
| **Content-Type** | `multipart/form-data` |
| **Body** | `audio`: audio or video file (wav, mp3, flac, ogg, m4a, webm, mp4, mov, mkv) |
| **Limits** | ≤ 100 MB |

**Response (200)**
```json
{
"segments": [
{
"id": 1,
"speaker": "SPEAKER_00",
"start": 0.0,
"end": 4.2,
"text": "Hello, how are you?",
"emotion": "Happy",
"valence": 0.7,
"arousal": 0.6,
"face_emotion": "Happy"
}
],
"duration": 42.3,
"text": "full transcript",
"filename": "recording.mov",
"diarization_method": "vad",
"has_video": true
}
```
`face_emotion` is present only when a video file is uploaded and facial emotion recognition (FER) is enabled. `has_video` indicates whether FER ran on the upload.

**Errors**
| Status | Body |
|--------|------|
| 400 | `{"error": "Upload an audio file (form field: audio)"}` |
| 502 | API layer error or unreachable |
| 504 | `{"error": "Request timeout (>60 min); try shorter audio"}` |
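
For TypeScript clients, the response above can be typed roughly as follows. The interfaces are transcribed from the example JSON, with optionality inferred from the prose rather than from a published schema. `speakingTime` is a hypothetical helper that assumes `start` and `end` are in seconds.

```typescript
// Types transcribed from the example response (assumption: no formal schema exists).
interface Segment {
  id: number;
  speaker: string; // e.g. "SPEAKER_00"
  start: number; // assumed seconds
  end: number; // assumed seconds
  text: string;
  emotion: string;
  valence: number;
  arousal: number;
  face_emotion?: string; // only for video uploads with FER enabled
}

interface TranscribeDiarizeResponse {
  segments: Segment[];
  duration: number;
  text: string;
  filename: string;
  diarization_method: string;
  has_video: boolean;
}

// Hypothetical helper: total speaking time per speaker label.
function speakingTime(resp: TranscribeDiarizeResponse): Map<string, number> {
  const totals = new Map<string, number>();
  for (const seg of resp.segments) {
    const length = Math.max(0, seg.end - seg.start); // ignore malformed negative spans
    totals.set(seg.speaker, (totals.get(seg.speaker) ?? 0) + length);
  }
  return totals;
}
```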
---
### GET /health
Proxies `GET {MODEL_URL}/health` and wraps the API layer's reply in the `model` field.

**Response (200)**
```json
{
"ok": true,
"server": "ser-server",
"model": {
"status": "ok",
"model": "mistralai/Voxtral-Mini-3B-2507 + YongkangZOU/evoxtral-lora (local)",
"model_loaded": true,
"ffmpeg": true,
"fer_enabled": true,
"device": "cpu",
"max_upload_mb": 100
}
}
```
**Response (502)** when the API layer is unreachable:
```json
{"ok": false, "error": "Cannot reach Model layer; start model/voxtral-server first", "url": "http://127.0.0.1:8000"}
```
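
A client might check `/health` before uploading. This hypothetical `isReady` check reads only fields that appear in the two example responses. Treating `status === "ok"` plus `model_loaded` as "ready" is an assumption.

```typescript
// Sketch (assumption): decide whether the backend is ready before uploading.
// Field names are copied from the example /health responses.
interface HealthResponse {
  ok: boolean;
  error?: string;
  url?: string;
  model?: {
    status?: string;
    model_loaded?: boolean;
    fer_enabled?: boolean;
    max_upload_mb?: number;
  };
}

// "Ready" = the proxy reached the API layer, which reports status "ok" and a loaded model.
// Pass requireFer = true when you intend to upload video for face_emotion.
function isReady(h: HealthResponse, requireFer = false): boolean {
  if (!h.ok || h.model === undefined) return false;
  const loaded = h.model.status === "ok" && h.model.model_loaded === true;
  return requireFer ? loaded && h.model.fer_enabled === true : loaded;
}

// Usage: the 502 reply is JSON too, so the same parse works for both cases.
//   const h = (await (await fetch("http://localhost:3000/health")).json()) as HealthResponse;
//   if (!isReady(h)) console.error(h.error ?? "model not loaded yet");
```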
---
### GET /api/debug-inference
Proxies `GET {MODEL_URL}/debug-inference`, which smoke-tests the local Voxtral model on a short clip of silence.
---
## Usage examples
```bash
# Health
curl -s http://localhost:3000/health
# Transcribe (audio)
curl -X POST http://localhost:3000/api/speech-to-text -F "audio=@./recording.m4a"
# Transcribe + segment + emotion (audio or video)
curl -X POST http://localhost:3000/api/transcribe-diarize -F "audio=@./recording.m4a"
curl -X POST http://localhost:3000/api/transcribe-diarize -F "audio=@./video.mov"
```
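
The same uploads can be sent from TypeScript (Node 22+ or a browser). The sketch below builds the multipart request that the curl commands send. `buildUploadRequest` is a hypothetical helper. Reading "100 MB" as 100 MiB is an assumption, since the proxy's exact limit check is not documented here.

```typescript
// Sketch (assumption): build the multipart request that the curl examples send.
// "100 MB" is read as 100 MiB here; the proxy's exact check is not documented.
const MAX_UPLOAD_BYTES = 100 * 1024 * 1024;

type UploadEndpoint = "/api/speech-to-text" | "/api/transcribe-diarize";

function buildUploadRequest(
  baseUrl: string,
  endpoint: UploadEndpoint,
  file: Blob,
  filename: string,
  maxBytes: number = MAX_UPLOAD_BYTES,
): Request {
  // Fail fast on the client instead of uploading a file over the documented limit.
  if (file.size > maxBytes) {
    throw new Error(`File is ${file.size} bytes; limit is ${maxBytes} bytes`);
  }
  const form = new FormData();
  form.append("audio", file, filename); // the proxy expects the form field "audio"
  // No manual Content-Type: the multipart boundary is added from the FormData body.
  return new Request(new URL(endpoint, baseUrl).toString(), { method: "POST", body: form });
}

// Usage:
//   const req = buildUploadRequest("http://localhost:3000", "/api/speech-to-text", blob, "recording.m4a");
//   const res = await fetch(req);
//   const data = await res.json();
//   if (!res.ok) throw new Error(data.error ?? `HTTP ${res.status}`);
```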