---
tags:
- audio
- speech-to-text
- voxtral
- meetingmind
library_name: custom
pipeline_tag: automatic-speech-recognition
---
# MeetingMind Voxtral Transcription Endpoint
GPU-accelerated speech-to-text for the MeetingMind pipeline using Voxtral Realtime 4B. Runs as an HF Inference Endpoint on a T4 GPU with scale-to-zero.
**Model weights**: [`mistral-hackaton-2026/voxtral_model`](https://huggingface.co/mistral-hackaton-2026/voxtral_model) — Voxtral Realtime 4B (BF16 safetensors, loaded from `/repository/voxtral-model/`)
## API
### `GET /health`
Returns service status and GPU availability.
```bash
curl -H "Authorization: Bearer $HF_TOKEN" "$ENDPOINT_URL/health"
```
```json
{"status": "ok", "gpu_available": true}
```
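The same check can be scripted. The following is a minimal Python sketch using only the standard library; the helper names `parse_health` and `check_health` are illustrative and not part of the service, and the response shape is assumed to match the example above.

```python
import json
import urllib.request


def parse_health(body: str) -> bool:
    """Return True when the body reports status "ok" and an available GPU."""
    payload = json.loads(body)
    return payload.get("status") == "ok" and payload.get("gpu_available") is True


def check_health(endpoint_url: str, token: str, timeout: float = 30.0) -> bool:
    """GET /health with a bearer token and interpret the JSON response."""
    request = urllib.request.Request(
        f"{endpoint_url.rstrip('/')}/health",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_health(response.read().decode("utf-8"))


# Usage (assumes ENDPOINT_URL and HF_TOKEN are set in the environment):
#   import os
#   check_health(os.environ["ENDPOINT_URL"], os.environ["HF_TOKEN"])
```

Because the endpoint scales to zero, the first call after an idle period may be slow or fail while the container starts, so a caller may want to retry with a delay.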
### `POST /transcribe`
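Transcribes an uploaded audio file with Voxtral Realtime 4B and returns the full transcription in a single JSON response. The file is sent as the multipart form field `audio`.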
```bash
curl -X POST \
  -H "Authorization: Bearer $HF_TOKEN" \
  -F audio=@speech.wav \
  "$ENDPOINT_URL/transcribe"
```
```json
{"text": "Hello, this is a test of the voxtral speech to text system."}
```
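For callers that cannot shell out to curl, the upload can be reproduced in Python. This is a sketch using only the standard library, assuming the endpoint accepts a standard `multipart/form-data` body with the file in the `audio` field, as the curl example implies. The helpers `build_multipart` and `transcribe` are hypothetical names, and the `audio/wav` part type is an assumption.

```python
import json
import urllib.request
import uuid
from typing import Tuple


def build_multipart(field: str, filename: str, data: bytes, boundary: str) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, Content-Type header)."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"  # assumed part type for a WAV upload
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def transcribe(endpoint_url: str, token: str, wav_path: str, timeout: float = 300.0) -> str:
    """POST a WAV file to /transcribe and return the "text" field of the response."""
    with open(wav_path, "rb") as handle:
        audio = handle.read()
    body, content_type = build_multipart("audio", "speech.wav", audio, uuid.uuid4().hex)
    request = urllib.request.Request(
        f"{endpoint_url.rstrip('/')}/transcribe",
        data=body,
        headers={"Authorization": f"Bearer {token}", "Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["text"]
```

The generous default timeout is a guess that allows for a cold start plus inference time; tune it to the audio lengths you expect.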
### `POST /transcribe/stream`
Streaming speech-to-text via SSE. Tokens are emitted as they are generated.
```bash
curl -N -X POST \
  -H "Authorization: Bearer $HF_TOKEN" \
  -F audio=@speech.wav \
  "$ENDPOINT_URL/transcribe/stream"
```
The stream emits three event types: `token` (a partial transcription fragment), `done` (the final full text), and `error`.
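A client has to split the SSE stream into events before it can act on them. The sketch below follows the standard Server-Sent Events framing (`event:` and `data:` fields, blank line as the event terminator). The function name `parse_sse` is illustrative, and the format of each `data` payload (plain text or JSON) is not specified here, so it is returned as a raw string.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from decoded SSE lines."""
    event: str = "message"  # SSE default when no "event:" field is sent
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # strip the single optional leading space
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


# Usage with a streaming HTTP response object that yields bytes lines:
#   for event, data in parse_sse(raw.decode("utf-8") for raw in response):
#       if event == "token": ...   # append the partial text
#       elif event == "done": ...  # data holds the final transcription
#       elif event == "error": ... # surface the failure
```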
## Environment Variables
| Variable | Default | Description |
|---|---|---|
| `VOXTRAL_MODEL_DIR` | `/repository/voxtral-model` | Path to Voxtral model weights |
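As an illustration of how this variable could be resolved at startup, here is a short sketch. The function `resolve_model_dir` is hypothetical, and treating an empty value as unset is an assumption rather than documented behavior of the service.

```python
import os
from pathlib import Path
from typing import Mapping

DEFAULT_MODEL_DIR = "/repository/voxtral-model"


def resolve_model_dir(env: Mapping[str, str] = os.environ) -> Path:
    """Return the model directory, falling back to the documented default."""
    # Assumption: an empty string is treated the same as an unset variable.
    value = env.get("VOXTRAL_MODEL_DIR") or DEFAULT_MODEL_DIR
    return Path(value)
```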
## Architecture
- **Base image**: `pytorch/pytorch:2.4.0-cuda12.4-cudnn9-runtime`
- **Transcription**: Voxtral Realtime 4B via direct safetensors loading (~8GB VRAM)
- **Scale-to-zero**: 15 min idle timeout (~$0.60/hr when active)
- **Diarization & embeddings**: Served separately by the GPU service on machine "tanti"
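
The ~8GB VRAM figure is consistent with a back-of-the-envelope estimate for the weights alone. The sketch below assumes a nominal 4 billion parameters at 2 bytes each (BF16) and the T4's 16 GB of memory; activations, caches, and the CUDA context come on top of the weights, so the headroom shown is an upper bound.

```python
def weights_gb(num_params: float, bytes_per_param: int) -> float:
    """Approximate weight memory in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


NOMINAL_PARAMS = 4e9   # "4B" taken at face value; the exact count may differ
BF16_BYTES = 2         # bfloat16 stores each parameter in 2 bytes
T4_MEMORY_GB = 16.0    # NVIDIA T4 memory size

weights = weights_gb(NOMINAL_PARAMS, BF16_BYTES)
headroom = T4_MEMORY_GB - weights
print(f"weights: {weights:.1f} GB, headroom on a T4: at most {headroom:.1f} GB")
```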