---
tags:
- audio
- speaker-diarization
- speaker-embedding
- pyannote
- funasr
- meetingmind
library_name: custom
pipeline_tag: audio-classification
---

# MeetingMind GPU Service

GPU-accelerated speaker diarization and embedding extraction for the MeetingMind pipeline. Runs as an HF Inference Endpoint on a T4 GPU with scale-to-zero.

## API

### `GET /health`

Returns service status and GPU availability.

```bash
curl -H "Authorization: Bearer $HF_TOKEN" $ENDPOINT_URL/health
```

```json
{"status": "ok", "gpu_available": true}
```

### `POST /diarize`

Speaker diarization using pyannote v4. Accepts any audio format (FLAC, WAV, MP3, etc.).

```bash
curl -X POST \
  -H "Authorization: Bearer $HF_TOKEN" \
  -F audio=@meeting.flac \
  -F min_speakers=2 \
  -F max_speakers=6 \
  $ENDPOINT_URL/diarize
```

```json
{
  "segments": [
    {"speaker": "SPEAKER_00", "start": 0.5, "end": 3.2, "duration": 2.7},
    {"speaker": "SPEAKER_01", "start": 3.4, "end": 7.1, "duration": 3.7}
  ]
}
```

### `POST /embed`

Speaker embedding extraction using FunASR CAM++. Returns L2-normalized 192-dim vectors for voiceprint matching.

```bash
curl -X POST \
  -H "Authorization: Bearer $HF_TOKEN" \
  -F audio=@meeting.flac \
  -F start_time=1.0 \
  -F end_time=5.0 \
  $ENDPOINT_URL/embed
```

```json
{"embedding": [0.012, -0.034, ...], "dim": 192}
```

## Environment Variables

| Variable | Default | Description |
|---|---|---|
| `HF_TOKEN` | (required) | Hugging Face token for pyannote model access |
| `PYANNOTE_MIN_SPEAKERS` | `1` | Minimum speakers for diarization |
| `PYANNOTE_MAX_SPEAKERS` | `10` | Maximum speakers for diarization |

## Architecture

- **Base image**: `pytorch/pytorch:2.4.0-cuda12.4-cudnn9-runtime`
- **Diarization**: pyannote/speaker-diarization-community-1 (~2GB VRAM)
- **Embeddings**: FunASR CAM++ sv_zh-cn_16k-common (~200MB)
- **Total VRAM**: ~3GB (fits T4 16GB with headroom)
- **Scale-to-zero**: 15 min idle timeout (~$0.60/hr when active)
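
## Client-Side Example

The JSON returned by `/diarize` and `/embed` can be post-processed on the client. The sketch below is illustrative only and is not part of the service: the helper names `speaking_time`, `cosine`, and `best_match`, the `enrolled` dictionary of known voiceprints, and the `0.5` threshold are all assumptions chosen for the example. It totals speaking time per speaker from a `/diarize` response and matches an `/embed` vector against stored voiceprints by cosine similarity. HTTP calls are omitted; the functions take already-parsed JSON.

```python
import math
from collections import defaultdict


def speaking_time(diarize_response):
    """Sum seconds spoken per speaker label in a parsed /diarize response."""
    totals = defaultdict(float)
    for seg in diarize_response["segments"]:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


def cosine(a, b):
    """Cosine similarity of two equal-length vectors.

    /embed returns L2-normalized vectors, so a plain dot product would
    be enough; the norms are recomputed here only as a safeguard.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(embedding, enrolled, threshold=0.5):
    """Return (name, score) for the closest enrolled voiceprint.

    `enrolled` maps a hypothetical speaker name to a stored embedding.
    `threshold` is an arbitrary placeholder, not a tuned value; the
    name is None when the best score falls below it.
    """
    if not enrolled:
        return None, 0.0
    name, score = max(
        ((n, cosine(embedding, v)) for n, v in enrolled.items()),
        key=lambda pair: pair[1],
    )
    return (name if score >= threshold else None), score
```

A typical flow under these assumptions would call `/diarize` once, call `/embed` with the `start_time` and `end_time` of a representative segment for each speaker label, and pass each returned `embedding` to `best_match`. The threshold should be calibrated on real recordings before it is relied on.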