| --- |
| tags: |
| - endpoints-compatible |
| --- |
| |
| # praise-ml-handler |
|
|
| Unified ASR + Diarization + Speaker Embedding + Speaker Matching handler for praise.global. |
|
|
| Forked from [sergeipetrov/asrdiarization-handler](https://huggingface.co/sergeipetrov/asrdiarization-handler). |
|
|
| ## Extensions over upstream |
|
|
| - **Speaker embedding extraction** — extracts per-speaker embeddings from pyannote's internal wespeaker model as a byproduct of diarization |
| - **Speaker matching** — matches diarized speakers against known voice profiles using cosine similarity |
| - **Confidence tiers** — HIGH (≥0.55), MEDIUM (≥0.35), LOW (<0.35) calibrated for pyannote embeddings |
|
|
| ## API |
|
|
| Standard Inference Endpoint `POST /` with `inputs` (base64 audio) and `parameters`: |
|
|
| ```json |
| { |
| "inputs": "<base64_audio>", |
| "parameters": { |
| "task": "transcribe", |
| "language": "en", |
| "batch_size": 24, |
| "chunk_length_s": 30, |
| "min_speakers": 2, |
| "max_speakers": 12, |
| "return_embeddings": true, |
| "known_speakers": [ |
| {"slug": "bob-ryan", "name": "Bob Ryan", "centroid_b64": "..."} |
| ] |
| } |
| } |
| ``` |
|
|
| ## Response |
|
|
| ```json |
| { |
| "text": "full transcript...", |
| "chunks": [...], |
| "speakers": [...], |
| "speaker_embeddings": { |
| "SPEAKER_00": {"embedding_b64": "...", "embedding_dim": 512, "total_seconds": 45.2, "num_segments": 12} |
| }, |
| "speaker_matches": { |
| "SPEAKER_00": {"matched_slug": "bob-ryan", "matched_name": "Bob Ryan", "confidence": "HIGH", "score": 0.72} |
| } |
| } |
| ``` |
|
|
| ## Deployment |
|
|
| Create via HF Inference Endpoints API with env vars: |
| - `ASR_MODEL=openai/whisper-large-v3` |
| - `DIARIZATION_MODEL=pyannote/speaker-diarization-3.1` |
| - `HF_TOKEN=<your_token>` |
| - `ASSISTANT_MODEL=distil-whisper/distil-large-v3` (optional, for speculative decoding) |
|
|