--- tags: - endpoints-compatible --- # praise-ml-handler Unified ASR + Diarization + Speaker Embedding + Speaker Matching handler for praise.global. Forked from [sergeipetrov/asrdiarization-handler](https://huggingface.co/sergeipetrov/asrdiarization-handler). ## Extensions over upstream - **Speaker embedding extraction** — extracts per-speaker embeddings from pyannote's internal wespeaker model as a byproduct of diarization - **Speaker matching** — matches diarized speakers against known voice profiles using cosine similarity - **Confidence tiers** — HIGH (≥0.55), MEDIUM (≥0.35), LOW (<0.35) calibrated for pyannote embeddings ## API Standard Inference Endpoint `POST /` with `inputs` (base64 audio) and `parameters`: ```json { "inputs": "", "parameters": { "task": "transcribe", "language": "en", "batch_size": 24, "chunk_length_s": 30, "min_speakers": 2, "max_speakers": 12, "return_embeddings": true, "known_speakers": [ {"slug": "bob-ryan", "name": "Bob Ryan", "centroid_b64": "..."} ] } } ``` ## Response ```json { "text": "full transcript...", "chunks": [...], "speakers": [...], "speaker_embeddings": { "SPEAKER_00": {"embedding_b64": "...", "embedding_dim": 512, "total_seconds": 45.2, "num_segments": 12} }, "speaker_matches": { "SPEAKER_00": {"matched_slug": "bob-ryan", "matched_name": "Bob Ryan", "confidence": "HIGH", "score": 0.72} } } ``` ## Deployment Create via HF Inference Endpoints API with env vars: - `ASR_MODEL=openai/whisper-large-v3` - `DIARIZATION_MODEL=pyannote/speaker-diarization-3.1` - `HF_TOKEN=` - `ASSISTANT_MODEL=distil-whisper/distil-large-v3` (optional, for speculative decoding)