---
tags:
- endpoints-compatible
---
# praise-ml-handler
Unified ASR + Diarization + Speaker Embedding + Speaker Matching handler for praise.global.
Forked from [sergeipetrov/asrdiarization-handler](https://huggingface.co/sergeipetrov/asrdiarization-handler).
## Extensions over upstream
- **Speaker embedding extraction**: extracts per-speaker embeddings from pyannote's internal wespeaker model as a byproduct of diarization
- **Speaker matching**: matches diarized speakers against known voice profiles using cosine similarity
- **Confidence tiers**: HIGH (score ≥ 0.55), MEDIUM (0.35 ≤ score < 0.55), LOW (score < 0.35), with thresholds calibrated for pyannote embeddings
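
The matching and tiering steps above can be illustrated with a minimal sketch. It is not the handler's actual code: the function names and the `known_speakers` mapping (slug to centroid vector) are hypothetical, and only the cosine-similarity metric and the two thresholds come from this README.

```python
import math

# Thresholds taken from the confidence tiers listed in this README.
HIGH_THRESHOLD = 0.55
MEDIUM_THRESHOLD = 0.35


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # Assumption: a zero vector matches nothing.
    return dot / (norm_a * norm_b)


def confidence_tier(score):
    """Map a similarity score to HIGH / MEDIUM / LOW."""
    if score >= HIGH_THRESHOLD:
        return "HIGH"
    if score >= MEDIUM_THRESHOLD:
        return "MEDIUM"
    return "LOW"


def best_match(embedding, known_speakers):
    """Return (slug, score, tier) for the closest profile, or None.

    known_speakers is a hypothetical dict of slug -> centroid vector.
    """
    best = None
    for slug, centroid in known_speakers.items():
        score = cosine_similarity(embedding, centroid)
        if best is None or score > best[1]:
            best = (slug, score)
    if best is None:
        return None
    return best[0], best[1], confidence_tier(best[1])
```

Whether the handler reports LOW-tier matches or drops them is not stated here; this sketch returns the best candidate regardless of tier and leaves that policy to the caller.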
## API
Standard Inference Endpoint `POST /` with `inputs` (base64 audio) and `parameters`:
```json
{
  "inputs": "<base64_audio>",
  "parameters": {
    "task": "transcribe",
    "language": "en",
    "batch_size": 24,
    "chunk_length_s": 30,
    "min_speakers": 2,
    "max_speakers": 12,
    "return_embeddings": true,
    "known_speakers": [
      {"slug": "bob-ryan", "name": "Bob Ryan", "centroid_b64": "..."}
    ]
  }
}
```
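
A client could assemble this request body roughly as follows. This is a sketch under assumptions: `build_payload` and `encode_centroid` are hypothetical helper names, and the `centroid_b64` encoding (little-endian float32 values, base64-encoded) is a guess at the format, which this README does not specify.

```python
import base64
import json
import struct


def encode_centroid(vector):
    """Pack floats as little-endian float32 and base64-encode them.

    The float32 layout is an assumption; check the handler source.
    """
    raw = struct.pack(f"<{len(vector)}f", *vector)
    return base64.b64encode(raw).decode("ascii")


def build_payload(audio_bytes, known_speakers=None, **parameters):
    """Build the JSON-serializable request body for `POST /`."""
    params = {"task": "transcribe", "return_embeddings": True}
    params.update(parameters)  # e.g. language="en", min_speakers=2
    if known_speakers:
        params["known_speakers"] = known_speakers
    return {
        "inputs": base64.b64encode(audio_bytes).decode("ascii"),
        "parameters": params,
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real audio file's contents.
    payload = build_payload(b"\x00\x01", language="en", max_speakers=12)
    print(json.dumps(payload, indent=2))
```

The serialized payload would then be sent to the endpoint URL with a `Content-Type: application/json` header, typically along with an `Authorization: Bearer <token>` header for protected endpoints.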
## Response
```json
{
  "text": "full transcript...",
  "chunks": [...],
  "speakers": [...],
  "speaker_embeddings": {
    "SPEAKER_00": {"embedding_b64": "...", "embedding_dim": 512, "total_seconds": 45.2, "num_segments": 12}
  },
  "speaker_matches": {
    "SPEAKER_00": {"matched_slug": "bob-ryan", "matched_name": "Bob Ryan", "confidence": "HIGH", "score": 0.72}
  }
}
```
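
On the client side, the response fields might be consumed along these lines. Both helpers are hypothetical, and the decoding assumes `embedding_b64` holds `embedding_dim` little-endian float32 values, which this README does not confirm.

```python
import base64
import struct


def decode_embedding(entry):
    """Decode one `speaker_embeddings` entry into a list of floats.

    Assumes little-endian float32 packing (not confirmed by the README).
    """
    raw = base64.b64decode(entry["embedding_b64"])
    dim = entry["embedding_dim"]
    if len(raw) != 4 * dim:
        raise ValueError(f"expected {4 * dim} bytes, got {len(raw)}")
    return list(struct.unpack(f"<{dim}f", raw))


def speaker_labels(response, accepted=("HIGH",)):
    """Map each diarized label to a display name.

    A matched name is used only when its confidence tier is in
    `accepted`; otherwise the raw label (e.g. SPEAKER_00) is kept.
    """
    labels = {}
    for label, match in (response.get("speaker_matches") or {}).items():
        if match.get("confidence") in accepted:
            labels[label] = match["matched_name"]
        else:
            labels[label] = label
    return labels
```

Accepting only HIGH matches by default is a conservative choice made for this example; a caller could pass `accepted=("HIGH", "MEDIUM")` to label more speakers at the cost of more misattributions.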
## Deployment
Create the endpoint via the Hugging Face Inference Endpoints API with the following environment variables:
- `ASR_MODEL=openai/whisper-large-v3`
- `DIARIZATION_MODEL=pyannote/speaker-diarization-3.1`
- `HF_TOKEN=<your_token>`
- `ASSISTANT_MODEL=distil-whisper/distil-large-v3` (optional, for speculative decoding)
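
The variables above could be collected in a small helper before being passed to whatever client creates the endpoint. This is only a sketch: `build_endpoint_env` is a hypothetical name, and the defaults simply mirror the values listed in this section.

```python
def build_endpoint_env(hf_token, assistant_model=None):
    """Return the environment variables for the endpoint as a dict.

    ASSISTANT_MODEL is optional and only set when speculative
    decoding is wanted.
    """
    if not hf_token:
        raise ValueError("HF_TOKEN is required")
    env = {
        "ASR_MODEL": "openai/whisper-large-v3",
        "DIARIZATION_MODEL": "pyannote/speaker-diarization-3.1",
        "HF_TOKEN": hf_token,
    }
    if assistant_model:
        env["ASSISTANT_MODEL"] = assistant_model
    return env
```

Because `HF_TOKEN` is a credential, it is usually better supplied as an endpoint secret than as a plain environment variable, if the deployment client supports that distinction.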