	---
	tags:
	- endpoints-compatible
	---

	# praise-ml-handler

	Unified ASR + Diarization + Speaker Embedding + Speaker Matching handler for praise.global.

	Forked from [sergeipetrov/asrdiarization-handler](https://huggingface.co/sergeipetrov/asrdiarization-handler).

	## Extensions over upstream

	- Speaker embedding extraction: extracts per-speaker embeddings from pyannote's internal wespeaker model as a byproduct of diarization
	- Speaker matching: matches diarized speakers against known voice profiles by cosine similarity between each speaker embedding and the profile centroid
	- Confidence tiers: the similarity score is bucketed as HIGH (≥0.55), MEDIUM (≥0.35 and <0.55), or LOW (<0.35), with thresholds calibrated for pyannote embeddings
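The matching and tiering described above can be sketched in a few lines. This is an illustration only, not the handler's actual code: the function names and the `centroid` key (a plain list of floats) are hypothetical, and it assumes the best-scoring profile is always reported, even when it lands in the LOW tier.

```python
import math

# Thresholds taken from the confidence tiers listed above.
HIGH_THRESHOLD = 0.55
MEDIUM_THRESHOLD = 0.35


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def confidence_tier(score):
    """Map a similarity score to HIGH, MEDIUM, or LOW."""
    if score >= HIGH_THRESHOLD:
        return "HIGH"
    if score >= MEDIUM_THRESHOLD:
        return "MEDIUM"
    return "LOW"


def match_speaker(embedding, known_speakers):
    """Return the best-scoring profile for one speaker, or None if there are no profiles.

    known_speakers is a list of dicts with hypothetical keys
    "slug", "name", and "centroid" (a list of floats).
    """
    best = None
    for profile in known_speakers:
        score = cosine_similarity(embedding, profile["centroid"])
        if best is None or score > best["score"]:
            best = {
                "matched_slug": profile["slug"],
                "matched_name": profile["name"],
                "confidence": confidence_tier(score),
                "score": score,
            }
    return best
```

Because cosine similarity ignores vector magnitude, profiles built from different amounts of audio remain comparable as long as the centroids come from the same embedding model.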

	## API

	Send a standard Inference Endpoints `POST /` request whose JSON body contains `inputs` (base64-encoded audio) and `parameters`:

	```json
	{
	  "inputs": "<base64_audio>",
	  "parameters": {
	    "task": "transcribe",
	    "language": "en",
	    "batch_size": 24,
	    "chunk_length_s": 30,
	    "min_speakers": 2,
	    "max_speakers": 12,
	    "return_embeddings": true,
	    "known_speakers": [
	      {"slug": "bob-ryan", "name": "Bob Ryan", "centroid_b64": "..."}
	    ]
	  }
	}
	```
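A request body like the one above could be assembled as follows. This is a sketch under assumptions: the helper names are hypothetical, and the `centroid_b64` encoding (little-endian float32 values, base64-encoded) is a guess at the wire format that this README does not confirm.

```python
import base64
import json
import struct


def encode_centroid(values):
    """Pack floats as little-endian float32 and base64-encode them (assumed format)."""
    raw = struct.pack(f"<{len(values)}f", *values)
    return base64.b64encode(raw).decode("ascii")


def build_payload(audio_bytes, known_speakers=None, **overrides):
    """Build the JSON-serializable request body for POST /."""
    # Defaults mirror the example request shown above.
    parameters = {
        "task": "transcribe",
        "language": "en",
        "batch_size": 24,
        "chunk_length_s": 30,
        "min_speakers": 2,
        "max_speakers": 12,
        "return_embeddings": True,
    }
    parameters.update(overrides)
    if known_speakers:
        parameters["known_speakers"] = [
            {
                "slug": speaker["slug"],
                "name": speaker["name"],
                "centroid_b64": encode_centroid(speaker["centroid"]),
            }
            for speaker in known_speakers
        ]
    return {
        "inputs": base64.b64encode(audio_bytes).decode("ascii"),
        "parameters": parameters,
    }


# Placeholder bytes stand in for a real audio file's contents.
payload = build_payload(
    b"placeholder-audio-bytes",
    known_speakers=[{"slug": "bob-ryan", "name": "Bob Ryan", "centroid": [0.1, -0.2, 0.3]}],
    max_speakers=4,
)
body = json.dumps(payload)
```

The resulting `body` can be sent with any HTTP client as the JSON payload of `POST /`, with the endpoint URL and an authorization header supplied by the deployment.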

	## Response

	```json
	{
	  "text": "full transcript...",
	  "chunks": [...],
	  "speakers": [...],
	  "speaker_embeddings": {
	    "SPEAKER_00": {"embedding_b64": "...", "embedding_dim": 512, "total_seconds": 45.2, "num_segments": 12}
	  },
	  "speaker_matches": {
	    "SPEAKER_00": {"matched_slug": "bob-ryan", "matched_name": "Bob Ryan", "confidence": "HIGH", "score": 0.72}
	  }
	}
	```
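On the client side, the `speaker_embeddings` and `speaker_matches` fields can be joined per speaker label. The sketch below is illustrative: the function names are hypothetical, and it assumes `embedding_b64` holds little-endian float32 values, which this README does not state. The `embedding_dim` field is used as a sanity check on the decoded length.

```python
import base64
import struct


def decode_embedding(entry):
    """Decode one speaker_embeddings entry into a list of floats (assumed float32)."""
    raw = base64.b64decode(entry["embedding_b64"])
    dim = entry["embedding_dim"]
    if len(raw) != 4 * dim:
        raise ValueError(f"expected {4 * dim} bytes for {dim} float32 values, got {len(raw)}")
    return list(struct.unpack(f"<{dim}f", raw))


def summarize_speakers(response):
    """Pair each diarized speaker label with its match (if any) and talk time."""
    matches = response.get("speaker_matches", {})
    summary = {}
    for label, entry in response.get("speaker_embeddings", {}).items():
        match = matches.get(label)
        summary[label] = {
            "name": match["matched_name"] if match else None,
            "confidence": match["confidence"] if match else None,
            "total_seconds": entry["total_seconds"],
            "embedding": decode_embedding(entry),
        }
    return summary
```

Speakers without an entry in `speaker_matches` come back with `name` and `confidence` set to `None`, so callers can fall back to the raw `SPEAKER_NN` label.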

	## Deployment

	Create the endpoint via the Hugging Face Inference Endpoints API with the following environment variables:
	- `ASR_MODEL=openai/whisper-large-v3`
	- `DIARIZATION_MODEL=pyannote/speaker-diarization-3.1`
	- `HF_TOKEN=<your_token>`
	- `ASSISTANT_MODEL=distil-whisper/distil-large-v3` (optional, for speculative decoding)
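For local testing, it can help to validate these variables before the models load. The sketch below is hypothetical and not taken from the handler: `load_settings` is an invented helper, and treating the first three variables as required is inferred only from `ASSISTANT_MODEL` being the one marked optional.

```python
import os

# Assumed split: only ASSISTANT_MODEL is marked optional in the list above.
REQUIRED_VARS = ("ASR_MODEL", "DIARIZATION_MODEL", "HF_TOKEN")
OPTIONAL_VARS = ("ASSISTANT_MODEL",)


def load_settings(env=None):
    """Read handler settings from a mapping (defaults to os.environ).

    Raises RuntimeError naming every required variable that is missing or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED_VARS}
    for name in OPTIONAL_VARS:
        # Unset or empty optional variables are normalized to None.
        settings[name] = env.get(name) or None
    return settings
```

Passing a plain dict as `env` keeps the check testable without touching the real process environment.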