praise-ml-handler

Unified ASR + Diarization + Speaker Embedding + Speaker Matching handler for praise.global.

Forked from sergeipetrov/asrdiarization-handler.

Extensions over upstream

  • Speaker embedding extraction — extracts per-speaker embeddings from pyannote's internal wespeaker model as a byproduct of diarization
  • Speaker matching — matches diarized speakers against known voice profiles using cosine similarity
  • Confidence tiers — HIGH (≥0.55), MEDIUM (≥0.35), LOW (<0.35) calibrated for pyannote embeddings

API

Standard Inference Endpoint POST / with inputs (base64 audio) and parameters:

{
  "inputs": "<base64_audio>",
  "parameters": {
    "task": "transcribe",
    "language": "en",
    "batch_size": 24,
    "chunk_length_s": 30,
    "min_speakers": 2,
    "max_speakers": 12,
    "return_embeddings": true,
    "known_speakers": [
      {"slug": "bob-ryan", "name": "Bob Ryan", "centroid_b64": "..."}
    ]
  }
}

Response

{
  "text": "full transcript...",
  "chunks": [...],
  "speakers": [...],
  "speaker_embeddings": {
    "SPEAKER_00": {"embedding_b64": "...", "embedding_dim": 512, "total_seconds": 45.2, "num_segments": 12}
  },
  "speaker_matches": {
    "SPEAKER_00": {"matched_slug": "bob-ryan", "matched_name": "Bob Ryan", "confidence": "HIGH", "score": 0.72}
  }
}

Deployment

Create via HF Inference Endpoints API with env vars:

  • ASR_MODEL=openai/whisper-large-v3
  • DIARIZATION_MODEL=pyannote/speaker-diarization-3.1
  • HF_TOKEN=<your_token>
  • ASSISTANT_MODEL=distil-whisper/distil-large-v3 (optional, for speculative decoding)
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support