license: mit
base_model: onnx-community/pyannote-segmentation-3.0
pipeline_tag: voice-activity-detection
library_name: openasr
tags:
  - speaker-diarization
  - openasr
  - oasr

pyannote Segmentation 3.0 · OpenASR

pyannote segmentation-3.0 — speaker-change and overlap aware speech segmentation for OpenASR diarization

Speaker-diarization support pack for the OpenASR runtime — pure-Rust inference, no Python at inference time.

✨ Highlights

✂️ Speaker-change aware segmentation — PyanNet (SincNet + BiLSTM) with a 7-class powerset head that tracks up to 3 speakers per window, including overlapped speech between any two of them
🤝 Quality upgrade for --diarize — installed alongside the CAM++ embedder pack, it replaces coarse VAD slices with fine speaker-turn boundaries
🔒 Diarization, not identification — anonymous session-relative labels; nothing leaves the machine
🎯 Bit-exact packaging — single raw-f32 build; tensors are stored bit-identically to the source weights, and the pure-Rust forward pass matches the upstream ONNX logits to within a max abs error of ~7e-5
🦀 Native in OpenASR — .oasr packs run with no Python at inference, engineered for peak performance on CPU & GPU
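
The parity figure quoted above (max abs error ~7e-5) is the largest element-wise difference between two sets of logits. As a rough illustration of that metric only, here is a minimal sketch in Python; the function name, the sample values, and the tolerance are all hypothetical, and this is not OpenASR's actual parity gate.

```python
def max_abs_error(reference, candidate):
    """Largest element-wise absolute difference between two equal-length sequences."""
    if len(reference) != len(candidate):
        raise ValueError("length mismatch between reference and candidate")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


# Made-up numbers for illustration; these are not real model outputs.
onnx_logits = [-1.25, 0.5, 3.0, -0.75]
rust_logits = [-1.25, 0.50005, 3.0, -0.75002]

err = max_abs_error(onnx_logits, rust_logits)
print(f"max abs error: {err:.1e}")

# Hypothetical tolerance, chosen only to show how a gate could be expressed.
assert err < 1e-4
```

A single worst-case number like this is strict: one badly wrong logit fails the check even if every other value matches.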

🚀 Quickstart

# 1. Install the OpenASR CLI  ·  https://openasr.org
# 2. Pull the pack
openasr pull pyannote-segmentation-3.0:f32

# 3. Diarize any transcription (works with every OpenASR ASR model)
openasr transcribe meeting.wav --model xasr-zh-en --diarize --format srt

📦 Pack

Quant	File (`.oasr`)	Size
f32	`pyannote-segmentation-3.0-f32.oasr`	6 MB

Single raw-f32 build: the pure-Rust forward pass consumes f32 directly and the
parity gates assert bit-exact outputs vs the upstream weights, so no integer
quantization is produced.

🧠 About pyannote Segmentation 3.0

pyannote segmentation-3.0 is the local speech-segmentation model from the pyannote speaker diarization toolkit: a PyanNet (SincNet front-end + bidirectional LSTM) classifier over a 7-class powerset that labels each frame of a 10 s window with which of up to three speakers are active, including overlapped speech between two of them. OpenASR uses it as the optional segmentation stage of its model-agnostic diarization pipeline: when this pack is installed, --diarize splits speech at speaker changes instead of relying on coarse VAD slices, and the CAM++ embedder pack then clusters the segments into anonymous speaker turns. Weights are extracted from the un-gated, MIT-licensed onnx-community ONNX mirror at a pinned revision and repackaged as a raw-f32 .oasr pack that runs in pure Rust, with no Python at inference time.
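
To make the 7-class powerset concrete, the sketch below enumerates the classes and turns per-frame logits into per-speaker activity. It is written in Python purely for illustration. The class ordering (silence, then single speakers, then pairs) is assumed to follow the upstream pyannote convention, the function names and logits are made up, and the way OpenASR's Rust runtime actually decodes frames is not described here.

```python
from itertools import combinations


def powerset_classes(num_speakers=3, max_overlap=2):
    """Enumerate speaker subsets of size 0..max_overlap, smallest subsets first."""
    classes = []
    for size in range(max_overlap + 1):
        classes.extend(combinations(range(num_speakers), size))
    return classes


def decode_frames(logits, classes, num_speakers=3):
    """Pick the best class per frame and expand it to a multi-hot speaker vector."""
    decoded = []
    for frame in logits:
        best = max(range(len(frame)), key=frame.__getitem__)  # argmax over classes
        active = classes[best]
        decoded.append([1 if s in active else 0 for s in range(num_speakers)])
    return decoded


CLASSES = powerset_classes()
# 1 silence class + 3 single-speaker classes + 3 two-speaker classes = 7
print(len(CLASSES), CLASSES)

# Made-up logits for three frames (one row per frame, one column per class).
frames = [
    [4.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0],  # silence wins
    [0.0, 3.0, 0.2, 0.0, 0.5, 0.0, 0.0],  # speaker 0 alone
    [0.0, 0.5, 0.2, 0.0, 2.5, 0.0, 0.0],  # speakers 0 and 1 overlapping
]
print(decode_frames(frames, CLASSES))  # [[0, 0, 0], [1, 0, 0], [1, 1, 0]]
```

Because each frame gets exactly one class, overlap is a class of its own rather than two independent thresholds, which is why seven classes cover three speakers with at most two active at once.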

⚙️ How this pack was made

Converted from onnx-community/pyannote-segmentation-3.0 with the OpenASR importer:

openasr model-pack import-pyannote-local <src>.safetensors <out>.oasr \
  --package-id pyannote-segmentation-3.0

The .oasr container is GGUF-backed; every tensor is stored as raw f32 so the pack round-trips bit-identically against the source weights.
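
As a sketch of what a bit-identical round trip means for raw f32 data, the Python snippet below writes values as little-endian 32-bit floats, reads them back, and compares the underlying bit patterns instead of the decoded numbers. It uses only the standard `struct` module; the helper names and sample values are hypothetical, and it does not read or write the actual .oasr or GGUF layout.

```python
import struct


def pack_f32(values):
    """Serialize floats as consecutive little-endian IEEE 754 binary32 values."""
    return struct.pack(f"<{len(values)}f", *values)


def unpack_f32(blob):
    """Inverse of pack_f32: 4 bytes per value."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def bits(values):
    """32-bit integer pattern of each value when stored as f32."""
    return [struct.unpack("<I", struct.pack("<f", v))[0] for v in values]


# Round the sample values to f32 first, as source weights would already be f32.
source = unpack_f32(pack_f32([0.1, -2.5, 3.14159, 1e-7]))

stored = pack_f32(source)      # stand-in for writing tensors into a pack
restored = unpack_f32(stored)  # stand-in for reading them back

# Comparing bit patterns is stricter than comparing floats with a tolerance.
assert bits(restored) == bits(source)
print("round trip is bit-identical:", bits(restored) == bits(source))
```

Storing f32 as f32 involves no rounding step, so the comparison can be exact; any quantized format would need a tolerance instead.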

⚖️ License

This pack inherits the upstream model's license: MIT. OpenASR packaging retains the upstream copyright; the only modification is format conversion.

🙏 Acknowledgements

This pack is a redistribution of pyannote segmentation-3.0, created by Hervé Bredin and the pyannote.audio project, via the un-gated ONNX mirror (onnx-community/pyannote-segmentation-3.0). All credit for the architecture, training, and weights belongs to the upstream authors; the license is inherited from and identical to the upstream model (MIT).

🔗 Links

🦀 OpenASR — https://github.com/QuintinShaw/openasr
🌐 Website — https://openasr.org
🤗 Upstream model — onnx-community/pyannote-segmentation-3.0