---
license: mit
base_model: onnx-community/pyannote-segmentation-3.0
pipeline_tag: voice-activity-detection
library_name: openasr
tags:
- speaker-diarization
- openasr
- oasr
---
<div align="center">

# pyannote Segmentation 3.0 · OpenASR

**pyannote segmentation-3.0: speaker-change and overlap aware speech segmentation for OpenASR diarization**

Speaker-diarization support pack for the **[OpenASR](https://github.com/QuintinShaw/openasr)** runtime:
pure-Rust inference, **no Python at inference time**.

</div>
---
## Highlights
- **Speaker-change aware segmentation**: PyanNet (SincNet + BiLSTM) with a 7-class powerset head that tracks up to 3 speakers per window, including overlapped speech between two of them
- **Quality upgrade for `--diarize`**: installed alongside the WeSpeaker embedder pack, it replaces coarse VAD slices with fine speaker-turn boundaries
- **Diarization, not identification**: anonymous session-relative labels; nothing leaves the machine
- **Bit-exact packaging**: single raw-f32 build whose weights are stored bit-exact; the pure-Rust forward pass matches the upstream ONNX logits to within a max abs error of ~7e-5
- **Native in OpenASR**: `.oasr` packs run with no Python at inference, engineered for peak performance on CPU & GPU
## Quickstart
```bash
# 1. Install the OpenASR CLI · https://openasr.org
# 2. Pull the pack
openasr pull pyannote-segmentation-3.0:f32
# 3. Diarize any transcription (works with every OpenASR ASR model)
openasr transcribe meeting.wav --model xasr-zh-en --diarize --format srt
```
## Pack
| Quant | File (`.oasr`) | Size |
|:------|:---------------|-----:|
| f32 | `pyannote-segmentation-3.0-f32.oasr` | 6 MB |

<sub>Single raw-**f32** build: the pure-Rust forward pass consumes f32 directly, and the
parity gates assert that the stored tensors are bit-exact against the upstream weights,
so no integer quantization is produced.</sub>
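
The Highlights quote a max abs error of ~7e-5 between the pure-Rust logits and the upstream ONNX logits. A minimal sketch of what such a tolerance check could look like is shown below. It is an illustration in Python only, not the OpenASR parity gate: the function name `max_abs_error`, the `TOLERANCE` value, and the sample logits are all assumptions made up for this example.

```python
def max_abs_error(reference, candidate):
    """Largest element-wise absolute difference between two logit lists."""
    if len(reference) != len(candidate):
        raise ValueError("logit lists must have the same length")
    return max((abs(r - c) for r, c in zip(reference, candidate)), default=0.0)


# Hypothetical gate threshold, chosen here to sit just above the ~7e-5
# figure quoted in this card; the real gate value is not documented here.
TOLERANCE = 1e-4

# Made-up logits standing in for "upstream ONNX" and "pure-Rust" outputs.
upstream_logits = [0.1250, -1.5000, 2.2500]
rust_logits = [0.12503, -1.50002, 2.24995]

err = max_abs_error(upstream_logits, rust_logits)
print(f"max abs error = {err:.1e}")
assert err <= TOLERANCE, "forward pass drifted past the tolerance"
```

A tolerance is the natural check for logits, because two correct f32 implementations can differ in the order of floating-point operations; bit-exactness is only expected of the stored weights.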
## About pyannote Segmentation 3.0
pyannote **segmentation-3.0** is the local speech-segmentation model from the pyannote speaker
diarization toolkit: a PyanNet (SincNet front-end + bidirectional LSTM) classifier over a 7-class
powerset. For each frame of a 10 s window it predicts which of up to three speakers are active,
including overlapped speech between two of them (1 non-speech class + 3 single-speaker classes +
3 speaker-pair classes). OpenASR uses it as the optional segmentation stage of its model-agnostic
diarization pipeline: when this pack is installed, `--diarize` splits speech at speaker changes
instead of relying on coarse VAD slices, and the WeSpeaker embedder pack then clusters the segments
into anonymous speaker turns. Weights are extracted from the un-gated, MIT-licensed
**onnx-community** ONNX mirror at a pinned revision and repackaged as a raw-f32 `.oasr` pack that
runs in pure Rust, with no Python at inference time.
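
To make the 7-class powerset concrete, here is a minimal sketch of decoding one frame into per-speaker activity flags. It is an illustration in Python, not the OpenASR (Rust) implementation. The class ordering (non-speech first, then single speakers, then speaker pairs) and the at-most-two-overlapping limit are assumptions consistent with a 7-class powerset over three speakers; the function names and the logit values are made up.

```python
from itertools import combinations

NUM_SPEAKERS = 3  # speakers tracked per window
MAX_OVERLAP = 2   # assumed: at most two speakers active in the same frame


def powerset_classes(num_speakers=NUM_SPEAKERS, max_overlap=MAX_OVERLAP):
    """Enumerate speaker subsets by size: (), (0,), (1,), (2,), (0, 1), ..."""
    classes = []
    for size in range(max_overlap + 1):
        classes.extend(combinations(range(num_speakers), size))
    return classes


def decode_frame(logits, classes):
    """Take the arg-max class and return one 0/1 activity flag per speaker."""
    best = max(range(len(logits)), key=logits.__getitem__)
    active = set(classes[best])
    return [1 if speaker in active else 0 for speaker in range(NUM_SPEAKERS)]


CLASSES = powerset_classes()  # 1 + 3 + 3 = 7 classes

# Made-up logits for a single frame; index 4 (speakers 0 and 1) is largest.
frame_logits = [-2.0, 0.1, -1.0, -3.0, 1.5, -0.5, -2.5]
activity = decode_frame(frame_logits, CLASSES)
print(len(CLASSES), activity)
```

Because each frame is assigned exactly one subset, overlapped speech is a class of its own rather than two independent detections that happen to fire together.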
## How this pack was made
Converted from [onnx-community/pyannote-segmentation-3.0](https://huggingface.co/onnx-community/pyannote-segmentation-3.0) with the OpenASR importer:
```bash
openasr model-pack import pyannote <src>.safetensors <out>.oasr \
--package-id pyannote-segmentation-3.0
```
The `.oasr` container is GGUF-backed; every tensor is stored as raw f32 so the
pack round-trips bit-identically against the source weights.
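
As a sketch of what "round-trips bit-identically" means for raw f32 tensors, the snippet below serializes a few values as little-endian f32, reads them back, and compares the bytes rather than the floats. It is a Python illustration only: it does not read or write the `.oasr` or GGUF formats, and the helper names and sample values are assumptions.

```python
import struct


def f32_to_bytes(values):
    """Serialize floats as little-endian IEEE-754 single precision."""
    return struct.pack(f"<{len(values)}f", *values)


def bytes_to_f32(blob):
    """Deserialize a little-endian f32 blob back into Python floats."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


# Stand-in for a source tensor; real weights would come from the upstream file.
source_blob = f32_to_bytes([0.1, -2.5, 3.0e-7, 1.0])

# Round trip: bytes -> floats -> bytes. No bit may change along the way.
stored = bytes_to_f32(source_blob)
roundtrip_blob = f32_to_bytes(stored)
bit_identical = roundtrip_blob == source_blob
print(bit_identical)

# Comparing bytes is stricter than comparing floats: 0.0 == -0.0 is True,
# but the two values have different bit patterns.
signed_zero_differs = f32_to_bytes([0.0]) != f32_to_bytes([-0.0])
print(signed_zero_differs)
```

Comparing raw bytes is the stricter check, since float equality would let sign-of-zero differences slip through.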
## License
This pack **inherits the upstream model's license: MIT**
([source](https://huggingface.co/onnx-community/pyannote-segmentation-3.0)). OpenASR packaging retains the upstream copyright;
the only modification is format conversion.
## Acknowledgements
This pack is a redistribution of **pyannote segmentation-3.0**, created by Hervé Bredin and the
**pyannote.audio** project, via the un-gated ONNX mirror
([onnx-community/pyannote-segmentation-3.0](https://huggingface.co/onnx-community/pyannote-segmentation-3.0)).
All credit for the architecture, training, and weights belongs to the upstream authors; the license
is inherited from and identical to the upstream model (MIT).
## Links
- **OpenASR**: <https://github.com/QuintinShaw/openasr>
- **Website**: <https://openasr.org>
- **Upstream model**: [onnx-community/pyannote-segmentation-3.0](https://huggingface.co/onnx-community/pyannote-segmentation-3.0)