---
license: mit
base_model: onnx-community/pyannote-segmentation-3.0
pipeline_tag: voice-activity-detection
library_name: openasr
tags:
  - speaker-diarization
  - openasr
  - oasr
---

<div align="center">

# pyannote Segmentation 3.0 · OpenASR

**pyannote segmentation-3.0: speaker-change and overlap-aware speech segmentation for OpenASR diarization**

[![License](https://img.shields.io/badge/license-MIT-2563eb.svg)](https://huggingface.co/onnx-community/pyannote-segmentation-3.0)
[![Format](https://img.shields.io/badge/format-.oasr-7c3aed.svg)](https://github.com/QuintinShaw/openasr)
[![Runtime](https://img.shields.io/badge/runtime-OpenASR-111827.svg)](https://openasr.org)
[![Base model](https://img.shields.io/badge/base-pyannote--segmentation--3.0-f59e0b.svg)](https://huggingface.co/onnx-community/pyannote-segmentation-3.0)

Speaker-diarization support pack for the **[OpenASR](https://github.com/QuintinShaw/openasr)** runtime:
pure-Rust inference, **no Python at inference time**.

</div>

---

## ✨ Highlights

- βœ‚οΈ **Speaker-change aware segmentation** β€” PyanNet (SincNet + BiLSTM) with a powerset head that detects up to 3 concurrent speakers, including overlapped speech
- 🀝 **Quality upgrade for `--diarize`** β€” installed alongside the WeSpeaker embedder pack, it replaces coarse VAD slices with fine speaker-turn boundaries
- πŸ”’ **Diarization, not identification** β€” anonymous session-relative labels; nothing leaves the machine
- 🎯 **Bit-exact packaging** β€” single raw-f32 build; the pure-Rust forward pass matches the upstream ONNX logits (max abs error ~7e-5)
- πŸ¦€ **Native in OpenASR** β€” `.oasr` packs run with no Python at inference, engineered for peak performance on CPU & GPU
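
To make the powerset head concrete, here is a minimal Rust sketch of how a 7-class frame output could be decoded into a set of active speakers. It is not OpenASR's implementation: the function names, the class ordering, and the limit of two overlapping speakers per frame are assumptions chosen so that three speakers give exactly seven classes.

```rust
/// Speakers tracked per window and the maximum that may overlap in one frame.
/// Both values are assumptions chosen to yield 7 classes (1 + 3 + 3).
const NUM_SPEAKERS: usize = 3;
const MAX_OVERLAP: usize = 2;

/// Enumerate powerset classes as speaker bitmasks, ordered by set size and
/// then by ascending mask value: {}, {0}, {1}, {2}, {0,1}, {0,2}, {1,2}.
/// This ordering is an assumption; check it against the model you load.
fn powerset_classes() -> Vec<u8> {
    let mut classes = Vec::new();
    for size in 0..=MAX_OVERLAP {
        for mask in 0u8..(1 << NUM_SPEAKERS) {
            if mask.count_ones() as usize == size {
                classes.push(mask);
            }
        }
    }
    classes
}

/// Pick the highest-scoring class for one frame and return its speaker
/// bitmask (bit i set means local speaker i is active in that frame).
fn decode_frame(logits: &[f32], classes: &[u8]) -> u8 {
    assert_eq!(logits.len(), classes.len(), "one logit per powerset class");
    let mut best = 0;
    for (i, &value) in logits.iter().enumerate() {
        if value > logits[best] {
            best = i;
        }
    }
    classes[best]
}
```

Under these assumptions, a frame whose largest logit sits at index 4 decodes to the mask `0b011`, meaning local speakers 0 and 1 are talking over each other. The speaker indices are local to one window; mapping them to session-wide labels is the job of the embedding and clustering stage.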

## 🚀 Quickstart

```bash
# 1. Install the OpenASR CLI  ·  https://openasr.org
# 2. Pull the pack
openasr pull pyannote-segmentation-3.0:f32

# 3. Diarize any transcription (works with every OpenASR ASR model)
openasr transcribe meeting.wav --model xasr-zh-en --diarize --format srt
```

## 📦 Pack

| Quant | File (`.oasr`) | Size |
|:------|:---------------|-----:|
| f32 | `pyannote-segmentation-3.0-f32.oasr` | 6 MB |

<sub>Single raw-**f32** build: the pure-Rust forward pass consumes f32 directly and the
parity gates assert bit-exact agreement with the upstream weights, so no integer
quantization is produced.</sub>

## 🧠 About pyannote Segmentation 3.0

pyannote **segmentation-3.0** is the local speech-segmentation model from the pyannote speaker
diarization toolkit: a PyanNet (SincNet front-end + bidirectional LSTM) classifier over a 7-class
powerset that labels each frame of a 10 s window with which of up to three speakers are active,
including overlapped speech (the 7 classes cover no speaker, each single speaker, and each pair
of speakers). OpenASR uses it as the optional segmentation stage of its model-agnostic
diarization pipeline: when this pack is installed, `--diarize` splits speech at speaker changes
instead of relying on coarse VAD slices, and the WeSpeaker embedder pack then clusters the
segments into anonymous speaker turns. Weights are extracted from the un-gated, MIT-licensed
**onnx-community** ONNX mirror at a pinned revision and repackaged as a raw-f32 `.oasr` pack that
runs in pure Rust, with no Python at inference time.
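
As an illustration of what "splitting speech at speaker changes" means, the sketch below collapses per-frame speaker bitmasks into segments, cutting wherever the set of active speakers changes. It is a simplified stand-in, not OpenASR's code: the `Segment` type, the function name, and the `frame_step_s` parameter are hypothetical, and a real pipeline would likely also smooth very short runs.

```rust
/// One contiguous run of frames with the same set of active speakers.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
struct Segment {
    start_s: f32,
    end_s: f32,
    /// Bitmask of active local speakers (bit i set = speaker i active).
    speakers: u8,
}

/// Collapse per-frame speaker bitmasks into segments, cutting wherever the
/// active-speaker set changes. Runs with no active speaker (mask 0) are dropped.
/// `frame_step_s` is the assumed time between consecutive frames, in seconds.
fn split_at_speaker_changes(frames: &[u8], frame_step_s: f32) -> Vec<Segment> {
    let mut segments = Vec::new();
    let mut start = 0;
    for i in 1..=frames.len() {
        // A run ends at the end of input or when the mask differs from the run's mask.
        let run_ended = i == frames.len() || frames[i] != frames[start];
        if run_ended {
            if frames[start] != 0 {
                segments.push(Segment {
                    start_s: start as f32 * frame_step_s,
                    end_s: i as f32 * frame_step_s,
                    speakers: frames[start],
                });
            }
            start = i;
        }
    }
    segments
}
```

Each resulting segment is a span that an embedder could then score, which is why boundaries placed at speaker changes matter: a segment that straddles two speakers would yield a mixed embedding and a less reliable cluster assignment.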

## ⚙️ How this pack was made

Converted from [onnx-community/pyannote-segmentation-3.0](https://huggingface.co/onnx-community/pyannote-segmentation-3.0) with the OpenASR importer:

```bash
openasr model-pack import pyannote <src>.safetensors <out>.oasr \
  --package-id pyannote-segmentation-3.0
```

The `.oasr` container is GGUF-backed; every tensor is stored as raw f32 so the
pack round-trips bit-identically against the source weights.
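
The two parity notions used on this card (bit-identical weights, and logits that agree within a small absolute error) can be sketched as follows. This is an illustrative check under assumed names, not the actual OpenASR parity gate; it only shows why the two comparisons differ.

```rust
/// True when both tensors have the same length and every element has the same
/// bit pattern. Comparing bits rather than values also distinguishes 0.0 from
/// -0.0 and treats identical NaN payloads as equal.
fn bit_identical(a: &[f32], b: &[f32]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(x, y)| x.to_bits() == y.to_bits())
}

/// Largest absolute element-wise difference between two equally sized outputs.
fn max_abs_error(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "outputs must have the same length");
    a.iter().zip(b).map(|(x, y)| (x - y).abs()).fold(0.0, f32::max)
}
```

Stored weights can be checked with `bit_identical`, because repackaging raw f32 data involves no arithmetic. Forward-pass outputs are better compared with `max_abs_error` against a tolerance, since two runtimes may order floating-point operations differently and still be numerically equivalent.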

## ⚖️ License

This pack **inherits the upstream model's license: MIT**
([source](https://huggingface.co/onnx-community/pyannote-segmentation-3.0)). OpenASR packaging retains the upstream copyright;
the only modification is format conversion.

## 🙏 Acknowledgements

This pack is a redistribution of **pyannote segmentation-3.0**, created by Hervé Bredin and the
**pyannote.audio** project, via the un-gated ONNX mirror
([onnx-community/pyannote-segmentation-3.0](https://huggingface.co/onnx-community/pyannote-segmentation-3.0)).
All credit for the architecture, training, and weights belongs to the upstream authors; the license
is inherited from and identical to the upstream model (MIT).

## 🔗 Links

- 🦀 **OpenASR**: <https://github.com/QuintinShaw/openasr>
- 🌐 **Website**: <https://openasr.org>
- 🤗 **Upstream model**: [onnx-community/pyannote-segmentation-3.0](https://huggingface.co/onnx-community/pyannote-segmentation-3.0)