al8n committed on
Commit
8ecadc1
·
verified ·
1 Parent(s): ae53fc3

initial bundle: segmentation-3.0 + wespeaker_resnet34_lm (3 forms) + PLDA weights, with attribution

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ wespeaker_resnet34_lm.onnx.data filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,22 @@
1
+ This bundle redistributes model artifacts under three different upstream
2
+ licenses. The full text of each license is included in the
3
+ `LICENSE.MIT`, `LICENSE.APACHE-2.0`, and `LICENSE.CC-BY-4.0` files
4
+ alongside this one. Each artifact retains its upstream license; using
5
+ the bundle obligates the user to comply with all three.
6
+
7
+ segmentation-3.0.onnx — MIT (Copyright © 2023 CNRS,
8
+ Hervé Bredin / pyannote.audio)
9
+ wespeaker_resnet34_lm.onnx (+ .data) — Apache-2.0 (WeSpeaker / wenet-e2e)
10
+ wespeaker_resnet34_lm_packed.onnx — Apache-2.0 (derivative of above:
11
+ same weights repacked into a
12
+ single file)
13
+ wespeaker_resnet34_lm.pt — Apache-2.0 (TorchScript export)
14
+ plda/* — CC-BY-4.0 (BUT Speech@FIT,
15
+ redistributed via pyannote/
16
+ speaker-diarization-community-1)
17
+
18
+ CC-BY-4.0 attribution for plda/* (required by upstream):
19
+ PLDA model trained by BUT Speech@FIT (https://speech.fit.vut.cz/).
20
+ Integration of VBx in pyannote.audio by Jiangyu Han and Petr Pálka.
21
+
22
+ See README.md for upstream sources and snapshot revisions.
LICENSE.APACHE-2.0 ADDED
@@ -0,0 +1,201 @@
1
+ Apache License
2
+ Version 2.0, January 2004
3
+ http://www.apache.org/licenses/
4
+
5
+ TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6
+
7
+ 1. Definitions.
8
+
9
+ "License" shall mean the terms and conditions for use, reproduction,
10
+ and distribution as defined by Sections 1 through 9 of this document.
11
+
12
+ "Licensor" shall mean the copyright owner or entity authorized by
13
+ the copyright owner that is granting the License.
14
+
15
+ "Legal Entity" shall mean the union of the acting entity and all
16
+ other entities that control, are controlled by, or are under common
17
+ control with that entity. For the purposes of this definition,
18
+ "control" means (i) the power, direct or indirect, to cause the
19
+ direction or management of such entity, whether by contract or
20
+ otherwise, or (ii) ownership of fifty percent (50%) or more of the
21
+ outstanding shares, or (iii) beneficial ownership of such entity.
22
+
23
+ "You" (or "Your") shall mean an individual or Legal Entity
24
+ exercising permissions granted by this License.
25
+
26
+ "Source" form shall mean the preferred form for making modifications,
27
+ including but not limited to software source code, documentation
28
+ source, and configuration files.
29
+
30
+ "Object" form shall mean any form resulting from mechanical
31
+ transformation or translation of a Source form, including but
32
+ not limited to compiled object code, generated documentation,
33
+ and conversions to other media types.
34
+
35
+ "Work" shall mean the work of authorship, whether in Source or
36
+ Object form, made available under the License, as indicated by a
37
+ copyright notice that is included in or attached to the work
38
+ (an example is provided in the Appendix below).
39
+
40
+ "Derivative Works" shall mean any work, whether in Source or Object
41
+ form, that is based on (or derived from) the Work and for which the
42
+ editorial revisions, annotations, elaborations, or other modifications
43
+ represent, as a whole, an original work of authorship. For the purposes
44
+ of this License, Derivative Works shall not include works that remain
45
+ separable from, or merely link (or bind by name) to the interfaces of,
46
+ the Work and Derivative Works thereof.
47
+
48
+ "Contribution" shall mean any work of authorship, including
49
+ the original version of the Work and any modifications or additions
50
+ to that Work or Derivative Works thereof, that is intentionally
51
+ submitted to Licensor for inclusion in the Work by the copyright owner
52
+ or by an individual or Legal Entity authorized to submit on behalf of
53
+ the copyright owner. For the purposes of this definition, "submitted"
54
+ means any form of electronic, verbal, or written communication sent
55
+ to the Licensor or its representatives, including but not limited to
56
+ communication on electronic mailing lists, source code control systems,
57
+ and issue tracking systems that are managed by, or on behalf of, the
58
+ Licensor for the purpose of discussing and improving the Work, but
59
+ excluding communication that is conspicuously marked or otherwise
60
+ designated in writing by the copyright owner as "Not a Contribution."
61
+
62
+ "Contributor" shall mean Licensor and any individual or Legal Entity
63
+ on behalf of whom a Contribution has been received by Licensor and
64
+ subsequently incorporated within the Work.
65
+
66
+ 2. Grant of Copyright License. Subject to the terms and conditions of
67
+ this License, each Contributor hereby grants to You a perpetual,
68
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69
+ copyright license to reproduce, prepare Derivative Works of,
70
+ publicly display, publicly perform, sublicense, and distribute the
71
+ Work and such Derivative Works in Source or Object form.
72
+
73
+ 3. Grant of Patent License. Subject to the terms and conditions of
74
+ this License, each Contributor hereby grants to You a perpetual,
75
+ worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76
+ (except as stated in this section) patent license to make, have made,
77
+ use, offer to sell, sell, import, and otherwise transfer the Work,
78
+ where such license applies only to those patent claims licensable
79
+ by such Contributor that are necessarily infringed by their
80
+ Contribution(s) alone or by combination of their Contribution(s)
81
+ with the Work to which such Contribution(s) was submitted. If You
82
+ institute patent litigation against any entity (including a
83
+ cross-claim or counterclaim in a lawsuit) alleging that the Work
84
+ or a Contribution incorporated within the Work constitutes direct
85
+ or contributory patent infringement, then any patent licenses
86
+ granted to You under this License for that Work shall terminate
87
+ as of the date such litigation is filed.
88
+
89
+ 4. Redistribution. You may reproduce and distribute copies of the
90
+ Work or Derivative Works thereof in any medium, with or without
91
+ modifications, and in Source or Object form, provided that You
92
+ meet the following conditions:
93
+
94
+ (a) You must give any other recipients of the Work or
95
+ Derivative Works a copy of this License; and
96
+
97
+ (b) You must cause any modified files to carry prominent notices
98
+ stating that You changed the files; and
99
+
100
+ (c) You must retain, in the Source form of any Derivative Works
101
+ that You distribute, all copyright, patent, trademark, and
102
+ attribution notices from the Source form of the Work,
103
+ excluding those notices that do not pertain to any part of
104
+ the Derivative Works; and
105
+
106
+ (d) If the Work includes a "NOTICE" text file as part of its
107
+ distribution, then any Derivative Works that You distribute must
108
+ include a readable copy of the attribution notices contained
109
+ within such NOTICE file, excluding those notices that do not
110
+ pertain to any part of the Derivative Works, in at least one
111
+ of the following places: within a NOTICE text file distributed
112
+ as part of the Derivative Works; within the Source form or
113
+ documentation, if provided along with the Derivative Works; or,
114
+ within a display generated by the Derivative Works, if and
115
+ wherever such third-party notices normally appear. The contents
116
+ of the NOTICE file are for informational purposes only and
117
+ do not modify the License. You may add Your own attribution
118
+ notices within Derivative Works that You distribute, alongside
119
+ or as an addendum to the NOTICE text from the Work, provided
120
+ that such additional attribution notices cannot be construed
121
+ as modifying the License.
122
+
123
+ You may add Your own copyright statement to Your modifications and
124
+ may provide additional or different license terms and conditions
125
+ for use, reproduction, or distribution of Your modifications, or
126
+ for any such Derivative Works as a whole, provided Your use,
127
+ reproduction, and distribution of the Work otherwise complies with
128
+ the conditions stated in this License.
129
+
130
+ 5. Submission of Contributions. Unless You explicitly state otherwise,
131
+ any Contribution intentionally submitted for inclusion in the Work
132
+ by You to the Licensor shall be under the terms and conditions of
133
+ this License, without any additional terms or conditions.
134
+ Notwithstanding the above, nothing herein shall supersede or modify
135
+ the terms of any separate license agreement you may have executed
136
+ with Licensor regarding such Contributions.
137
+
138
+ 6. Trademarks. This License does not grant permission to use the trade
139
+ names, trademarks, service marks, or product names of the Licensor,
140
+ except as required for reasonable and customary use in describing the
141
+ origin of the Work and reproducing the content of the NOTICE file.
142
+
143
+ 7. Disclaimer of Warranty. Unless required by applicable law or
144
+ agreed to in writing, Licensor provides the Work (and each
145
+ Contributor provides its Contributions) on an "AS IS" BASIS,
146
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147
+ implied, including, without limitation, any warranties or conditions
148
+ of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149
+ PARTICULAR PURPOSE. You are solely responsible for determining the
150
+ appropriateness of using or redistributing the Work and assume any
151
+ risks associated with Your exercise of permissions under this License.
152
+
153
+ 8. Limitation of Liability. In no event and under no legal theory,
154
+ whether in tort (including negligence), contract, or otherwise,
155
+ unless required by applicable law (such as deliberate and grossly
156
+ negligent acts) or agreed to in writing, shall any Contributor be
157
+ liable to You for damages, including any direct, indirect, special,
158
+ incidental, or consequential damages of any character arising as a
159
+ result of this License or out of the use or inability to use the
160
+ Work (including but not limited to damages for loss of goodwill,
161
+ work stoppage, computer failure or malfunction, or any and all
162
+ other commercial damages or losses), even if such Contributor
163
+ has been advised of the possibility of such damages.
164
+
165
+ 9. Accepting Warranty or Additional Liability. While redistributing
166
+ the Work or Derivative Works thereof, You may choose to offer,
167
+ and charge a fee for, acceptance of support, warranty, indemnity,
168
+ or other liability obligations and/or rights consistent with this
169
+ License. However, in accepting such obligations, You may act only
170
+ on Your own behalf and on Your sole responsibility, not on behalf
171
+ of any other Contributor, and only if You agree to indemnify,
172
+ defend, and hold each Contributor harmless for any liability
173
+ incurred by, or claims asserted against, such Contributor by reason
174
+ of your accepting any such warranty or additional liability.
175
+
176
+ END OF TERMS AND CONDITIONS
177
+
178
+ APPENDIX: How to apply the Apache License to your work.
179
+
180
+ To apply the Apache License to your work, attach the following
181
+ boilerplate notice, with the fields enclosed by brackets "[]"
182
+ replaced with your own identifying information. (Don't include
183
+ the brackets!) The text should be enclosed in the appropriate
184
+ comment syntax for the file format. We also recommend that a
185
+ file or class name and description of purpose be included on the
186
+ same "printed page" as the copyright notice for easier
187
+ identification within third-party archives.
188
+
189
+ Copyright [yyyy] [name of copyright owner]
190
+
191
+ Licensed under the Apache License, Version 2.0 (the "License");
192
+ you may not use this file except in compliance with the License.
193
+ You may obtain a copy of the License at
194
+
195
+ http://www.apache.org/licenses/LICENSE-2.0
196
+
197
+ Unless required by applicable law or agreed to in writing, software
198
+ distributed under the License is distributed on an "AS IS" BASIS,
199
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200
+ See the License for the specific language governing permissions and
201
+ limitations under the License.
LICENSE.CC-BY-4.0 ADDED
@@ -0,0 +1,52 @@
1
+ Creative Commons Attribution 4.0 International (CC-BY-4.0)
2
+
3
+ This CC-BY-4.0 license applies to:
4
+ plda/eigenvectors_desc.bin
5
+ plda/lda.bin
6
+ plda/mean1.bin
7
+ plda/mean2.bin
8
+ plda/mu.bin
9
+ plda/phi_desc.bin
10
+ plda/psi.bin
11
+ plda/tr.bin
12
+ plda/plda.npz
13
+ plda/xvec_transform.npz
14
+
15
+ Required attribution (per upstream `plda/README.md` in the
16
+ `pyannote/speaker-diarization-community-1` HuggingFace snapshot at
17
+ revision 3533c8cf8e369892e6b79ff1bf80f7b0286a54ee):
18
+
19
+ PLDA model trained by BUT Speech@FIT (https://speech.fit.vut.cz/).
20
+ Integration of VBx in pyannote.audio by Jiangyu Han and Petr Pálka.
21
+
22
+ The full text of the CC-BY-4.0 license is available at:
23
+ https://creativecommons.org/licenses/by/4.0/legalcode
24
+
25
+ Summary of permissions granted (not a substitute for the legal text):
26
+
27
+ You are free to:
28
+ Share — copy and redistribute the material in any medium or format
29
+ Adapt — remix, transform, and build upon the material
30
+ for any purpose, even commercially.
31
+
32
+ The licensor cannot revoke these freedoms as long as you follow the
33
+ license terms.
34
+
35
+ Under the following terms:
36
+ Attribution — You must give appropriate credit, provide a link to the
37
+ license, and indicate if changes were made. You may do so in any
38
+ reasonable manner, but not in any way that suggests the licensor
39
+ endorses you or your use.
40
+ No additional restrictions — You may not apply legal terms or
41
+ technological measures that legally restrict others from doing
42
+ anything the license permits.
43
+
44
+ Notices:
45
+ You do not have to comply with the license for elements of the
46
+ material in the public domain or where your use is permitted by an
47
+ applicable exception or limitation.
48
+
49
+ No warranties are given. The license may not give you all of the
50
+ permissions necessary for your intended use. For example, other
51
+ rights such as publicity, privacy, or moral rights may limit how you
52
+ use the material.
LICENSE.MIT ADDED
@@ -0,0 +1,27 @@
1
+ MIT License
2
+
3
+ Copyright (c) 2023 CNRS
4
+
5
+ This MIT license applies to:
6
+ segmentation-3.0.onnx
7
+
8
+ Author: Hervé Bredin (CNRS / IRIT) — pyannote.audio author and lead trainer.
9
+ Source: https://huggingface.co/pyannote/segmentation-3.0
10
+
11
+ Permission is hereby granted, free of charge, to any person obtaining a copy
12
+ of this software and associated documentation files (the "Software"), to deal
13
+ in the Software without restriction, including without limitation the rights
14
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
15
+ copies of the Software, and to permit persons to whom the Software is
16
+ furnished to do so, subject to the following conditions:
17
+
18
+ The above copyright notice and this permission notice shall be included in all
19
+ copies or substantial portions of the Software.
20
+
21
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
22
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
23
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
24
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
25
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
26
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
27
+ SOFTWARE.
README.md ADDED
@@ -0,0 +1,210 @@
1
+ ---
2
+ license: other
3
+ license_name: mixed-mit-cc-by-4-apache-2
4
+ license_link: LICENSE
5
+ language:
6
+ - en
7
+ - multilingual
8
+ library_name: onnx
9
+ tags:
10
+ - speaker-diarization
11
+ - diarization
12
+ - pyannote
13
+ - speaker-embedding
14
+ - wespeaker
15
+ - segmentation
16
+ pipeline_tag: voice-activity-detection
17
+ ---
18
+
19
+ # dia-models — pyannote community-1 model bundle for the `dia` Rust crate
20
+
21
+ A single-repo distribution of every model artifact the
22
+ [`dia`](https://github.com/al8n/diarization) Rust crate needs to run
23
+ end-to-end speaker diarization with **pyannote-community-1** parity:
24
+
25
+ - The **segmentation-3.0** powerset speaker network (16 kHz audio →
26
+ per-frame speaker activations).
27
+ - The **WeSpeaker ResNet34-LM** speaker-embedding network, in three
28
+ forms (external-data ONNX, single-file ONNX, TorchScript).
29
+ - The **PLDA** whitening + LDA weights from the
30
+ [`pyannote/speaker-diarization-community-1`](https://huggingface.co/pyannote/speaker-diarization-community-1)
31
+ pipeline, in both `.npz` (build-time) and raw little-endian f64
32
+ `.bin` (runtime) form.
33
+
34
+ `dia` already embeds the segmentation model and the PLDA weights into
35
+ the compiled binary via `include_bytes!`; the **WeSpeaker** ONNX is
36
+ the only artifact callers must download separately. This repo lets
37
+ callers grab any individual model — or the whole bundle — without
38
+ spelunking through the upstream pyannote / WeSpeaker repos.
39
+
40
+ > **Attribution: this is a redistribution, not new model training.**
41
+ > All weights come from upstream pyannote / WeSpeaker / BUT Speech@FIT.
42
+ > The licenses below MUST be preserved by anyone redistributing.
43
+
44
+ ## Files
45
+
46
+ | File | Size | Format | License |
47
+ |---|---:|---|---|
48
+ | `segmentation-3.0.onnx` | 5.71 MiB | ONNX (single file) | MIT |
49
+ | `wespeaker_resnet34_lm.onnx` | 256 KiB | ONNX header (external data) | Apache-2.0 |
50
+ | `wespeaker_resnet34_lm.onnx.data` | 25.3 MiB | external-data weights | Apache-2.0 |
51
+ | `wespeaker_resnet34_lm_packed.onnx` | 25.5 MiB | ONNX (single file, repacked) | Apache-2.0 |
52
+ | `wespeaker_resnet34_lm.pt` | 25.6 MiB | TorchScript | Apache-2.0 |
53
+ | `plda/eigenvectors_desc.bin` | 128 KiB | f64 (128×128 row-major) | CC-BY-4.0 |
54
+ | `plda/lda.bin` | 256 KiB | f64 (256×128 row-major) | CC-BY-4.0 |
55
+ | `plda/mean1.bin` | 2 KiB | f64 (256,) | CC-BY-4.0 |
56
+ | `plda/mean2.bin` | 1 KiB | f64 (128,) | CC-BY-4.0 |
57
+ | `plda/mu.bin` | 1 KiB | f64 (128,) | CC-BY-4.0 |
58
+ | `plda/phi_desc.bin` | 1 KiB | f64 (128,) | CC-BY-4.0 |
59
+ | `plda/psi.bin` | 1 KiB | f64 (128,) | CC-BY-4.0 |
60
+ | `plda/tr.bin` | 128 KiB | f64 (128×128 row-major) | CC-BY-4.0 |
61
+ | `plda/plda.npz` | 131 KiB | numpy (`mu`, `tr`, `psi`) | CC-BY-4.0 |
62
+ | `plda/xvec_transform.npz` | 131 KiB | numpy (`mean1`, `mean2`, `lda`) | CC-BY-4.0 |
63
+
64
+ ## Which file do I want?
65
+
66
+ ### Segmentation
67
+ Use `segmentation-3.0.onnx`. It feeds `dia::segment::SegmentModel`
68
+ (or any pyannote-segmentation-compatible runtime). Single file, no
69
+ external data, works on every ORT execution provider.
70
+
71
+ ### Embedding (WeSpeaker)
72
+ Three forms, same weights, pick by use case:
73
+
74
+ - **`wespeaker_resnet34_lm.onnx` + `wespeaker_resnet34_lm.onnx.data`**
75
+ — the default ONNX layout. Loads on CPU / TensorRT / CUDA / OpenVINO
76
+ / DirectML. The `.onnx` and `.onnx.data` files MUST sit next to
77
+ each other on disk; ORT resolves the external pointer by relative
78
+ path.
79
+ - **`wespeaker_resnet34_lm_packed.onnx`** — same model with all
80
+ weights inlined into one file. Use this if you want a single-file
81
+ artifact, or if the runtime is **CoreML** (Apple Silicon — Apple's
82
+ graph optimizer chokes on external initializers and reports
83
+ `model_path must not be empty`; the packed form sidesteps it).
84
+ Otherwise functionally identical.
85
+ - **`wespeaker_resnet34_lm.pt`** — TorchScript export for the
86
+ `tch` backend. Bit-exact to upstream PyTorch on hard cases (heavy-
87
+ overlap fixtures where the ONNX→ORT path can drift by O(1) per
88
+ element). Pulls in libtorch (~600 MB shared library).
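Reviewer sketch (not part of the committed README): the selection rules in the list above can be written down as a small lookup. The `Backend` enum and both function names are hypothetical and are not taken from the `dia` crate; only the file names come from this commit.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical runtime categories, for illustration only (not a `dia` type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Backend {
    /// ONNX Runtime on CPU, CUDA, TensorRT, OpenVINO or DirectML.
    OrtGeneric,
    /// ONNX Runtime with the CoreML execution provider.
    OrtCoreMl,
    /// TorchScript through the `tch` backend.
    Tch,
}

/// Map a backend to the file this README recommends for it.
pub fn embedding_file(backend: Backend) -> &'static str {
    match backend {
        Backend::OrtGeneric => "wespeaker_resnet34_lm.onnx",
        Backend::OrtCoreMl => "wespeaker_resnet34_lm_packed.onnx",
        Backend::Tch => "wespeaker_resnet34_lm.pt",
    }
}

/// The external-data layout needs `<model>.data` in the same directory.
/// Returns the sidecar path that must exist for `model`, if it needs one.
pub fn required_sidecar(model: &Path) -> Option<PathBuf> {
    let name = model.file_name()?.to_str()?;
    if name == "wespeaker_resnet34_lm.onnx" {
        Some(model.with_file_name(format!("{}.data", name)))
    } else {
        None
    }
}
```

A caller could check `required_sidecar(path)` before opening a session and fail early with a clear message if the `.onnx.data` file is missing, rather than relying on the runtime's own error.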
89
+
90
+ ### PLDA
91
+ The eight `.bin` files are the runtime data — raw little-endian f64
92
+ blobs that `dia::plda` embeds via `include_bytes!`. The two `.npz`
93
+ files are the build-time sources (`xvec_transform.npz` exposes
94
+ `mean1` / `mean2` / `lda`; `plda.npz` exposes `mu` / `tr` /
95
+ `psi`); they are mirrored from the upstream pyannote-community-1
96
+ snapshot for traceability and so the `.bin` extraction can be
97
+ re-run via `scripts/extract-plda-blobs.sh` in the dia repo.
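Reviewer sketch (not part of the committed README): one way a `.bin` blob described above could be decoded, assuming only what the README states, namely raw little-endian f64 values in row-major order with no header. `decode_f64_le` and `Matrix` are illustrative names, not `dia::plda` API.

```rust
/// Decode a raw little-endian f64 blob.
/// Returns `None` if the byte length is not a multiple of 8.
pub fn decode_f64_le(bytes: &[u8]) -> Option<Vec<f64>> {
    if bytes.len() % 8 != 0 {
        return None;
    }
    let mut out = Vec::with_capacity(bytes.len() / 8);
    for chunk in bytes.chunks_exact(8) {
        let mut buf = [0u8; 8];
        buf.copy_from_slice(chunk);
        out.push(f64::from_le_bytes(buf));
    }
    Some(out)
}

/// A dense row-major matrix (hypothetical helper type).
pub struct Matrix {
    pub rows: usize,
    pub cols: usize,
    pub data: Vec<f64>,
}

impl Matrix {
    /// Build a matrix from a blob, rejecting blobs whose length
    /// does not match `rows * cols` f64 values.
    pub fn from_blob(bytes: &[u8], rows: usize, cols: usize) -> Option<Matrix> {
        let data = decode_f64_le(bytes)?;
        if data.len() != rows.checked_mul(cols)? {
            return None;
        }
        Some(Matrix { rows, cols, data })
    }

    /// Element at row `r`, column `c` (row-major indexing).
    pub fn get(&self, r: usize, c: usize) -> f64 {
        self.data[r * self.cols + c]
    }
}
```

For `plda/lda.bin`, the 256×128 shape in the Files table implies 256 * 128 * 8 = 262144 bytes, which matches the size recorded in that file's LFS pointer in this commit.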
98
+
99
+ `eigenvectors_desc.bin` and `phi_desc.bin` are scipy-derived
100
+ eigenvectors of the PLDA generalized eigenproblem `(B, W)` — pinned
101
+ to avoid LAPACK eigenvector-sign indeterminism (which produced a
102
+ 38% DER divergence on three-speaker fixtures when nalgebra and
103
+ scipy disagreed on 67 of 128 column signs). See
104
+ [`models/plda/SOURCE.md`](https://github.com/al8n/diarization/blob/main/models/plda/SOURCE.md)
105
+ in the dia repo for the regeneration procedure.
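Reviewer sketch (not part of the committed README): the sign ambiguity described above arises because if `v` solves the eigenproblem then `-v` does too, so two correct solvers can return columns that differ only in sign. The code below counts such flipped columns between two bases and applies one common canonicalisation (largest-magnitude entry made positive). This is an assumption for illustration only. Per the README, `dia` instead pins the scipy-derived files, and the function names here are hypothetical.

```rust
/// Count columns whose direction is flipped between two row-major
/// `n x n` matrices (negative dot product between matching columns).
pub fn count_flipped_columns(a: &[f64], b: &[f64], n: usize) -> usize {
    assert_eq!(a.len(), n * n);
    assert_eq!(b.len(), n * n);
    (0..n)
        .filter(|&c| {
            let dot: f64 = (0..n).map(|r| a[r * n + c] * b[r * n + c]).sum();
            dot < 0.0
        })
        .count()
}

/// Flip each column in place so that its largest-magnitude entry is positive.
pub fn canonicalize_columns(m: &mut [f64], n: usize) {
    assert_eq!(m.len(), n * n);
    for c in 0..n {
        // Find the entry with the largest absolute value in column `c`.
        let mut pivot = 0.0f64;
        for r in 0..n {
            let x = m[r * n + c];
            if x.abs() > pivot.abs() {
                pivot = x;
            }
        }
        // Negate the whole column if that entry is negative.
        if pivot < 0.0 {
            for r in 0..n {
                m[r * n + c] = -m[r * n + c];
            }
        }
    }
}
```

A convention like this only removes the sign freedom; it does not help when eigenvalues are repeated or nearly so, which is one reason shipping pinned reference vectors can be the more robust choice.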
106
+
107
+ ## Provenance
108
+
109
+ ### segmentation-3.0.onnx
110
+ - **Upstream:** [`pyannote/segmentation-3.0`](https://huggingface.co/pyannote/segmentation-3.0)
111
+ - **Original layout:** `pytorch_model.onnx` in the upstream HF repo.
112
+ - **License:** MIT — Copyright (c) 2023 CNRS
113
+ - **Author:** Hervé Bredin (CNRS / IRIT), pyannote.audio author and
114
+ lead trainer.
115
+ - **SHA-256:** `057ee564753071c0b09b5b611648b50ac188d50846bff5f01e9f7bbf1591ea25`
116
+
117
+ ### wespeaker_resnet34_lm.onnx (+ .data) / .pt / _packed.onnx
118
+ - **Upstream model architecture:** WeSpeaker ResNet34 with
119
+ large-margin (LM) angular fine-tuning, trained on VoxCeleb-2.
120
+ - **Upstream sources:**
121
+ - [WeSpeaker project](https://github.com/wenet-e2e/wespeaker) (Apache-2.0)
122
+ - [`onnx-community/wespeaker_resnet34_lm`](https://huggingface.co/onnx-community/wespeaker_resnet34_lm)
123
+ for the ONNX export.
124
+ - **License:** Apache-2.0.
125
+ - **`_packed.onnx` derivative:** produced by loading
126
+ `wespeaker_resnet34_lm.onnx` + `.onnx.data` via the `onnx` Python
127
+ library (`onnx.load(path, load_external_data=True)`) and re-saving
128
+ with `save_as_external_data=False`. Same weights, no external file.
129
+
130
+ ### plda/
131
+ - **Upstream:** [`pyannote/speaker-diarization-community-1`](https://huggingface.co/pyannote/speaker-diarization-community-1)
132
+ - **License:** CC-BY-4.0
133
+ - **Snapshot revision:** `3533c8cf8e369892e6b79ff1bf80f7b0286a54ee`
134
+ - **Original layout in the upstream HF repo:**
135
+ `plda/xvec_transform.npz` and `plda/plda.npz`.
136
+ - **Attribution (per upstream `plda/README.md`):**
137
+ PLDA model trained by [BUT Speech@FIT](https://speech.fit.vut.cz/);
138
+ integration of VBx in pyannote.audio by Jiangyu Han and Petr Pálka.
139
+
140
+ ## Usage
141
+
142
+ ### From `dia` (Rust)
143
+ ```rust
144
+ use diarization::{
145
+ embed::EmbedModel,
146
+ plda::PldaTransform,
147
+ segment::SegmentModel,
148
+ };
149
+ // Segmentation + PLDA are bundled by default — no download needed.
150
+ let mut seg = SegmentModel::bundled()?;
151
+ let plda = PldaTransform::new()?;
152
+ // WeSpeaker is BYO; download from this repo.
153
+ let mut emb = EmbedModel::from_file("wespeaker_resnet34_lm.onnx")?;
154
+ # Ok::<(), Box<dyn std::error::Error>>(())
155
+ ```
156
+
157
+ ### Direct download
158
+ ```bash
159
+ # whole bundle
160
+ hf download FinDIT-Studio/dia-models --local-dir ./dia-models
161
+
162
+ # just the embedding model (default ONNX form)
163
+ hf download FinDIT-Studio/dia-models \
164
+ wespeaker_resnet34_lm.onnx wespeaker_resnet34_lm.onnx.data \
165
+ --local-dir ./models
166
+
167
+ # CoreML-friendly single-file form
168
+ hf download FinDIT-Studio/dia-models \
169
+ wespeaker_resnet34_lm_packed.onnx --local-dir ./models
170
+ ```
171
+
172
+ ## Licenses
173
+
174
+ This repository **redistributes** model artifacts under three different
175
+ licenses. Each artifact retains its upstream license. By using this
176
+ bundle you agree to comply with **all three**:
177
+
178
+ - **MIT** for `segmentation-3.0.onnx` (Copyright © 2023 CNRS, Hervé Bredin).
179
+ See `LICENSE.MIT`.
180
+ - **Apache-2.0** for the WeSpeaker artifacts. See `LICENSE.APACHE-2.0`.
181
+ - **CC-BY-4.0** for everything under `plda/`. See `LICENSE.CC-BY-4.0`.
182
+ Required attribution: *PLDA model trained by BUT Speech@FIT;
183
+ integration of VBx in pyannote.audio by Jiangyu Han and Petr Pálka.*
184
+
185
+ The `dia` Rust crate that consumes these models is itself dual-licensed
186
+ MIT OR Apache-2.0; that licensing applies to the source code, not to the
187
+ model weights bundled here.
188
+
189
+ ## Citation
190
+
191
+ If you use these weights in academic work, please cite the upstream
192
+ papers / model cards:
193
+
194
+ - **Segmentation-3.0:** Hervé Bredin, *pyannote.audio 2.1 speaker
195
+ diarization pipeline: principle, benchmark, and recipe*, Interspeech
196
+ 2023.
197
+ - **WeSpeaker:** Wang et al., *WeSpeaker: A research and production
198
+ oriented speaker embedding learning toolkit*, ICASSP 2023.
199
+ - **PLDA / VBx:** Landini et al., *Bayesian HMM clustering of x-vector
200
+ sequences (VBx) in speaker diarization: theory, implementation and
201
+ analysis on standard tasks*, Computer Speech & Language, 2022.
202
+
203
+ ## Issues / questions
204
+
205
+ This repo is a **redistribution** of upstream artifacts. Please file
206
+ issues against:
207
+
208
+ - The dia Rust crate: <https://github.com/al8n/diarization/issues>
209
+ - The pyannote.audio project: <https://github.com/pyannote/pyannote-audio/issues>
210
+ - The WeSpeaker project: <https://github.com/wenet-e2e/wespeaker/issues>
plda/eigenvectors_desc.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:caaa0425dc9cc73ffe559f5abe0b8010e31792050f6bd5922eb15ddb84b4f5ee
3
+ size 131072
plda/lda.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3497ba2d97beaa73b310e34a0f4ccc0648a0ca48069699c225063f0d972ba91d
3
+ size 262144
plda/mean1.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4ff32f244658ff69c12b11c19e0f95e4ef8d33f22781f0e2821c4ac986941487
3
+ size 2048
plda/mean2.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1858e57f04937fbb89726ac09c98e9905d182eb6fe7c6aff2b5bdb0fd30564c3
3
+ size 1024
plda/mu.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d67b1d54babdfd8c2547b9e5ba96abb30079b5bcc6b4aaa5985e77571537c798
3
+ size 1024
plda/phi_desc.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:1193f733a91d72f4080993471b5971fa555dd9bdf425766fabef835bf73df541
3
+ size 1024
plda/plda.npz ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:9b77bcd840692710dd3496f62ecfeed8d8e5f002fd991b785079b244eab7d255
3
+ size 133852
plda/psi.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:72c17389945e40565bd3e07cf529dad328e4f28649b5c286dee92348f623b76b
3
+ size 1024
plda/tr.bin ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f7acd9020ace3908b7ba37c04091fe0555fff4ed0678647866ce2c67208b76f6
3
+ size 131072
plda/xvec_transform.npz ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:325f1ce8e48f7e55e9c8aa47e05d2766b7c48c4b25b8de8dd751e7a4cc5fbe8f
3
+ size 134376
segmentation-3.0.onnx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:057ee564753071c0b09b5b611648b50ac188d50846bff5f01e9f7bbf1591ea25
3
+ size 5986908
wespeaker_resnet34_lm.onnx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:b058ea970bc4c713e4afe0b3d8fe1c2b6439ba94fd912368cd954039deb2cfa5
3
+ size 262499
wespeaker_resnet34_lm.onnx.data ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0590f068ff1ba0f6718a735a3c71fbd8b0ac41fbac4569654707977eb9a4394e
3
+ size 26542080
wespeaker_resnet34_lm.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8bd868195ea0e672fb44999bfd10fa6110e688c7de9e3584583dad2da30ef501
3
+ size 26816730
wespeaker_resnet34_lm_packed.onnx ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:4c15c6be4235318d092c9d347e00c68ba476136d6172f675f76ad6b0c2661f01
3
+ size 26775311
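
Reviewer sketch (not part of the commit): the binary files above appear as Git LFS pointer files with `version`, `oid sha256:<hex>` and `size` lines. As a rough illustration, such a pointer could be parsed with the standard library alone and its `size` compared against a downloaded file. `LfsPointer` and `parse_lfs_pointer` are hypothetical names, and hash verification is left out because Rust's `std` has no SHA-256.

```rust
/// Fields of a Git LFS pointer file (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
pub struct LfsPointer {
    pub oid_sha256: String,
    pub size: u64,
}

/// Parse the three-line pointer format shown in this diff.
/// Returns `None` if the version line, oid or size is missing or malformed.
pub fn parse_lfs_pointer(text: &str) -> Option<LfsPointer> {
    let mut version_ok = false;
    let mut oid = None;
    let mut size = None;
    for line in text.lines() {
        // Each line is "<key> <value>".
        let mut parts = line.trim().splitn(2, ' ');
        match (parts.next(), parts.next()) {
            (Some("version"), Some(v)) => {
                version_ok = v == "https://git-lfs.github.com/spec/v1";
            }
            (Some("oid"), Some(v)) => {
                let hex = v.strip_prefix("sha256:")?;
                if hex.len() != 64 || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
                    return None;
                }
                oid = Some(hex.to_string());
            }
            (Some("size"), Some(v)) => {
                size = v.parse::<u64>().ok();
            }
            _ => {}
        }
    }
    if !version_ok {
        return None;
    }
    Some(LfsPointer { oid_sha256: oid?, size: size? })
}
```

Comparing `size` with `std::fs::metadata(path)?.len()` gives a cheap first check that a download is complete; a full integrity check would still need a SHA-256 of the file contents.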