	---
	license: other
	license_name: mixed
	license_link: LICENSE.md
	tags:
	- speaker-diarization
	- coreml
	- apple
	- macos
	- ios
	- sortformer
	- wespeaker
	- speaker-embedding
	language:
	- en
	pipeline_tag: audio-classification
	---

	# Speaker Diarization CoreML Models

	CoreML conversions of speaker diarization and speaker embedding models for on-device inference on Apple platforms.

	## Models

	| Model | Original | Size | Description |
	|-------|----------|------|-------------|
	| `sortformer_4spk_v21.mlpackage` | [nvidia/diar_streaming_sortformer_4spk-v2.1](https://huggingface.co/nvidia/diar_streaming_sortformer_4spk-v2.1) | 441 MB | Sortformer diarization model: end-to-end neural speaker diarization for up to 4 speakers, streaming-capable |
	| `wespeaker_resnet34.mlpackage` | [WeSpeaker ResNet34](https://github.com/wenet-e2e/wespeaker) | 25 MB | ResNet34 speaker embedding model: extracts 256-dimensional speaker embeddings for speaker verification and identification |
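
	Speaker verification with embeddings like these is commonly done by comparing two vectors with cosine similarity. The following is a minimal sketch in plain Swift; the function names and the 0.5 threshold are illustrative assumptions, not part of WeSpeaker or AxiiDiarization, and a usable threshold has to be tuned on your own data.

	```swift
	import Foundation

	/// Cosine similarity between two equal-length embedding vectors.
	/// Returns 0 if the lengths differ, the vectors are empty, or either has zero norm.
	func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
	    guard a.count == b.count, !a.isEmpty else { return 0 }
	    var dot: Float = 0
	    var normA: Float = 0
	    var normB: Float = 0
	    for i in 0..<a.count {
	        dot += a[i] * b[i]
	        normA += a[i] * a[i]
	        normB += b[i] * b[i]
	    }
	    let denominator = normA.squareRoot() * normB.squareRoot()
	    return denominator > 0 ? dot / denominator : 0
	}

	// Hypothetical decision threshold; tune it on held-out data.
	let sameSpeakerThreshold: Float = 0.5

	/// True when two embeddings are similar enough to be treated as one speaker.
	func isSameSpeaker(_ a: [Float], _ b: [Float]) -> Bool {
	    return cosineSimilarity(a, b) >= sameSpeakerThreshold
	}
	```

	With the 256-dimensional output of `wespeaker_resnet34.mlpackage`, `a` and `b` would be two embeddings extracted from different audio segments.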

	## Format

	Both models are in Apple `.mlpackage` format (FP32). On first load, CoreML compiles them to `.mlmodelc` and caches the compiled version, so subsequent loads are faster.
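
	If you load a package directly with CoreML instead of going through a library, the compile-then-cache step can be made explicit. The sketch below assumes a cache directory of your choosing; `compiledModelURL` and `loadModel` are hypothetical helper names, and this is not necessarily how AxiiDiarization handles caching.

	```swift
	import CoreML
	import Foundation

	/// Location of the compiled copy of a model package (hypothetical cache layout).
	func compiledModelURL(for packageURL: URL, in cacheDirectory: URL) -> URL {
	    let name = packageURL.deletingPathExtension().lastPathComponent
	    return cacheDirectory.appendingPathComponent("\(name).mlmodelc")
	}

	/// Compiles the .mlpackage on first use and reuses the .mlmodelc afterwards.
	func loadModel(packageURL: URL, cacheDirectory: URL) throws -> MLModel {
	    let fileManager = FileManager.default
	    let compiledURL = compiledModelURL(for: packageURL, in: cacheDirectory)

	    if !fileManager.fileExists(atPath: compiledURL.path) {
	        // compileModel(at:) writes the compiled model to a temporary location,
	        // so it is moved somewhere permanent to survive between launches.
	        // (Newer OS versions also offer an async variant of this call.)
	        let temporaryURL = try MLModel.compileModel(at: packageURL)
	        try fileManager.createDirectory(at: cacheDirectory,
	                                        withIntermediateDirectories: true)
	        try fileManager.moveItem(at: temporaryURL, to: compiledURL)
	    }

	    let configuration = MLModelConfiguration()
	    configuration.computeUnits = .all  // let CoreML choose CPU, GPU, or Neural Engine
	    return try MLModel(contentsOf: compiledURL, configuration: configuration)
	}
	```

	Deleting the cached `.mlmodelc` directory forces a recompile on the next call, which is useful after replacing a model package with a newer version.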

	- Sortformer: input `mel_features (B, 128, T)` → output `speaker_probs (B, T/8, 4)`, sigmoid probabilities per speaker per output frame
	- ResNet34: input `fbank_features (1, 80, T)` → output `embedding (1, 256)`, the speaker embedding vector

	Here `B` is the batch size and `T` is the number of input feature frames.
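
	Because the Sortformer output is an independent sigmoid probability per speaker and frame, a simple way to read it is to threshold each speaker's column and merge consecutive active frames into segments. The sketch below works on one batch item as a plain `[[Float]]` (frames × speakers). The `ActiveSegment` type, the 0.5 threshold, and the 80 ms frame duration (a 10 ms feature hop times the 8× reduction in `T/8`) are assumptions of this example; check them against the feature extraction you actually use.

	```swift
	/// One contiguous stretch of activity for a speaker slot.
	struct ActiveSegment: Equatable {
	    let speaker: Int   // index into the model's 4 speaker slots
	    let start: Double  // seconds
	    let end: Double    // seconds
	}

	/// Thresholds per-frame probabilities and merges consecutive active frames.
	/// `probs` must be rectangular: one row per output frame, one column per speaker.
	func segments(from probs: [[Float]],
	              threshold: Float = 0.5,
	              frameDuration: Double = 0.08) -> [ActiveSegment] {
	    guard let speakerCount = probs.first?.count else { return [] }
	    var result: [ActiveSegment] = []

	    for speaker in 0..<speakerCount {
	        var runStart: Int? = nil
	        for (frame, row) in probs.enumerated() {
	            let active = row[speaker] >= threshold
	            if active && runStart == nil {
	                runStart = frame  // a new run of activity begins
	            } else if !active, let startFrame = runStart {
	                result.append(ActiveSegment(speaker: speaker,
	                                            start: Double(startFrame) * frameDuration,
	                                            end: Double(frame) * frameDuration))
	                runStart = nil
	            }
	        }
	        // Close a run that is still active at the last frame.
	        if let startFrame = runStart {
	            result.append(ActiveSegment(speaker: speaker,
	                                        start: Double(startFrame) * frameDuration,
	                                        end: Double(probs.count) * frameDuration))
	        }
	    }
	    // Order by start time, breaking ties by speaker index.
	    return result.sorted { ($0.start, $0.speaker) < ($1.start, $1.speaker) }
	}
	```

	Overlapping speech falls out naturally: two speakers can both exceed the threshold in the same frame, which yields two segments that overlap in time.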

	## Usage

	These models are designed for use with the [AxiiDiarization](https://github.com/AugustDev/AxiiDiarization) Swift library:

	```swift
	import AxiiDiarization

	let pipeline = try DiarizationPipeline(
	    sortformerModelPath: "path/to/sortformer_4spk_v21.mlpackage",
	    embModelPath: "path/to/wespeaker_resnet34.mlpackage"
	)

	// `audioSamples` holds the audio to diarize and is loaded elsewhere.
	let result = try pipeline.run(samples: audioSamples)
	for segment in result.segments {
	    print("\(segment.speaker.label): \(segment.start)s - \(segment.end)s")
	}
	```

	## Licenses

	The models in this repository have separate licenses from their original authors:

	- Sortformer v2.1: Licensed by NVIDIA Corporation under the [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/). Commercial use permitted. See original model card: [nvidia/diar_streaming_sortformer_4spk-v2.1](https://huggingface.co/nvidia/diar_streaming_sortformer_4spk-v2.1)

	- WeSpeaker ResNet34: Licensed under [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0). See original project: [wenet-e2e/wespeaker](https://github.com/wenet-e2e/wespeaker)

	The CoreML conversion code and this repository are MIT licensed.

	## Acknowledgments

	- NVIDIA NeMo team for the Sortformer diarization model
	- WeSpeaker / WeNet team for the ResNet34 speaker embedding model