Pyannote Segmentation 3.0 - LiteRT
Speaker-aware segmentation for diarization pipelines. 16 kHz audio, 1-second streaming chunks.
Part of the soniqo.audio speech toolkit, an open, runtime-portable stack for speech AI. This bundle is the LiteRT export, designed to plug into the abstract interfaces in
speech-core (C++ voice-agent orchestration library). Browse all LiteRT bundles in the soniqo LiteRT collection.
Use cases on soniqo.audio
Powerset speaker segmentation (up to 3 local speakers) for Android, exported in a streaming 1-second chunk configuration.
Model
| Property | Value |
|---|---|
| Architecture | SincNet frontend + 4-layer BiLSTM + linear + powerset head |
| Parameters | ~1.5 M |
| Format | LiteRT (TFLite) |
| Quantization | float32 |
| Sample rate | 16 000 Hz |
| Chunk | 1 second (16 000 samples) |
| Output frames | 56 per chunk |
| LSTM state | explicit I/O, [2, 8, 1, 128] (h+c, 4 layers × 2 directions) |
Files
| File | Size | Description |
|---|---|---|
| pyannote-segmentation.tflite | 6.93 MB | Full model, FP32 |
| config.json | 1 KB | Signature + usage hints |
Why streaming chunks
At its trained 10-second window, pyannote/segmentation-3.0 runs 589 BiLSTM
time steps. litert-torch has no native aten.lstm lowering, so it unrolls
the recurrence into ~4700 cell operations (589 steps × 4 layers × 2 directions).
The MLIR optimizer then either hangs for hours or fails on duplicate
jax_lowering_* symbols from repeated helper functions.
Exporting at 1-second chunks (56 time steps) compiles in ~2 minutes and
produces a valid TFLite model. The caller runs 10 chunks in sequence, passing
lstm_state_out → lstm_state between calls, to cover the full 10-second
window. Each chunk produces 56 frames of powerset posteriors.
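The chunk loop described above can be sketched as follows. This is a minimal sketch, not the bundle's API: `runWindow` and the `Step` callback are hypothetical names, with `Step` standing in for one interpreter call that takes (chunk, state) and returns (posteriors, next state).

```kotlin
const val CHUNK_SAMPLES = 16_000       // 1 s at 16 kHz
const val FRAMES_PER_CHUNK = 56
const val NUM_CLASSES = 7
const val STATE_SIZE = 2 * 8 * 1 * 128 // flattened [2, 8, 1, 128]

// Hypothetical stand-in for one model call: (chunk, state) -> (posteriors, next state).
typealias Step = (FloatArray, FloatArray) -> Pair<FloatArray, FloatArray>

// Runs consecutive 1-second chunks over `audio`, handing the LSTM state from
// each call to the next, and concatenates the per-chunk posteriors.
fun runWindow(audio: FloatArray, step: Step): FloatArray {
    require(audio.size % CHUNK_SAMPLES == 0) { "audio must be a whole number of 1 s chunks" }
    val numChunks = audio.size / CHUNK_SAMPLES
    val valuesPerChunk = FRAMES_PER_CHUNK * NUM_CLASSES
    val all = FloatArray(numChunks * valuesPerChunk)
    var state = FloatArray(STATE_SIZE) // zeros on the first chunk
    for (i in 0 until numChunks) {
        val chunk = audio.copyOfRange(i * CHUNK_SAMPLES, (i + 1) * CHUNK_SAMPLES)
        val (posteriors, nextState) = step(chunk, state)
        posteriors.copyInto(all, destinationOffset = i * valuesPerChunk)
        state = nextState
    }
    return all
}
```

For a 10-second buffer (160 000 samples) this returns 10 × 56 × 7 values, that is, 560 stitched frames.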
The SincNet frontend has small per-chunk edge effects: 10 × 56 = 560 frames versus 589 in the original model. Overlap chunks by ~500 ms on boundaries where high-precision stitching is required.
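To map stitched frames back to time, one simple approximation is to treat the 56 frames of each chunk as evenly spaced over its 1 second of audio. That spacing is an assumption made for illustration, not a statement about the model's exact receptive field, and `frameCenterSeconds` is a hypothetical helper.

```kotlin
// Approximate center time (in seconds) of a frame in the stitched output,
// assuming 56 evenly spaced frames per 1-second chunk.
fun frameCenterSeconds(
    globalFrame: Int,
    framesPerChunk: Int = 56,
    chunkSeconds: Double = 1.0,
): Double {
    require(globalFrame >= 0) { "frame index must be non-negative" }
    val chunk = globalFrame / framesPerChunk
    val frameInChunk = globalFrame % framesPerChunk
    return chunk * chunkSeconds + (frameInChunk + 0.5) * chunkSeconds / framesPerChunk
}
```

Under this assumption each frame covers roughly 17.9 ms (1000 ms / 56).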
Signature
    Inputs:
      audio           [1, 1, 16000]    float32   1 s of audio @ 16 kHz
      lstm_state      [2, 8, 1, 128]   float32   (h, c), zeros on first chunk
    Outputs:
      posteriors      [1, 56, 7]       float32   powerset posteriors
      lstm_state_out  [2, 8, 1, 128]   float32   next-chunk state
Powerset classes (7): {∅, s1, s2, s3, s1∪s2, s1∪s3, s2∪s3}, covering up to 3 local
speakers with no triple-overlap class.
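Turning powerset posteriors into per-speaker activity amounts to picking the most likely class per frame and expanding it to its speaker set. The sketch below assumes the class indices follow the order listed above, which should be checked against config.json; `POWERSET` and `decodePowerset` are hypothetical names.

```kotlin
// Speaker sets for the 7 powerset classes, assumed to be in the listed order:
// {∅, s1, s2, s3, s1∪s2, s1∪s3, s2∪s3}. Speakers are indexed 0..2.
val POWERSET: List<Set<Int>> = listOf(
    emptySet(), setOf(0), setOf(1), setOf(2),
    setOf(0, 1), setOf(0, 2), setOf(1, 2),
)

// Converts flattened [frames, 7] posteriors into per-frame speaker activity
// by taking the argmax class in each frame. Argmax gives the same result for
// probabilities and log-probabilities.
fun decodePowerset(posteriors: FloatArray): List<BooleanArray> {
    val numClasses = POWERSET.size
    require(posteriors.size % numClasses == 0) { "expected a multiple of 7 values" }
    return List(posteriors.size / numClasses) { frame ->
        val base = frame * numClasses
        var best = 0
        for (c in 1 until numClasses) {
            if (posteriors[base + c] > posteriors[base + best]) best = c
        }
        BooleanArray(3) { speaker -> speaker in POWERSET[best] }
    }
}
```

Each returned `BooleanArray` has one entry per local speaker, so a frame decoded as s1∪s3 yields `[true, false, true]`.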
Usage
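    import java.nio.FloatBuffer
    import org.tensorflow.lite.Interpreter

    // loadModelFile is an app-side helper (not shown) that returns the .tflite as a File or ByteBuffer.
    val model = Interpreter(loadModelFile("pyannote-segmentation.tflite"))
    var state = FloatArray(2 * 8 * 1 * 128) // (h, c) LSTM state, zero on first call

    // Runs one 1-second chunk and carries the LSTM state over to the next call.
    fun segment(chunk: FloatArray): FloatArray {
        require(chunk.size == 16_000) { "expected 1 s of audio at 16 kHz" }
        val out = FloatArray(1 * 56 * 7)
        val nextState = FloatArray(state.size)
        // Tensors are addressed by index; the order is assumed to match the Signature section.
        model.runForMultipleInputsOutputs(
            arrayOf<Any>(FloatBuffer.wrap(chunk), FloatBuffer.wrap(state)),
            mapOf<Int, Any>(0 to FloatBuffer.wrap(out), 1 to FloatBuffer.wrap(nextState)),
        )
        state = nextState
        return out // flattened [56, 7] log-probs
    }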
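The model takes float32 samples, while Android capture APIs such as AudioRecord commonly deliver 16-bit PCM. A minimal sketch of the conversion, assuming mono 16 kHz input and scaling to [-1, 1): the source does not state the expected amplitude range, so the 32768 divisor is an assumption, and `pcm16ToChunks` is a hypothetical helper name.

```kotlin
// Splits 16-bit PCM into 1-second float chunks (16 000 samples each),
// scaling samples to [-1, 1) and zero-padding the final partial chunk.
fun pcm16ToChunks(pcm: ShortArray, chunkSamples: Int = 16_000): List<FloatArray> {
    require(chunkSamples > 0) { "chunkSamples must be positive" }
    val numChunks = (pcm.size + chunkSamples - 1) / chunkSamples
    return List(numChunks) { i ->
        val start = i * chunkSamples
        FloatArray(chunkSamples) { j ->
            val idx = start + j
            if (idx < pcm.size) pcm[idx] / 32768f else 0f
        }
    }
}
```

Each returned chunk has exactly the 16 000 samples the `audio` input expects, so it can be passed to a per-chunk function such as `segment` above.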
Source
Upstream: pyannote/segmentation-3.0 (MIT, gated; accept the license on the upstream page).
Links
- speech-android: Android SDK
- soniqo.audio: website
- blog
Ecosystem
- soniqo.audio: use-case explorer (transcription, voice cloning, live ASR, voice agents).
- speech-core: C++ orchestration library for voice agents. Abstract STTInterface/TTSInterface/VADInterface/EnhancerInterface; LiteRT implementations plug straight into the interfaces.
- speech-swift: Apple Silicon MLX companion runtime (model-specific MLX bundles linked above where applicable).
- speech-android: Android SDK consuming on-device LiteRT bundles.
Other LiteRT models in this collection
ASR / Transcription
- Parakeet TDT 0.6B v3 - LiteRT (INT8)
- Nemotron Speech Streaming 0.6B - LiteRT
- Omnilingual ASR CTC 300M - LiteRT
- Omnilingual ASR CTC 300M - LiteRT (INT8)
- Qwen3 ASR 0.6B Encoder - LiteRT (INT8)
VAD / Diarization
TTS / Voice Cloning
License
This bundle inherits the upstream model license (MIT). See the
linked base_model repository for the full terms.