Pyannote
Run Pyannote, optimized for the NPU in Qualcomm Snapdragon devices, with NexaSDK.
Quickstart
Install NexaSDK and create a free account at sdk.nexa.ai.
Activate your device with your access token:
nexa config set license '<access_token>'
Run the model on the Qualcomm NPU in one line:
nexa infer NexaAI/Pyannote-NPU
- Input: enter the path to the input audio file.
- Output: returns speaker diarization results, or reports an error if any required input cannot be found.
Model Description
pyannote-audio (Community Version) is an open-source speaker diarization model designed for accurate speaker segmentation and labeling in audio streams.
Developed by the pyannote community, it combines audio processing, speaker embedding, and clustering into a unified framework, enabling robust speaker segmentation on local machines without any cloud dependency.
Features
- 🔊 End-to-End Diarization Pipeline: Automatically detects and labels who spoke when in an audio file.
- ⚡ Lightweight & Efficient: Optimized for real-time or batch processing on consumer hardware and GPUs.
- 🧠 Speaker Embedding & Clustering: Extracts rich speaker representations and groups them to separate speaker identities.
- 🔧 Customizable & Modular: Integrates with PyTorch pipelines and lets you swap in modified components for research and prototyping.
- 🌍 Community-Driven & Transparent: Fully open and maintained by an active community of speech researchers and developers.
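The embedding-and-clustering idea can be illustrated with a toy example. This is not pyannote's implementation: it is a simple greedy, threshold-based clustering over cosine similarity, and the function names, the 0.7 threshold, and the 2-D vectors are hypothetical (real speaker embeddings are much higher-dimensional). Each embedding joins the most similar existing cluster if the similarity clears the threshold; otherwise it starts a new speaker cluster.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster_embeddings(embeddings: List[Sequence[float]], threshold: float = 0.7) -> List[int]:
    """Hypothetical greedy clustering: returns one speaker index per embedding."""
    centroids: List[List[float]] = []
    counts: List[int] = []
    labels: List[int] = []
    for emb in embeddings:
        # Find the most similar existing cluster centroid.
        best, best_sim = -1, -1.0
        for idx, centroid in enumerate(centroids):
            sim = cosine_similarity(emb, centroid)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best >= 0 and best_sim >= threshold:
            # Join that cluster and update its centroid as a running mean.
            n = counts[best]
            centroids[best] = [(c * n + e) / (n + 1) for c, e in zip(centroids[best], emb)]
            counts[best] = n + 1
            labels.append(best)
        else:
            # Nothing is similar enough: start a new speaker cluster.
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
    return labels


if __name__ == "__main__":
    embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
    print(cluster_embeddings(embeddings))  # → [0, 0, 1, 1]
```

A greedy online pass keeps the sketch short; production systems generally use more robust clustering over the whole recording, so treat this only as an intuition aid.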
Use Cases
- Meeting Transcription: Segment conversations by speaker for clearer transcripts.
- Broadcast and Podcast Analysis: Attribute voices and structure long-form audio content.
- Call Center Analytics: Separate agent and customer segments for interaction insights.
- Research: Test diarization algorithms or contribute new speaker models.
- Voice Dataset Preparation: Preprocess large audio datasets for training ASR or emotion recognition systems.
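For meeting transcription, diarization output is typically combined with word timestamps from a separate speech recognition model. A minimal sketch, assuming both are available as (start, end, value) tuples in seconds and that speakers are labeled like SPEAKER_00 (these tuple formats and the helper names are hypothetical, not the documented NexaSDK output): each word is assigned to the speaker whose segment overlaps it the most.

```python
from typing import List, Optional, Tuple

# Hypothetical formats: (start_s, end_s, speaker) and (start_s, end_s, word_text).
Segment = Tuple[float, float, str]
Word = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(words: List[Word], segments: List[Segment]) -> List[Tuple[Optional[str], str]]:
    """Label each word with the speaker whose segment overlaps it most (None if no overlap)."""
    labeled = []
    for w_start, w_end, text in words:
        best_speaker, best_overlap = None, 0.0
        for s_start, s_end, speaker in segments:
            ov = overlap(w_start, w_end, s_start, s_end)
            if ov > best_overlap:
                best_speaker, best_overlap = speaker, ov
        labeled.append((best_speaker, text))
    return labeled


def to_transcript(labeled: List[Tuple[Optional[str], str]]) -> str:
    """Group consecutive words by speaker into 'SPEAKER: text' lines."""
    lines: List[Tuple[str, List[str]]] = []
    for speaker, text in labeled:
        name = speaker or "UNKNOWN"
        if lines and lines[-1][0] == name:
            lines[-1][1].append(text)
        else:
            lines.append((name, [text]))
    return "\n".join(f"{name}: {' '.join(ws)}" for name, ws in lines)


if __name__ == "__main__":
    segments = [(0.0, 2.0, "SPEAKER_00"), (2.0, 4.0, "SPEAKER_01")]
    words = [(0.1, 0.5, "Hello"), (0.6, 1.0, "there"), (2.1, 2.5, "Hi"), (3.9, 4.3, "back")]
    print(to_transcript(assign_speakers(words, segments)))
    # SPEAKER_00: Hello there
    # SPEAKER_01: Hi back
```

Maximum overlap is a simple choice that tolerates small timestamp disagreements between the two models; words that fall outside every segment are marked UNKNOWN rather than guessed.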
Inputs and Outputs
Input
- Audio file or stream
Output
- Speaker-labeled time segments
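The exact output format of nexa infer is not specified here, so the sketch below assumes a hypothetical Segment record with start and end times in seconds and a speaker label. It shows two common post-processing steps on speaker-labeled time segments: merging consecutive segments from the same speaker that are separated by a short gap (the 0.5 s default is an arbitrary choice), and totaling speaking time per speaker.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Segment:
    """Hypothetical speaker-labeled segment; times are in seconds."""
    start: float
    end: float
    speaker: str


def merge_adjacent(segments: List[Segment], max_gap: float = 0.5) -> List[Segment]:
    """Merge time-sorted segments of the same speaker separated by at most max_gap seconds."""
    merged: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if merged and merged[-1].speaker == seg.speaker and seg.start - merged[-1].end <= max_gap:
            prev = merged[-1]
            merged[-1] = Segment(prev.start, max(prev.end, seg.end), prev.speaker)
        else:
            merged.append(seg)
    return merged


def speaking_time(segments: List[Segment]) -> Dict[str, float]:
    """Sum raw segment durations per speaker."""
    totals: Dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)


if __name__ == "__main__":
    segs = [
        Segment(0.0, 1.5, "SPEAKER_00"),
        Segment(1.7, 3.0, "SPEAKER_00"),
        Segment(3.0, 5.0, "SPEAKER_01"),
    ]
    merged = merge_adjacent(segs)
    print(speaking_time(merged))  # → {'SPEAKER_00': 3.0, 'SPEAKER_01': 2.0}
```

Note that speaking_time sums raw durations, so overlapping segments from the same speaker would be counted twice; running merge_adjacent first avoids that. Merging also absorbs the short gap between two turns, which is why SPEAKER_00 totals 3.0 s rather than 2.8 s here.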
License
This repo is licensed under the Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0) license, which allows use, sharing, and modification only for non-commercial purposes with proper attribution.
All NPU-related models, runtimes, and code in this project are protected under this non-commercial license and cannot be used in any commercial or revenue-generating applications.
Commercial licensing or enterprise usage requires a separate agreement.
For inquiries, please contact dev@nexa.ai.