
Pyannote

Run Pyannote, optimized for the NPU on Qualcomm Snapdragon devices, with NexaSDK.

Quickstart

  1. Install NexaSDK and create a free account at sdk.nexa.ai

  2. Activate your device with your access token:

    nexa config set license '<access_token>'
    
  3. Run the model on the Qualcomm NPU in one line:

    nexa infer NexaAI/Pyannote-NPU
    
  • Input: The path to an input audio file.
  • Output: Speaker diarization results, or an error if a required input cannot be found.

Model Description

pyannote-audio (Community Version) is an open-source speaker diarization model designed for accurate speaker segmentation and labeling in audio streams.
Developed by the Pyannote community, it combines audio processing, speaker embedding, and clustering in a single framework, so diarization can run on local machines without depending on a cloud service.
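
To make the "speaker embedding and clustering" idea concrete, here is a minimal sketch in plain Python. It is not the algorithm pyannote-audio or NexaSDK uses: the function names (`cosine_similarity`, `assign_speakers`), the greedy strategy, and the 0.8 threshold are all assumptions chosen for illustration. It groups per-segment embedding vectors by cosine similarity and assumes every vector is non-zero.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def assign_speakers(embeddings, threshold=0.8):
    """Greedy clustering (illustrative only, threshold is an assumption).

    Each embedding joins the most similar existing speaker centroid if the
    similarity reaches `threshold`; otherwise it starts a new speaker.
    """
    centroids = []  # running mean embedding per speaker
    counts = []     # number of embeddings merged into each centroid
    labels = []
    for emb in embeddings:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine_similarity(emb, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(list(emb))
            counts.append(1)
            labels.append(len(centroids) - 1)
        else:
            n = counts[best]
            # Update the running mean with the new embedding.
            centroids[best] = [(c * n + e) / (n + 1)
                               for c, e in zip(centroids[best], emb)]
            counts[best] = n + 1
            labels.append(best)
    return [f"SPEAKER_{i:02d}" for i in labels]


# Toy 2-D "embeddings": two segments near [1, 0], two near [0, 1].
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.95]]
toy_labels = assign_speakers(toy_embeddings)
```

With the toy vectors above, the first two segments end up under one label and the last two under another. Real speaker embeddings have hundreds of dimensions and production pipelines use more robust clustering, so treat this only as a picture of the idea.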

Features

  • 🔊 End-to-End Diarization Pipeline — Automatically detects and labels who spoke when in an audio file.
  • Lightweight & Efficient — Optimized for real-time or batch processing on consumer hardware and GPUs.
  • 🧠 Speaker Embedding & Clustering — Extracts rich speaker representations and groups them for identity separation.
  • 🔧 Customizable & Modular — Integrates with PyTorch pipelines, and individual components can be swapped or modified for research and prototyping.
  • 🌍 Community-Driven & Transparent — Fully open and maintained by an active community of speech researchers and developers.

Use Cases

  • Meeting Transcription: Segment conversations by speaker for clearer transcripts.
  • Broadcast and Podcast Analysis: Attribute voices and structure long-form audio content.
  • Call Center Analytics: Separate agent and customer segments for interaction insights.
  • Research: Test diarization algorithms or contribute new speaker models.
  • Voice Dataset Preparation: Preprocess large audio datasets for training ASR or emotion recognition systems.

Inputs and Outputs

Input

  • Audio file or stream

Output

  • Speaker-labeled time segments
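
As an illustration of what speaker-labeled time segments can look like once loaded into a program, the sketch below models each segment as a start time, end time, and speaker label, then totals speaking time per speaker. The `Segment` type, the `talk_time` helper, the label format, and the sample values are hypothetical; the exact output format of `nexa infer` is not specified here.

```python
from collections import defaultdict
from typing import NamedTuple


class Segment(NamedTuple):
    """One diarization segment (hypothetical structure, times in seconds)."""
    start: float
    end: float
    speaker: str


def talk_time(segments):
    """Return total speaking time in seconds for each speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)


# Made-up example data, not real model output.
example_segments = [
    Segment(0.0, 4.5, "SPEAKER_00"),
    Segment(4.5, 7.0, "SPEAKER_01"),
    Segment(7.0, 10.0, "SPEAKER_00"),
]
totals = talk_time(example_segments)
```

A structure like this is enough for the use cases listed above, such as attributing transcript lines to speakers or separating agent and customer speech.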

License

This repo is licensed under the Creative Commons Attribution–NonCommercial 4.0 (CC BY-NC 4.0) license, which allows use, sharing, and modification only for non-commercial purposes with proper attribution.
All NPU-related models, runtimes, and code in this project are protected under this non-commercial license and cannot be used in any commercial or revenue-generating applications.
Commercial licensing or enterprise usage requires a separate agreement.
For inquiries, please contact dev@nexa.ai.
