metadata
title: Speaker Diarization, Transcription & Translation
emoji: 🎙️
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.43.2
app_file: app.py
pinned: false
tags:
- audio
- speech-to-text
- speaker-diarization
- translation
- whisper
- pyannote
- multilingual
Speaker Diarization, Transcription & Translation
This Hugging Face Space combines three speech processing capabilities in a single workflow:
- Speaker Diarization - Distinguishes between the speakers in your audio, labeling segments as Speaker 1, Speaker 2, etc.
- Speech Transcription - Converts speech to text using Whisper
- Automatic Translation - Detects non-English content and translates it to English
Features
- Automatic language detection
- Speaker identification and labeling
- High-accuracy speech-to-text transcription
- Translation of non-English content to English
- Timestamped output with speaker attribution
- Support for multiple audio formats (MP3, WAV, etc.)
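As an illustration of what "timestamped output with speaker attribution" can look like, here is a minimal sketch of a line formatter. The function names `format_timestamp` and `format_segment` and the exact `[HH:MM:SS - HH:MM:SS] Speaker N: text` layout are assumptions made for this example; the Space's actual output format may differ.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segment(start: float, end: float, speaker: str, text: str) -> str:
    """Render one attributed segment as a single transcript line."""
    span = f"{format_timestamp(start)} - {format_timestamp(end)}"
    return f"[{span}] {speaker}: {text.strip()}"


# Hypothetical segment, used only to show the layout.
print(format_segment(0.0, 4.2, "Speaker 1", " Good morning, everyone. "))
```

Truncating to whole seconds keeps lines short; a variant that keeps milliseconds would suit subtitle-style output better.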
Typical Use Cases
- Meeting Analysis - Get timestamped transcripts with speaker labels from team calls
- Interview Processing - Automatically separate interviewer and interviewee responses
- Podcast Production - Generate accurate show notes with speaker attribution
- Multilingual Content - Handle recordings in multiple languages with automatic English output
How It Works
- Upload an audio file (MP3, WAV, or another common format)
- The system detects the spoken language
- It identifies the unique speakers and when each one speaks
- It transcribes all speech to text
- It translates non-English content to English while preserving the speaker labels
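The steps above imply that transcribed segments and diarization turns have to be matched up by time. One common way to do that, sketched here under assumptions, is to give each transcript segment the speaker whose turn overlaps it the most. The tuple layouts and the names `overlap` and `assign_speakers` are hypothetical and are not taken from this Space's `app.py`.

```python
from typing import List, Tuple

# (start_seconds, end_seconds, speaker_label), e.g. produced by a diarization model
Turn = Tuple[float, float, str]
# (start_seconds, end_seconds, text), e.g. produced by a transcription model
Segment = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Return the length in seconds shared by two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Segment], turns: List[Turn], unknown: str = "Unknown"
) -> List[Tuple[float, float, str, str]]:
    """Label each transcript segment with the speaker whose turn overlaps it most."""
    labelled = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn_start, turn_end, speaker in turns:
            shared = overlap(seg_start, seg_end, turn_start, turn_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        labelled.append((seg_start, seg_end, best_speaker, text))
    return labelled


# Hypothetical data, used only to exercise the function.
turns = [(0.0, 5.0, "Speaker 1"), (5.0, 9.0, "Speaker 2")]
segments = [(0.5, 4.0, "Hello."), (4.5, 8.5, "Hi there.")]
for row in assign_speakers(segments, turns):
    print(row)
```

Segments that overlap no turn at all keep the `unknown` label, which makes gaps in the diarization visible instead of silently attributing them to a neighbor.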
Built With
- Whisper for transcription
- Pyannote.audio for speaker diarization
- Helsinki-NLP Translation Models for translation
- Gradio for the web interface
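Helsinki-NLP publishes many of its translation checkpoints on the Hugging Face Hub under names of the form `Helsinki-NLP/opus-mt-<source>-en`. A minimal sketch of picking a checkpoint from a detected language code could look like the following. The helper name `translation_model_for` is hypothetical, and it is an assumption that this Space selects models this way; not every language code has a matching checkpoint, so real code should handle a failed download.

```python
from typing import Optional


def translation_model_for(language_code: str) -> Optional[str]:
    """Return a candidate Helsinki-NLP checkpoint name, or None for English input."""
    code = language_code.strip().lower()
    if code == "en":
        # English content needs no translation step.
        return None
    # Naming convention only: the checkpoint may not exist for every language.
    return f"Helsinki-NLP/opus-mt-{code}-en"


print(translation_model_for("fr"))
print(translation_model_for("en"))
```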
Local Installation
To run this Space locally:
git clone <repository-url>
cd diarization-transcription-translation
pip install -r requirements.txt
python app.py
Notes
- The diarization component requires Hugging Face authentication (an access token) to download the pyannote.audio models
- Processing time depends on the length of the audio file
- For best results, ensure good audio quality with clear speech
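For the authentication note, one way to supply the token when running locally is through an environment variable. This is a sketch under assumptions: the variable name `HF_TOKEN`, the helper names, and the checkpoint `pyannote/speaker-diarization-3.1` are illustrative, and the Space's `app.py` may load a different model or read the token differently. The `use_auth_token` argument matches pyannote.audio 3.x; other releases may name it differently.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read the Hugging Face access token from the environment, failing loudly if absent."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {env_var} to a Hugging Face access token before starting the app."
        )
    return token


def load_diarization_pipeline(token: str):
    """Load a pyannote.audio diarization pipeline (assumed checkpoint name)."""
    # Imported lazily so this module can be imported without pyannote.audio installed.
    from pyannote.audio import Pipeline

    # Assumption: the checkpoint used by this Space is not stated in the README.
    return Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token=token
    )
```

With this in place, the app could be started as `HF_TOKEN=<your-token> python app.py`. Gated pyannote models also require accepting their user conditions on the model page before the download succeeds.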