---
title: Speaker Diarization, Transcription & Translation
emoji: 🎙️
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.43.2
app_file: app.py
pinned: false
tags:
- audio
- speech-to-text
- speaker-diarization
- translation
- whisper
- pyannote
- multilingual
---

# Speaker Diarization, Transcription & Translation

This Hugging Face Space combines three powerful speech processing capabilities in a single workflow:

- **Speaker Diarization** - Distinguishes between different speakers in your audio, labeling segments as Speaker 1, Speaker 2, etc.
- **Speech Transcription** - Converts spoken words into accurate text using state-of-the-art ASR models
- **Automatic Translation** - Detects non-English content and translates it to English seamlessly

## Features

- Automatic language detection
- Speaker identification and labeling
- High-accuracy speech-to-text transcription
- Translation of non-English content to English
- Timestamped output with speaker attribution
- Support for multiple audio formats (MP3, WAV, etc.)

## Typical Use Cases

- **Meeting Analysis** - Get timestamped transcripts with speaker labels from team calls
- **Interview Processing** - Automatically separate interviewer and interviewee responses
- **Podcast Production** - Generate accurate show notes with speaker attribution
- **Multilingual Content** - Handle recordings in multiple languages with automatic English output

## How It Works

1. Upload an audio file (MP3, WAV, or other common formats)
2. The system automatically detects the language
3. Identifies unique speakers and when they speak
4. Transcribes all speech with high accuracy
5. Translates non-English content to English while preserving speaker labels

## Built With

- [Whisper](https://openai.com/research/whisper) for transcription
- [Pyannote.audio](https://github.com/pyannote/pyannote-audio) for speaker diarization
- [Helsinki-NLP Translation Models](https://huggingface.co/Helsinki-NLP) for translation
- [Gradio](https://gradio.app/) for the web interface

## Local Installation

To run this Space locally:

```bash
git clone
cd diarization-transcription-translation
pip install -r requirements.txt
python app.py
```

## Notes

- The diarization component requires authentication with Hugging Face for pyannote.audio models
- Processing time depends on the length of the audio file
- For best results, ensure good audio quality with clear speech
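
## Example: Combining Speaker Labels with Transcribed Text

The workflow described under How It Works produces two kinds of timed output: speaker turns from diarization and text segments from transcription. The sketch below shows one plausible way to merge them into timestamped, speaker-attributed lines by giving each text segment the speaker whose turns overlap it the most. This is an illustration only, not necessarily how `app.py` does it: the `SpeakerTurn` and `TextSegment` classes, the helper function names, the `MM:SS` timestamp format, and the sample data are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class SpeakerTurn:
    """Hypothetical diarization result: who spoke, and when (in seconds)."""
    start: float
    end: float
    speaker: str


@dataclass
class TextSegment:
    """Hypothetical transcription result: what was said, and when (in seconds)."""
    start: float
    end: float
    text: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(turns, segments):
    """Pair each text segment with the speaker whose turns overlap it most."""
    labeled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        # Fall back to a placeholder label when no turn overlaps the segment.
        speaker = max(totals, key=totals.get) if totals else "Unknown"
        labeled.append((seg, speaker))
    return labeled


def format_timestamp(seconds):
    """Render a time in seconds as MM:SS (fractions of a second are dropped)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def format_transcript(labeled):
    """Build one '[start - end] Speaker: text' line per labeled segment."""
    return "\n".join(
        f"[{format_timestamp(seg.start)} - {format_timestamp(seg.end)}] "
        f"{speaker}: {seg.text}"
        for seg, speaker in labeled
    )


# Made-up sample data standing in for real model output.
turns = [
    SpeakerTurn(0.0, 4.2, "Speaker 1"),
    SpeakerTurn(4.2, 9.0, "Speaker 2"),
]
segments = [
    TextSegment(0.3, 3.9, "Thanks for joining the call."),
    TextSegment(4.0, 8.5, "Happy to be here."),
]

transcript = format_transcript(assign_speakers(turns, segments))
print(transcript)
```

Choosing the speaker by total overlap, rather than by whoever is speaking at the segment's start time, keeps the label stable when a text segment begins slightly before the diarization boundary, as the second sample segment does here.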