| title: Speaker Diarization, Transcription & Translation | |
| emoji: 🎙️ | |
| colorFrom: blue | |
| colorTo: red | |
| sdk: gradio | |
| sdk_version: 3.43.2 | |
| app_file: app.py | |
| pinned: false | |
| tags: | |
| - audio | |
| - speech-to-text | |
| - speaker-diarization | |
| - translation | |
| - whisper | |
| - pyannote | |
| - multilingual | |
| # Speaker Diarization, Transcription & Translation | |
| This Hugging Face Space combines three powerful speech processing capabilities in a single workflow: | |
| - **Speaker Diarization** - Distinguishes between different speakers in your audio, labeling segments as Speaker 1, Speaker 2, etc. | |
| - **Speech Transcription** - Converts speech to text using Whisper ASR models | |
| - **Automatic Translation** - Detects non-English speech and translates it to English | |
| ## Features | |
| - Automatic language detection | |
| - Speaker identification and labeling | |
| - High-accuracy speech-to-text transcription | |
| - Translation of non-English content to English | |
| - Timestamped output with speaker attribution | |
| - Support for multiple audio formats (MP3, WAV, etc.) | |
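To make "timestamped output with speaker attribution" concrete, here is a minimal sketch of how one output line could be rendered. The function names and the `[HH:MM:SS - HH:MM:SS] Speaker: text` layout are illustrative assumptions, not necessarily the format `app.py` produces.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_line(start: float, end: float, speaker: str, text: str) -> str:
    """Build one transcript line with a time range and a speaker label.

    Hypothetical helper: the layout is an assumption made for illustration.
    """
    time_range = f"{format_timestamp(start)} - {format_timestamp(end)}"
    return f"[{time_range}] {speaker}: {text.strip()}"


if __name__ == "__main__":
    print(format_line(0.0, 4.2, "Speaker 1", " Hello everyone "))
```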
| ## Typical Use Cases | |
| - **Meeting Analysis** - Get timestamped transcripts with speaker labels from team calls | |
| - **Interview Processing** - Automatically separate interviewer and interviewee responses | |
| - **Podcast Production** - Generate accurate show notes with speaker attribution | |
| - **Multilingual Content** - Handle recordings in multiple languages with automatic English output | |
| ## How It Works | |
| 1. Upload an audio file (MP3, WAV, or other common formats) | |
| 2. The system detects the spoken language | |
| 3. It identifies the distinct speakers and when each one speaks | |
| 4. It transcribes the speech | |
| 5. It translates non-English content to English while preserving speaker labels | |
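The steps above imply that diarization turns and transcript segments have to be aligned so each piece of text gets a speaker label. One common way to do this is to give each transcript segment the speaker whose turn overlaps it the most. The sketch below assumes both stages yield plain `(start, end, ...)` tuples in seconds; the tuple shapes and function names are hypothetical, and the Space itself may align them differently.

```python
from typing import List, Tuple

Turn = Tuple[float, float, str]          # (start, end, speaker label)
Segment = Tuple[float, float, str]       # (start, end, transcribed text)
Labeled = Tuple[float, float, str, str]  # (start, end, speaker label, text)


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Segment], turns: List[Turn], unknown: str = "Unknown"
) -> List[Labeled]:
    """Label each transcript segment with the speaker that overlaps it most.

    Segments that overlap no diarization turn keep the `unknown` label.
    """
    labeled: List[Labeled] = []
    for start, end, text in segments:
        best_label, best_overlap = unknown, 0.0
        for t_start, t_end, label in turns:
            shared = overlap(start, end, t_start, t_end)
            if shared > best_overlap:
                best_label, best_overlap = label, shared
        labeled.append((start, end, best_label, text))
    return labeled


if __name__ == "__main__":
    turns = [(0.0, 5.0, "Speaker 1"), (5.0, 9.0, "Speaker 2")]
    segments = [(0.5, 4.0, "Hi there."), (4.5, 8.0, "Hello, thanks for joining.")]
    for row in assign_speakers(segments, turns):
        print(row)
```

This approach is quadratic in the worst case, which is usually fine for recordings with a few hundred segments; a sorted sweep would scale better for very long audio.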
| ## Built With | |
| - [Whisper](https://openai.com/research/whisper) for transcription | |
| - [Pyannote.audio](https://github.com/pyannote/pyannote-audio) for speaker diarization | |
| - [Helsinki-NLP Translation Models](https://huggingface.co/Helsinki-NLP) for translation | |
| - [Gradio](https://gradio.app/) for the web interface | |
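Helsinki-NLP publishes its OPUS-MT checkpoints on the Hugging Face Hub under names of the form `Helsinki-NLP/opus-mt-<src>-<tgt>`. A sketch of how a detected language code could be mapped to a model name follows. Whether this Space selects models this way is an assumption, the helper name is hypothetical, and not every language pair has a published checkpoint, so the returned name should be checked before loading.

```python
from typing import Optional


def opus_mt_model_name(source_lang: str, target_lang: str = "en") -> Optional[str]:
    """Return the conventional OPUS-MT repository name for a language pair.

    Returns None when no translation is needed (empty or identical codes).
    The name is only a candidate: a checkpoint may not exist for every pair.
    """
    src = source_lang.strip().lower()
    tgt = target_lang.strip().lower()
    if not src or src == tgt:
        return None
    return f"Helsinki-NLP/opus-mt-{src}-{tgt}"


if __name__ == "__main__":
    print(opus_mt_model_name("de"))  # candidate model for German to English
    print(opus_mt_model_name("en"))  # None: already English
```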
| ## Local Installation | |
| To run this Space locally: | |
| ```bash | |
| git clone <repository-url> | |
| cd diarization-transcription-translation | |
| pip install -r requirements.txt | |
| python app.py | |
| ``` | |
| ## Notes | |
| - The diarization component uses gated pyannote.audio models, so you need a Hugging Face access token and may need to accept the model's terms on its Hub page | |
| - Processing time depends on the length of the audio file | |
| - For best results, ensure good audio quality with clear speech | |
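Because the diarization models need Hugging Face authentication, a local run needs a token available before the pipeline loads. The sketch below reads it from an environment variable and fails early with a readable message. The variable name `HF_TOKEN` and the helper name are assumptions; `app.py` may expect the token under a different name or obtain it another way.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read a Hugging Face access token from the environment.

    Raises RuntimeError with a clear message if the variable is unset or empty,
    so the failure happens before any model download is attempted.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Environment variable {env_var} is not set. "
            "Create an access token in your Hugging Face account settings "
            "and export it before starting the app."
        )
    return token


if __name__ == "__main__":
    try:
        get_hf_token()
        print("Token found; it can now be passed to the diarization pipeline loader.")
    except RuntimeError as err:
        print(err)
```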