Spaces: andrijdavid/diarization

metadata

title: Speaker Diarization, Transcription & Translation
emoji: 🎙️
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.43.2
app_file: app.py
pinned: false
tags:
  - audio
  - speech-to-text
  - speaker-diarization
  - translation
  - whisper
  - pyannote
  - multilingual

Speaker Diarization, Transcription & Translation

This Hugging Face Space combines three speech processing steps in a single workflow:

Speaker Diarization - Determines who spoke when, labeling segments as Speaker 1, Speaker 2, etc.
Speech Transcription - Converts speech to text using Whisper
Automatic Translation - Detects non-English speech and translates it to English

Features

Automatic language detection
Speaker identification and labeling
High-accuracy speech-to-text transcription
Translation of non-English content to English
Timestamped output with speaker attribution
Support for common audio formats such as MP3 and WAV
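
As an illustration of the timestamped, speaker-attributed output listed above, here is a minimal sketch of one possible line format. The function names and the exact layout are assumptions made for this example; the Space's real output format is not specified in this README.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractional seconds are dropped)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segment(start: float, end: float, speaker: str, text: str) -> str:
    """Build one transcript line with a time range and a speaker label."""
    time_range = f"{format_timestamp(start)} - {format_timestamp(end)}"
    return f"[{time_range}] {speaker}: {text.strip()}"


# Hypothetical segments, only to show the shape of the output.
print(format_segment(0.0, 4.2, "Speaker 1", " Good morning, everyone. "))
print(format_segment(3725.9, 3730.0, "Speaker 2", "Thanks for joining."))
```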

Typical Use Cases

Meeting Analysis - Get timestamped transcripts with speaker labels from team calls
Interview Processing - Automatically separate interviewer and interviewee responses
Podcast Production - Generate accurate show notes with speaker attribution
Multilingual Content - Handle recordings in multiple languages with automatic English output

How It Works

Upload an audio file (MP3, WAV, or another common format)
The system detects the spoken language
It identifies the distinct speakers and when each one speaks
It transcribes the speech
It translates non-English content to English while preserving speaker labels
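
The steps above produce two independent timelines: speaker turns from diarization and text segments from transcription. One common way to combine them, sketched here, is to give each text segment the speaker whose turn overlaps it the most. This is an assumption about how such a merge could work, not a description of the code in app.py, and the tuple layouts and function names are hypothetical.

```python
from typing import List, Tuple

# (start_seconds, end_seconds, speaker_label), as a diarization step might report
Turn = Tuple[float, float, str]
# (start_seconds, end_seconds, text), as a transcription step might report
Segment = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Segment], turns: List[Turn]
) -> List[Tuple[float, float, str, str]]:
    """Label each text segment with the speaker whose turn overlaps it most."""
    labeled = []
    for start, end, text in segments:
        best_label, best_overlap = "Unknown", 0.0
        for t_start, t_end, label in turns:
            shared = overlap(start, end, t_start, t_end)
            if shared > best_overlap:
                best_label, best_overlap = label, shared
        labeled.append((start, end, best_label, text))
    return labeled


# Hypothetical data for illustration only.
turns = [(0.0, 5.0, "Speaker 1"), (5.0, 9.0, "Speaker 2")]
segments = [(0.5, 4.5, "Hello."), (4.8, 8.0, "Hi there."), (10.0, 11.0, "Bye.")]
for row in assign_speakers(segments, turns):
    print(row)
```

Segments that overlap no speaker turn are labeled "Unknown" rather than guessed, which keeps gaps in the diarization visible in the final transcript.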

Built With

Whisper for transcription
Pyannote.audio for speaker diarization
Helsinki-NLP Translation Models for translation
Gradio for the web interface
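
The Helsinki-NLP OPUS-MT models are published on the Hub per language pair, with names of the form Helsinki-NLP/opus-mt-{source}-{target}. A minimal sketch of choosing a model from a detected language code might look like the following. The helper name is hypothetical, the selection logic is an assumption rather than what app.py does, and not every language pair has a published model, so the returned name should be checked against the Hub before loading.

```python
from typing import Optional


def translation_model_for(language_code: str, target: str = "en") -> Optional[str]:
    """Return a candidate OPUS-MT model name, or None if no translation is needed.

    The name follows the Helsinki-NLP/opus-mt-{source}-{target} convention;
    whether a model exists for a given pair is NOT checked here.
    """
    code = language_code.strip().lower()
    if code == target:
        return None  # already in the target language, nothing to translate
    return f"Helsinki-NLP/opus-mt-{code}-{target}"


print(translation_model_for("fr"))
print(translation_model_for("en"))
```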

Local Installation

To run this Space locally:

git clone <repository-url>
cd diarization-transcription-translation
pip install -r requirements.txt
python app.py

Notes

The diarization component requires Hugging Face authentication: the pyannote.audio models are gated, so you need to accept their terms on the Hub and supply an access token
Processing time depends on the length of the audio file
For best results, ensure good audio quality with clear speech
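
For the authentication note above, one way to supply the access token when running locally is an environment variable. The sketch below assumes the variable is called HF_TOKEN and uses a hypothetical helper name; the README does not say how app.py actually reads the token.

```python
import os


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Read a Hugging Face access token from the environment.

    The variable name is an assumption for this example. Failing early with a
    clear message avoids a less obvious error later, when the gated
    pyannote.audio models are downloaded.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {env_var} to a Hugging Face access token that has access "
            "to the pyannote.audio models before starting the app."
        )
    return token
```

With this helper, the app could be started as `HF_TOKEN=<your-token> python app.py`, keeping the token out of the source code.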