metadata
title: Speaker Diarization, Transcription & Translation
emoji: 🎙️
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.43.2
app_file: app.py
pinned: false
tags:
  - audio
  - speech-to-text
  - speaker-diarization
  - translation
  - whisper
  - pyannote
  - multilingual

Speaker Diarization, Transcription & Translation

This Hugging Face Space combines three speech processing capabilities in a single workflow:

  • Speaker Diarization - Separates the speakers in your audio and labels each segment as Speaker 1, Speaker 2, and so on
  • Speech Transcription - Converts speech to text with an automatic speech recognition (ASR) model
  • Automatic Translation - Detects non-English speech and translates it to English

Features

  • Automatic language detection
  • Speaker identification and labeling
  • High-accuracy speech-to-text transcription
  • Translation of non-English content to English
  • Timestamped output with speaker attribution
  • Support for multiple audio formats (MP3, WAV, etc.)
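
As an illustration of what "timestamped output with speaker attribution" can look like, here is a minimal sketch. The line layout and the helper names `format_timestamp` and `format_line` are assumptions made for this example; the Space's actual output format may differ.

```python
def format_timestamp(seconds):
    """Convert a time in seconds to HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_line(start, end, speaker, text):
    """Render one segment as '[start - end] speaker: text' (hypothetical layout)."""
    span = f"{format_timestamp(start)} - {format_timestamp(end)}"
    return f"[{span}] {speaker}: {text.strip()}"


print(format_line(0.0, 4.2, "Speaker 1", "Hello, thanks for joining."))
print(format_line(3725.9, 3730.0, "Speaker 2", "Let's wrap up."))
```

Truncating to whole seconds keeps the lines short; keep the fractional part instead if you need subtitle-level precision.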

Typical Use Cases

  • Meeting Analysis - Get timestamped transcripts with speaker labels from team calls
  • Interview Processing - Automatically separate interviewer and interviewee responses
  • Podcast Production - Generate accurate show notes with speaker attribution
  • Multilingual Content - Handle recordings in multiple languages with automatic English output

How It Works

  1. Upload an audio file (MP3, WAV, or another common format)
  2. The system detects the spoken language
  3. It identifies the distinct speakers and when each one speaks
  4. It transcribes the speech
  5. It translates non-English content to English, keeping the speaker labels
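
The steps above produce two time-aligned results: speaker turns from diarization and text segments from transcription or translation. One common way to combine them is to give each text segment the speaker whose turns overlap it the most. The sketch below shows that idea in plain Python. The `Turn` and `Segment` classes and the `assign_speakers` function are hypothetical names for this example, and this is not necessarily how `app.py` does it.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """A diarization result: one speaker talking from start to end (seconds)."""
    start: float
    end: float
    speaker: str


@dataclass
class Segment:
    """A transcribed (or translated) piece of text with its time span."""
    start: float
    end: float
    text: str


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns, unknown="Unknown"):
    """Pair each segment with the speaker whose turns overlap it the most."""
    labeled = []
    for seg in segments:
        totals = {}
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + shared
        # Fall back to a placeholder label when no turn overlaps the segment.
        speaker = max(totals, key=totals.get) if totals else unknown
        labeled.append((speaker, seg))
    return labeled


# Made-up example data, for illustration only.
turns = [Turn(0.0, 4.0, "Speaker 1"), Turn(4.0, 9.0, "Speaker 2")]
segments = [
    Segment(0.5, 3.5, "Hello, thanks for joining."),
    Segment(3.8, 8.0, "Happy to be here."),
]
for speaker, seg in assign_speakers(segments, turns):
    print(f"{speaker}: {seg.text}")
```

Because the labels are attached by time span rather than by text, the same matching works whether a segment holds the original transcript or its English translation.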

Built With

  • Gradio - web interface
  • Whisper - speech recognition
  • pyannote.audio - speaker diarization

Local Installation

To run this Space locally:

git clone <repository-url>
cd diarization-transcription-translation
pip install -r requirements.txt
python app.py

Notes

  • The diarization component uses pyannote.audio models, which require authenticating with a Hugging Face access token (gated pyannote models also require accepting their terms on the model page)
  • Processing time depends on the length of the audio file
  • For best results, ensure good audio quality with clear speech
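
For the authentication note, one possible setup is to read the access token from an environment variable and pass it to pyannote.audio when loading the pipeline. The sketch below makes several assumptions: the `HF_TOKEN` variable name, the helper names `get_hf_token` and `load_diarization_pipeline`, and the `use_auth_token` keyword, which is the one used by pyannote.audio 3.x and may be named differently in other releases. Check how `app.py` reads the token before relying on it.

```python
import os


def get_hf_token(env=None, var_name="HF_TOKEN"):
    """Return the access token from the environment, or raise a clear error.

    `var_name` is an assumption; use whatever variable app.py actually reads.
    """
    env = os.environ if env is None else env
    token = (env.get(var_name) or "").strip()
    if not token:
        raise RuntimeError(
            f"Set {var_name} to a Hugging Face access token before starting the app."
        )
    return token


def load_diarization_pipeline(model_id, token):
    """Load a pyannote.audio pipeline; requires pyannote.audio to be installed.

    `model_id` is whichever pyannote model the app is configured to use.
    """
    # Imported lazily so the token check above works without pyannote installed.
    from pyannote.audio import Pipeline

    # Assumption: pyannote.audio 3.x keyword; other versions may differ.
    return Pipeline.from_pretrained(model_id, use_auth_token=token)
```

Failing early with a readable message is usually easier to debug than the download error that appears when a gated model is requested without a token.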