Spaces: andrijdavid / diarization

	---
	title: Speaker Diarization, Transcription & Translation
	emoji: 🎙️
	colorFrom: blue
	colorTo: red
	sdk: gradio
	sdk_version: 3.43.2
	app_file: app.py
	pinned: false
	tags:
	- audio
	- speech-to-text
	- speaker-diarization
	- translation
	- whisper
	- pyannote
	- multilingual
	---

	# Speaker Diarization, Transcription & Translation

	This Hugging Face Space combines three speech processing steps in a single workflow:

	- Speaker Diarization - Distinguishes between the speakers in your audio and labels each segment as Speaker 1, Speaker 2, etc.
	- Speech Transcription - Converts speech to text with Whisper
	- Automatic Translation - Detects non-English content and translates it to English

	## Features

	- Automatic language detection
	- Speaker identification and labeling
	- High-accuracy speech-to-text transcription
	- Translation of non-English content to English
	- Timestamped output with speaker attribution
	- Support for multiple audio formats (MP3, WAV, etc.)
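As an illustration of what timestamped output with speaker attribution can look like, here is a minimal sketch. The function names and the exact line layout are assumptions made for this example; the Space's actual output format may differ.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions are truncated)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_line(start: float, end: float, speaker: str, text: str) -> str:
    """Build one transcript line (hypothetical layout, not the app's own)."""
    return f"[{format_timestamp(start)} - {format_timestamp(end)}] {speaker}: {text}"


if __name__ == "__main__":
    print(format_line(0.0, 4.2, "Speaker 1", "Hello, everyone."))
```

Truncating to whole seconds keeps the lines short; a variant that keeps milliseconds would be just as reasonable for subtitle-style output.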

	## Typical Use Cases

	- Meeting Analysis - Get timestamped transcripts with speaker labels from team calls
	- Interview Processing - Automatically separate interviewer and interviewee responses
	- Podcast Production - Generate accurate show notes with speaker attribution
	- Multilingual Content - Handle recordings in multiple languages with automatic English output

	## How It Works

	1. Upload an audio file (MP3, WAV, or another common format).
	2. The system detects the spoken language automatically.
	3. It identifies the individual speakers and when each one speaks.
	4. It transcribes all speech to text.
	5. It translates non-English content to English while preserving the speaker labels.
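One common way to combine the diarization and transcription steps above is to give each transcribed segment the speaker whose turn overlaps it the most in time. The sketch below shows that idea only; the `Turn` and `Segment` types and the `assign_speakers` helper are hypothetical names for this example, and `app.py` may merge the two outputs differently.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Turn:
    """A diarization result: one speaker talking from start to end (seconds)."""
    start: float
    end: float
    speaker: str


@dataclass
class Segment:
    """A transcription result: text spoken from start to end (seconds)."""
    start: float
    end: float
    text: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0.0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(
    segments: List[Segment], turns: List[Turn], unknown: str = "Unknown"
) -> List[Tuple[float, float, str, str]]:
    """Label each segment with the speaker whose turn overlaps it the longest."""
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = unknown, 0.0
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > best_overlap:
                best_speaker, best_overlap = turn.speaker, ov
        labeled.append((seg.start, seg.end, best_speaker, seg.text))
    return labeled


if __name__ == "__main__":
    turns = [Turn(0.0, 5.0, "Speaker 1"), Turn(5.0, 10.0, "Speaker 2")]
    segments = [Segment(0.5, 4.0, "Hi there."), Segment(4.5, 9.0, "Hello!")]
    for row in assign_speakers(segments, turns):
        print(row)
```

Segments that overlap no turn at all keep the `unknown` label rather than being forced onto the nearest speaker, which makes gaps in the diarization visible in the output.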

	## Built With

	- [Whisper](https://openai.com/research/whisper) for transcription
	- [Pyannote.audio](https://github.com/pyannote/pyannote-audio) for speaker diarization
	- [Helsinki-NLP Translation Models](https://huggingface.co/Helsinki-NLP) for translation
	- [Gradio](https://gradio.app/) for the web interface
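The Helsinki-NLP translation models are published per language pair, commonly under names of the form `Helsinki-NLP/opus-mt-<source>-en`. The sketch below maps a detected language code to such a name. The helper `translation_model_for` is hypothetical, and it assumes a dedicated model exists for the given language, which is not true for every language (some are only covered by multi-language group models).

```python
from typing import Optional


def translation_model_for(lang_code: str) -> Optional[str]:
    """Return a candidate OPUS-MT model name for translating lang_code to English.

    Returns None for English input, since no translation is needed.
    Assumption: a model named opus-mt-<code>-en exists for the language.
    """
    code = lang_code.strip().lower()
    if code == "en":
        return None
    return f"Helsinki-NLP/opus-mt-{code}-en"


if __name__ == "__main__":
    for lang in ("fr", "de", "en"):
        print(lang, "->", translation_model_for(lang))
```

A name produced this way could then be passed to the `transformers` library, for example `pipeline("translation", model=name)`, after checking that the model is actually available on the Hub.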

	## Local Installation

	To run this Space locally:

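	```bash
	# Clone the Space repository (substitute its URL)
	git clone <repository-url>
	# Enter the cloned directory; adjust the name if your clone differs
	cd diarization-transcription-translation
	# Install the dependencies and start the Gradio app
	pip install -r requirements.txt
	python app.py
	```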

	## Notes

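	- The diarization component requires authentication with Hugging Face: the pyannote.audio models are typically gated, so you need an access token from an account that has accepted the models' terms of use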
	- Processing time depends on the length of the audio file
	- For best results, ensure good audio quality with clear speech
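For the authentication note above, a minimal sketch of supplying a token when running locally could look like this. The environment variable name `HF_TOKEN`, the helper names, and the model ID `pyannote/speaker-diarization-3.1` are assumptions for illustration; check `app.py` for what the Space actually uses. The `use_auth_token` argument is the one accepted by `Pipeline.from_pretrained` in pyannote.audio 2.x and 3.x, and other releases may name it differently.

```python
import os
from typing import Mapping, Optional


def get_hf_token(env: Optional[Mapping[str, str]] = None, var_name: str = "HF_TOKEN") -> str:
    """Read the Hugging Face access token from the environment, or fail clearly."""
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(
            f"Set {var_name} to a Hugging Face access token before starting the app."
        )
    return token


def load_diarization_pipeline(model_id: str = "pyannote/speaker-diarization-3.1"):
    """Load a gated pyannote pipeline (model ID is an assumption, not the app's)."""
    # Imported lazily so the token check works without pyannote.audio installed.
    from pyannote.audio import Pipeline

    return Pipeline.from_pretrained(model_id, use_auth_token=get_hf_token())
```

Reading the token from the environment keeps it out of the source code; on a hosted Space the same value would normally be stored as a secret.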