ASR-finetuning / README.md

saadmannan

app file reviewed

b79357c 2 months ago

metadata

title: Whisper German ASR
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit

🎙️ Whisper German ASR

Fine-tuned Whisper model for German Automatic Speech Recognition (ASR).

Description

This Space provides an interactive interface for transcribing German audio using a fine-tuned version of OpenAI's Whisper-small model. The model has been specifically optimized for German speech recognition.

How to Use

Upload Audio: Click the audio input area and select an audio file (WAV, MP3, FLAC, etc.).
- OR -
Record Audio: Use the microphone button to record audio directly.
Transcribe: Click the "Transcribe" button to generate the transcription.
View Results: The transcription appears on the right side.

Model Details

Base Model: OpenAI Whisper-small (242M parameters)
Fine-tuned on: German subset of the PolyAI MINDS-14 dataset
Language: German (de)
Task: Transcription
Performance: ~13% Word Error Rate (WER)
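
For readers unfamiliar with the metric, WER is the number of word-level substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of reference words. The sketch below computes it with a standard edit-distance table. It is only an illustration: this README does not say which evaluation split or text normalization produced the ~13% figure, and the lowercasing step and example sentences here are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    # Assumption: simple lowercasing and whitespace splitting as normalization.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Made-up example sentences, not taken from the dataset.
reference = "ich möchte mein konto überprüfen bitte"
hypothesis = "ich möchte mein konto prüfen"
print(f"WER: {word_error_rate(reference, hypothesis):.2%}")
```

In this example one word is substituted and one is dropped, so two errors over six reference words give a WER of about 33%.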

Features

✅ Upload audio files in various formats
✅ Record audio directly from microphone
✅ Real-time transcription
✅ Optimized for German language
✅ Support for audio up to 30 seconds

Technical Specifications

Sample Rate: 16 kHz
Max Duration: 30 seconds
Beam Search: 5 beams
Device: CPU/GPU auto-detection
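
The specifications above could map onto the Transformers ASR pipeline roughly as follows. This is a minimal sketch, not the code in app.py: the model ID is a hypothetical placeholder (this README does not name the checkpoint), and the helper names are made up for illustration. It assumes `torch`, `transformers`, and ffmpeg are installed.

```python
SAMPLE_RATE = 16_000  # Hz, as listed above
MAX_DURATION_S = 30   # seconds, as listed above
NUM_BEAMS = 5         # beam search width, as listed above

# Hypothetical placeholder: replace with the actual fine-tuned checkpoint.
MODEL_ID = "your-username/whisper-small-german"


def build_generate_kwargs() -> dict:
    """Decoding options matching the listed language, task, and beam count."""
    return {"language": "german", "task": "transcribe", "num_beams": NUM_BEAMS}


def load_asr(model_id: str = MODEL_ID):
    """Create an ASR pipeline on GPU when available, otherwise on CPU."""
    # Imported lazily so the settings above can be reused without these packages.
    import torch
    from transformers import pipeline

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    return pipeline("automatic-speech-recognition", model=model_id, device=device)


def transcribe(asr, audio_path: str) -> str:
    """Transcribe one audio file given as a path.

    When given a path, the pipeline decodes the file with ffmpeg and
    resamples it to the model's expected rate (16 kHz for Whisper).
    """
    result = asr(audio_path, generate_kwargs=build_generate_kwargs())
    return result["text"]
```

A typical call would be `transcribe(load_asr(), "clip.wav")`, where `clip.wav` is any local recording. Loading the pipeline once and reusing it across requests avoids paying the model load time on every transcription.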

Tips for Best Results

Speak clearly and at a moderate pace
Minimize background noise
Ensure the audio is in German
Keep audio clips between 1 and 30 seconds long for best results
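
To check the length tip before uploading, a clip's duration can be read from its header. The sketch below uses only Python's standard `wave` module, so it covers uncompressed WAV files only, and the function names are illustrative rather than part of this Space. The silent clips exist only to make the example runnable.

```python
import os
import tempfile
import wave

MIN_DURATION_S = 1.0
MAX_DURATION_S = 30.0


def wav_duration_seconds(path: str) -> float:
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_recommended_length(path: str) -> bool:
    """True when the clip falls inside the 1 to 30 second range."""
    return MIN_DURATION_S <= wav_duration_seconds(path) <= MAX_DURATION_S


def write_silence(path: str, seconds: float, sample_rate: int = 16_000) -> None:
    """Write a mono 16-bit silent WAV file, used here only as demo input."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * int(seconds * sample_rate))


with tempfile.TemporaryDirectory() as tmp_dir:
    short_clip = os.path.join(tmp_dir, "short.wav")
    long_clip = os.path.join(tmp_dir, "long.wav")
    write_silence(short_clip, 2)
    write_silence(long_clip, 31)
    short_ok = is_recommended_length(short_clip)
    long_ok = is_recommended_length(long_clip)

print(short_ok, long_ok)
```

The 2-second clip passes the check and the 31-second clip does not. For MP3 or FLAC input, a decoding library would be needed in place of `wave`.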

License

MIT License

Acknowledgments

OpenAI Whisper for the base model
Hugging Face for the Transformers library
PolyAI for the MINDS14 dataset