	---
	title: Whisper German ASR
	emoji: 🎙️
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 4.0.0
	app_file: app.py
	pinned: false
	license: mit
	---

	# 🎙️ Whisper German ASR

	Fine-tuned Whisper model for German Automatic Speech Recognition (ASR).

	## Description

	This Space provides an interactive interface for transcribing German audio using a fine-tuned version of OpenAI's Whisper-small model. The model has been specifically optimized for German speech recognition.

	## How to Use

	1. Provide audio: upload a file (WAV, MP3, FLAC, etc.) through the audio input area, or record directly with the microphone button
	2. Transcribe: click the "Transcribe" button to generate the transcription
	3. View results: the transcription appears on the right side

	## Model Details

	- Base Model: OpenAI Whisper-small (242M parameters)
	- Fine-tuned on: German (de-DE) subset of the PolyAI MInDS-14 dataset
	- Language: German (de)
	- Task: Transcription
	- Performance: ~13% Word Error Rate (WER)
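For context on the WER figure: WER counts the word-level substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of reference words. The following is a minimal pure-Python sketch of that calculation. The function name `word_error_rate` and the sample sentences are illustrative assumptions, not the evaluation code or data used for this model, which may also normalize text (casing, punctuation) before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one wrong word in an eight-word reference.
reference = "ich möchte bitte meinen kontostand für heute abfragen"
hypothesis = "ich möchte bitte meinen kontostand für heute abfrage"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")  # WER: 0.125
```

One error in eight reference words gives 12.5%, so a WER of about 13% corresponds to roughly one word error in every eight words.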

	## Features

	- ✅ Upload audio files in various formats
	- ✅ Record audio directly from microphone
	- ✅ Real-time transcription
	- ✅ Optimized for German language
	- ✅ Support for audio up to 30 seconds

	## Technical Specifications

	- Sample Rate: 16 kHz
	- Max Duration: 30 seconds
	- Beam Search: 5 beams
	- Device: CPU/GPU auto-detection
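The settings above could be wired together roughly as follows with the Transformers `pipeline` API. This is a sketch under assumptions, not the Space's actual `app.py`: the helper names are hypothetical, and `openai/whisper-small` is only a stand-in for the fine-tuned checkpoint, whose identifier is not given here. Input is assumed to be mono float samples already at 16 kHz (a NumPy array when passed to the real pipeline).

```python
SAMPLE_RATE = 16_000   # Hz, from the specifications above
MAX_DURATION_S = 30    # seconds, from the specifications above
NUM_BEAMS = 5          # beam search width, from the specifications above
MAX_SAMPLES = SAMPLE_RATE * MAX_DURATION_S


def clip_to_max_duration(samples):
    """Keep at most the first 30 seconds of a mono 16 kHz sample sequence."""
    return samples[:MAX_SAMPLES]


def generation_kwargs():
    """Decoding options forwarded to Whisper's generate()."""
    return {"num_beams": NUM_BEAMS, "language": "german", "task": "transcribe"}


def build_transcriber(model_id="openai/whisper-small"):
    """Create an ASR pipeline on GPU when available, otherwise on CPU.

    model_id is a placeholder: substitute the fine-tuned checkpoint.
    """
    # Imported lazily so the helpers above work without the heavy dependencies.
    import torch
    from transformers import pipeline

    device = "cuda:0" if torch.cuda.is_available() else "cpu"
    return pipeline("automatic-speech-recognition", model=model_id, device=device)


def transcribe(transcriber, samples):
    """Transcribe mono 16 kHz float samples and return the text."""
    audio = {"raw": clip_to_max_duration(samples), "sampling_rate": SAMPLE_RATE}
    return transcriber(audio, generate_kwargs=generation_kwargs())["text"]
```

Typical use would be `transcribe(build_transcriber(), samples)`. Building the pipeline once and reusing it avoids reloading the model for every request.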

	## Tips for Best Results

	- Speak clearly and at a moderate pace
	- Minimize background noise
	- Ensure the audio is in German
	- Keep audio clips between 1 and 30 seconds long for optimal results
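To check a clip against the duration tip before uploading, a small standard-library helper like the one below can be used. It is an illustrative sketch: `check_wav` is a hypothetical name, it only handles PCM WAV files (other formats need a separate decoder), and the sample-rate note is informational, since the app may resample the input itself.

```python
import wave

TARGET_SAMPLE_RATE = 16_000  # Hz
MIN_DURATION_S = 1.0
MAX_DURATION_S = 30.0


def check_wav(path):
    """Return (duration_seconds, warnings) for a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        sample_rate = wav.getframerate()
        duration = wav.getnframes() / sample_rate

    warnings = []
    if not MIN_DURATION_S <= duration <= MAX_DURATION_S:
        warnings.append(f"duration {duration:.1f} s is outside the 1-30 s range")
    if sample_rate != TARGET_SAMPLE_RATE:
        warnings.append(f"sample rate {sample_rate} Hz differs from 16 kHz")
    return duration, warnings
```

An empty warnings list means the clip fits the recommended range, for example `duration, warnings = check_wav("clip.wav")`.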

	## Links

	- [GitHub Repository](https://github.com/YOUR_USERNAME/whisper-german-asr)
	- [Model Card](https://huggingface.co/YOUR_USERNAME/whisper-small-german)

	## License

	MIT License

	## Acknowledgments

	- [OpenAI Whisper](https://github.com/openai/whisper) for the base model
	- [Hugging Face](https://huggingface.co/) for the Transformers library
	- [PolyAI](https://huggingface.co/datasets/PolyAI/minds14) for the MInDS-14 dataset