metadata
title: Whisper German ASR
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.0.0
app_file: app.py
pinned: false
license: mit

🎙️ Whisper German ASR

Fine-tuned Whisper model for German Automatic Speech Recognition (ASR).

Description

This Space provides an interactive interface for transcribing German audio using a fine-tuned version of OpenAI's Whisper-small model. The model has been specifically optimized for German speech recognition.

How to Use

  1. Upload Audio: Click the audio input area to upload an audio file (WAV, MP3, FLAC, etc.), or
  2. Record Audio: Use the microphone button to record audio directly.
  3. Transcribe: Click the "Transcribe" button to generate the transcription.
  4. View Results: The transcription appears on the right side.
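
The steps above map onto a small Gradio layout. The following is a minimal sketch, not the actual `app.py` of this Space: `make_handler`, `build_demo`, and the `run_model` callable are hypothetical names, and the `sources` argument of `gr.Audio` assumes a Gradio 4.x release such as the `sdk_version` listed in the metadata.

```python
from typing import Callable, Optional


def make_handler(run_model: Callable[[str], str]) -> Callable[[Optional[str]], str]:
    """Wrap a model call so that an empty input returns a message instead of failing."""

    def handler(audio_path: Optional[str]) -> str:
        # Gradio passes None when nothing was uploaded or recorded.
        if not audio_path:
            return "Please upload or record audio first."
        return run_model(audio_path)

    return handler


def build_demo(run_model: Callable[[str], str]):
    """Build a two-column UI: audio input and button on the left, text on the right."""
    import gradio as gr  # imported lazily so make_handler has no dependencies

    handler = make_handler(run_model)
    with gr.Blocks(title="Whisper German ASR") as demo:
        with gr.Row():
            with gr.Column():
                # One component covers both step 1 (upload) and step 2 (record).
                audio = gr.Audio(
                    sources=["upload", "microphone"],
                    type="filepath",
                    label="Audio",
                )
                button = gr.Button("Transcribe")
            with gr.Column():
                output = gr.Textbox(label="Transcription")
        # Step 3: clicking the button sends the file path to the handler.
        button.click(fn=handler, inputs=audio, outputs=output)
    return demo
```

Calling `build_demo(my_transcribe_fn).launch()` would start the interface locally, where `my_transcribe_fn` is any function that takes a file path and returns text.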

Model Details

  • Base Model: OpenAI Whisper-small (242M parameters)
  • Fine-tuned on: German (de-DE) subset of the MINDS-14 dataset
  • Language: German (de)
  • Task: Transcription
  • Performance: ~13% Word Error Rate (WER)
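
To make the WER figure concrete, here is a small sketch of how Word Error Rate is commonly computed: the word-level edit distance (substitutions, insertions, deletions) divided by the number of reference words. The function name `word_error_rate` and the normalization (lowercasing and whitespace splitting only) are assumptions for illustration; this README does not state how the reported value was measured or on which split.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One wrong word out of eight gives 0.125, close to the ~13% quoted above.
reference = "ich möchte bitte mein konto heute noch sperren"
hypothesis = "ich möchte bitte mein konto heute noch sparen"
print(word_error_rate(reference, hypothesis))
```

Read this way, a WER of about 13% corresponds to roughly one word error in every eight reference words.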

Features

  • ✅ Upload audio files in various formats
  • ✅ Record audio directly from microphone
  • ✅ Real-time transcription
  • ✅ Optimized for German language
  • ✅ Support for audio up to 30 seconds

Technical Specifications

  • Sample Rate: 16 kHz
  • Max Duration: 30 seconds
  • Beam Search: 5 beams
  • Device: CPU/GPU auto-detection
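
The settings above could be wired together roughly as follows with the Hugging Face `transformers` ASR pipeline. This is a sketch under stated assumptions, not the code this Space runs: `MODEL_ID` is a placeholder pointing at the base `openai/whisper-small` checkpoint because the fine-tuned checkpoint's ID is not given here, and the helper names are hypothetical. When given a file path, the pipeline is expected to decode the audio and resample it to the 16 kHz rate used by Whisper's feature extractor, which requires ffmpeg to be installed.

```python
from typing import Any, Dict

# Placeholder: the base model, NOT the fine-tuned checkpoint (its ID is not stated).
MODEL_ID = "openai/whisper-small"
NUM_BEAMS = 5


def generation_settings() -> Dict[str, Any]:
    """Decoding options matching the list above: German, transcription, 5 beams."""
    return {"language": "german", "task": "transcribe", "num_beams": NUM_BEAMS}


def pick_device() -> str:
    """Use the first GPU when PyTorch can see one, otherwise fall back to CPU."""
    try:
        import torch
    except ImportError:
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


def transcribe(audio_path: str) -> str:
    """Transcribe one audio file with the settings above."""
    from transformers import pipeline  # imported lazily; loading the model is slow

    asr = pipeline(
        "automatic-speech-recognition",
        model=MODEL_ID,
        device=pick_device(),
    )
    result = asr(audio_path, generate_kwargs=generation_settings())
    return result["text"]
```

In a long-running app the pipeline would normally be created once at startup rather than on every call; it is built inside `transcribe` here only to keep the sketch short.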

Tips for Best Results

  • Speak clearly and at a moderate pace
  • Minimize background noise
  • Ensure the audio is in German
  • Keep audio clips between 1 and 30 seconds for optimal results
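
The length recommendation can be checked before uploading. The sketch below uses only Python's standard `wave` module, so it handles uncompressed WAV files only; the function names and the idea of a pre-upload check are illustrative assumptions, not part of this Space.

```python
import wave

MIN_SECONDS = 1.0
MAX_SECONDS = 30.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file as frame count divided by frame rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_recommended_length(path: str) -> bool:
    """True when the clip falls inside the recommended 1 to 30 second range."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

For example, `is_recommended_length("clip.wav")` returns `False` for a 45-second recording, which suggests trimming or splitting it first.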

Links

License

MIT License

Acknowledgments