---
language:
- nl
tags:
- whisper
- speech-recognition
- dutch
- automatic-speech-recognition
license: mit
base_model: openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition
---

# WhisperD-NL: Fine-tuned Whisper for Dutch Speech Recognition

WhisperD-NL is a Whisper model fine-tuned on the Corpus Gesproken Nederlands (CGN), specifically to detect disfluencies, speakers and non-speech events in Dutch speech.

## Model Details

- **Base Model**: openai/whisper-large-v3
- **Language**: Dutch (nl)
- **Task**: Automatic Speech Recognition
- **Fine-tuning**: Corpus Gesproken Nederlands (CGN)
- **Speaker Identification**: Up to four different speakers are identified, each marked with a tag ([S1], [S2], [S3] or [S4])
- **WER**: 16.42 on transcripts that include disfluencies, speaker identification and non-speech events, using whisper-large-v3 as the base model

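
For reference, word error rate is the word-level edit distance (substitutions, deletions and insertions) between a hypothesis and a reference transcript, divided by the number of reference words. The sketch below shows that computation under two assumptions: whitespace tokenization, and speaker tags counted as ordinary words. How tags and non-speech events were scored for the figure above is not stated here, and `wer` is an illustrative helper, not the evaluation script used for this model.

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Row-by-row Levenshtein DP: prev[j] is the distance between the
    # reference words processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical strings for illustration only, not real model output.
reference = "[S1] ja uh dat klopt"
hypothesis = "[S1] ja dat klopt"
print(f"WER: {wer(reference, hypothesis):.2%}")
```

In this example the hypothesis drops one disfluency ("uh") out of five reference words, so one deletion is counted against a reference length of five.
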
## Usage

```python
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
import torch
import soundfile as sf

# Load model and processor
processor = AutoProcessor.from_pretrained("pevers/whisperd-nl")
model = AutoModelForSpeechSeq2Seq.from_pretrained("pevers/whisperd-nl")

# Load and preprocess audio.
# Whisper expects 16 kHz mono input; resample the file first if it differs.
audio, sr = sf.read("path_to_dutch_audio.wav")
inputs = processor(audio, sampling_rate=sr, return_tensors="pt")

# Generate transcription
with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```

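
The resulting transcript can be split into speaker turns using the [S1] to [S4] tags. This is a minimal sketch, assuming the tags appear inline as literal text at the start of each turn. The sample string and the `split_by_speaker` helper are hypothetical and not part of the model's API.

```python
import re

# Matches the four speaker tags: [S1] .. [S4].
SPEAKER_TAG = re.compile(r"\[(S[1-4])\]")


def split_by_speaker(transcript):
    """Split a tagged transcript into (speaker, text) turns.

    Assumes each turn starts with an inline tag such as "[S1]".
    Text before the first tag is attributed to speaker None.
    """
    turns = []
    # re.split with a capturing group keeps the tags in the result:
    # [text_before_first_tag, tag, text, tag, text, ...]
    parts = SPEAKER_TAG.split(transcript)
    leading = parts[0].strip()
    if leading:
        turns.append((None, leading))
    for speaker, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:
            turns.append((speaker, text))
    return turns


# Hypothetical example string, not real model output.
sample = "[S1] ja uh dat klopt [S2] goed dan doen we dat [S1] prima"
for speaker, text in split_by_speaker(sample):
    print(f"{speaker}: {text}")
```

If the tags turn out to be stripped by `skip_special_tokens=True` in your setup, decoding with `skip_special_tokens=False` may be worth trying; whether that is needed depends on how the tags were added to the tokenizer, which is not specified here.
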
## Limitations

- Optimized specifically for Dutch speech with disfluencies and non-speech events
- Inherits limitations from the base Whisper model