---
language:
- nl
tags:
- whisper
- speech-recognition
- dutch
- automatic-speech-recognition
license: mit
base_model: openai/whisper-large-v3
pipeline_tag: automatic-speech-recognition
---
# WhisperD-NL: Fine-tuned Whisper for Dutch Speech Recognition
WhisperD-NL is a Whisper model fine-tuned on the Corpus Gesproken Nederlands (CGN) to transcribe Dutch speech including disfluencies, speaker turns and non-speech events.
## Model Details
- **Base Model**: openai/whisper-large-v3
- **Language**: Dutch (nl)
- **Task**: Automatic Speech Recognition
- **Fine-tuning**: Corpus Gesproken Nederlands (CGN)
- **Speaker Identification**: Up to four speakers, marked in the transcript with the tags `[S1]`, `[S2]`, `[S3]` and `[S4]`
- **WER**: 16.42, scored on transcripts that include disfluencies, speaker tags and non-speech events (whisper-large-v3 base)
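For reference, word error rate (WER) is the number of word-level substitutions (S), deletions (D) and insertions (I) needed to turn the model output into the reference transcript, divided by the number of reference words (N). The card does not say how speaker tags, disfluencies or non-speech markers were normalized before scoring, so the sketch below rests on an assumption: it splits on whitespace and counts every token, including tags such as `[S1]` and filler words such as `uh`, as a word. `word_error_rate` is a hypothetical helper, not part of this model's tooling, and reading 16.42 as a percentage (the ratio × 100) is also an assumption.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumes plain whitespace tokenization; speaker tags and filler words count as words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Distances from the empty reference prefix to each hypothesis prefix (all insertions).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)  # i deletions against the empty hypothesis prefix
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of reference word r
                curr[j - 1] + 1,     # insertion of hypothesis word h
                prev[j - 1] + cost,  # substitution (or match when cost is 0)
            )
        prev = curr
    return prev[-1] / len(ref)


# The dropped filler "uh" is one deletion out of six reference words.
print(f"{100 * word_error_rate('[S1] ja uh dat is goed', '[S1] ja dat is goed'):.2f}")  # → 16.67
```

Whether filler words and tags should count toward the score is a design choice; stripping them before calling the helper would give a WER closer to a conventional clean-transcript evaluation.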
## Usage
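```python
from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
import librosa
import soundfile as sf
import torch

# Load model and processor
processor = AutoProcessor.from_pretrained("pevers/whisperd-nl")
model = AutoModelForSpeechSeq2Seq.from_pretrained("pevers/whisperd-nl")

# Load audio; Whisper's feature extractor expects mono audio at 16 kHz
audio, sr = sf.read("path_to_dutch_audio.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # downmix multi-channel audio to mono
target_sr = processor.feature_extractor.sampling_rate  # 16000 for Whisper
if sr != target_sr:
    audio = librosa.resample(audio, orig_sr=sr, target_sr=target_sr)

# By default the feature extractor pads or truncates input to a 30-second window
inputs = processor(audio, sampling_rate=target_sr, return_tensors="pt")

# Generate transcription
with torch.no_grad():
    predicted_ids = model.generate(inputs.input_features)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcription)
```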
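The card documents the speaker tags but not their exact placement in the decoded text. Assuming each tag precedes the words of that speaker's turn (for example `[S1] ... [S2] ...`), a minimal sketch for splitting a transcription into turns could look like this. `split_speaker_turns` is a hypothetical helper; also note that whether the tags survive `skip_special_tokens=True` depends on how they were added to the tokenizer, which the card does not state.

```python
import re
from typing import List, Optional, Tuple

# Assumed format: a tag [S1]..[S4] precedes each speaker turn.
SPEAKER_TAG = re.compile(r"\[(S[1-4])\]")


def split_speaker_turns(transcription: str) -> List[Tuple[Optional[str], str]]:
    """Hypothetical helper: split a tagged transcript into (speaker, text) turns."""
    # With a capturing group, re.split returns
    # [text_before_first_tag, "S1", text, "S2", text, ...]
    parts = SPEAKER_TAG.split(transcription)
    turns: List[Tuple[Optional[str], str]] = []

    # Keep any untagged text before the first tag, with an unknown speaker.
    leading = parts[0].strip()
    if leading:
        turns.append((None, leading))

    # Pair each captured speaker label with the text that follows it.
    for speaker, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:
            turns.append((speaker, text))
    return turns


for speaker, text in split_speaker_turns(
    "[S1] goedemorgen [S2] ja uh goedemorgen [S1] hoe gaat het"
):
    print(speaker, text)
```

Text before the first tag is returned with speaker `None`, and empty turns (two tags in a row) are dropped.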
## Limitations
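- Optimized specifically for Dutch, with transcripts that include disfluencies and non-speech events
- Speaker identification is limited to four speakers (`[S1]` to `[S4]`)
- Inherits the limitations of the base whisper-large-v3 model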