pevers/whisperd-nl

	---
	language:
	- nl
	tags:
	- whisper
	- speech-recognition
	- dutch
	- automatic-speech-recognition
	license: mit
	base_model: openai/whisper-large-v3
	pipeline_tag: automatic-speech-recognition
	---

	# WhisperD-NL: Fine-tuned Whisper for Dutch Speech Recognition

	WhisperD-NL is a Whisper model fine-tuned on the Corpus Gesproken Nederlands (CGN), specifically to detect disfluencies, speakers and non-speech events in Dutch speech.

	## Model Details

	- Base Model: openai/whisper-large-v3
	- Language: Dutch (nl)
	- Task: Automatic Speech Recognition
	- Fine-tuning: Corpus Gesproken Nederlands (CGN)
	- Speaker Identification: Up to four different speakers are marked with a tag ([S1], [S2], [S3] and [S4])
	- WER: 16.42 on transcripts that include disfluencies, speaker identification and non-speech events (model based on whisper-large-v3)
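For readers unfamiliar with the metric, here is a minimal sketch of the standard word error rate definition: word-level edit distance divided by the number of reference words. The model card does not say how its WER was computed (text normalization, scaling, or how tags are counted), so the function name `word_error_rate`, the whitespace tokenization and the choice to count speaker tags as ordinary words are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are split on whitespace and tags such as [S1]
    are counted like any other word. The model card does not specify this.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: the hypothesis drops the filler word "uh".
print(word_error_rate("[S1] ja uh dat klopt", "[S1] ja dat klopt"))
```

Under this definition a dropped disfluency counts as one deletion, so a model that silently cleans up fillers is penalized when the reference keeps them.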

	## Usage

	```python
	from transformers import AutoProcessor, AutoModelForSpeechSeq2Seq
	import torch
	import soundfile as sf

	# Load model and processor
	processor = AutoProcessor.from_pretrained("pevers/whisperd-nl")
	model = AutoModelForSpeechSeq2Seq.from_pretrained("pevers/whisperd-nl")

	# Load and preprocess audio (Whisper's feature extractor expects 16 kHz mono input)
	audio, sr = sf.read("path_to_dutch_audio.wav")
	inputs = processor(audio, sampling_rate=sr, return_tensors="pt")

	# Generate transcription
	with torch.no_grad():
	    predicted_ids = model.generate(inputs.input_features)

	transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
	print(transcription)
	```
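Since the model marks speakers with [S1] to [S4] tags, the decoded string can be split into speaker turns afterwards. The following is a rough sketch, assuming each tag precedes the text spoken by that speaker; the exact output layout is not documented in this card, and the helper name `split_by_speaker`, the `"unknown"` label and the sample transcript are hypothetical.

```python
import re

# Matches the four speaker tags described in the model details.
SPEAKER_TAG = re.compile(r"\[(S[1-4])\]")


def split_by_speaker(transcription: str) -> list[tuple[str, str]]:
    """Split a tagged transcription into (speaker, text) turns.

    Assumption: a tag such as [S1] precedes the text of that speaker.
    Text before the first tag is labeled "unknown".
    """
    # With one capturing group, re.split returns
    # [text_before_first_tag, tag, text, tag, text, ...].
    parts = SPEAKER_TAG.split(transcription)
    turns = []
    leading = parts[0].strip()
    if leading:
        turns.append(("unknown", leading))
    for speaker, text in zip(parts[1::2], parts[2::2]):
        text = text.strip()
        if text:  # skip tags that are not followed by any text
            turns.append((speaker, text))
    return turns


# Made-up transcript, for illustration only.
for speaker, text in split_by_speaker("[S1] hoi hoe gaat het [S2] goed uh en met jou"):
    print(f"{speaker}: {text}")
```

Disfluencies and non-speech event markers are left untouched inside each turn, so downstream code can decide whether to keep or filter them.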

	## Limitations

	- Optimized specifically for Dutch speech containing disfluencies and non-speech events
	- Inherits limitations from the base Whisper model