Whisper-Medium fine-tuned for reverberant speech (Whisper-RIR-Mega)

Use this model when transcribing speech recorded in reverberant or “roomy” conditions (meetings, lectures, far-field microphones). It is trained specifically on reverberant data while matching the WER of base Whisper-medium on clean and reverberant benchmarks.

This model is a fine-tuned version of openai/whisper-medium on the Whisper-RIR-Mega dataset for ASR robustness to room reverberation. One-line load; no PEFT needed.

Quick usage

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import librosa

processor = WhisperProcessor.from_pretrained("mandipgoswami/whisper-medium-rirmega")
model = WhisperForConditionalGeneration.from_pretrained("mandipgoswami/whisper-medium-rirmega")

# Load audio resampled to Whisper's expected 16 kHz
audio, sr = librosa.load("path/to/reverberant_audio.wav", sr=16000)
# Convert to log-mel input features, then decode
input_features = processor(audio, sampling_rate=16000, return_tensors="pt").input_features
predicted_ids = model.generate(input_features)
transcript = processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]
print(transcript)
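Note that Whisper models operate on 30-second windows, so longer recordings (e.g. full meetings) are typically split into chunks and transcribed piece by piece. A minimal sketch of such a split is below; this simple non-overlapping chunker is an illustration only — production pipelines often use overlapping windows or voice-activity-based segmentation instead.

```python
import numpy as np

def chunk_audio(audio, sr=16000, chunk_seconds=30):
    """Split a waveform into consecutive fixed-length windows.

    Whisper consumes 30-second log-mel windows, so longer recordings
    can be split and each chunk passed to processor/model.generate
    as in the snippet above.
    """
    samples_per_chunk = sr * chunk_seconds
    return [audio[i:i + samples_per_chunk]
            for i in range(0, len(audio), samples_per_chunk)]

# Example: 75 s of audio splits into 30 s + 30 s + 15 s chunks
dummy = np.zeros(16000 * 75, dtype=np.float32)
chunks = chunk_audio(dummy)
print([len(c) / 16000 for c in chunks])  # [30.0, 30.0, 15.0]
```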

When to use

  • Reverberant or room-recorded speech (meetings, lectures, far-field).
  • You want English ASR with the same ease as base Whisper (single from_pretrained).
  • You care about robustness to room acoustics without losing clean-speech quality.

Training

  • Base model: openai/whisper-medium
  • Dataset: Whisper-RIR-Mega (reverberant speech with clean transcripts)
  • Epochs: 4
  • Learning rate: 8e-06
  • Effective batch size: 16 (2 × 8 gradient accumulation)
  • Precision: BF16/FP16 mixed precision
  • Gradient checkpointing: Enabled
  • Hardware: Single NVIDIA RTX 5080 (16 GB)
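For reference, the hyperparameters above map naturally onto the standard Hugging Face `Seq2SeqTrainingArguments` field names. The mapping below is a hypothetical sketch (the actual training script is not published); the field names are the standard `transformers` argument names, not confirmed from the source.

```python
# Hypothetical mapping of the reported hyperparameters onto
# transformers.Seq2SeqTrainingArguments fields (illustration only;
# the actual training script is not published).
training_config = {
    "num_train_epochs": 4,
    "learning_rate": 8e-6,
    "per_device_train_batch_size": 2,   # 2 x 8 accumulation = 16 effective
    "gradient_accumulation_steps": 8,
    "bf16": True,                       # or fp16=True, per the mixed-precision note
    "gradient_checkpointing": True,     # trades compute for memory on a 16 GB GPU
}

effective_batch = (training_config["per_device_train_batch_size"]
                   * training_config["gradient_accumulation_steps"])
print(effective_batch)  # 16
```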

Evaluation

| Dataset          | Split | WER    |
|------------------|-------|--------|
| Whisper-RIR-Mega | test  | 0.0430 |
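A test WER of 0.0430 corresponds to roughly 4.3 errors (substitutions, insertions, and deletions) per 100 reference words. A minimal WER implementation is sketched below for illustration; in practice, libraries such as `jiwer` or `evaluate` are typically used.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via word-level Levenshtein distance.

    WER = (substitutions + insertions + deletions) / reference word count.
    """
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution
    return dp[-1][-1] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on a mat"))  # 1 error / 6 words
```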

Limitations

English only. The model was fine-tuned on 400 reverberant samples, so it performs best in acoustic conditions similar to the Whisper-RIR-Mega benchmark.

Citation

If you use this model, please cite:

@article{goswami2026whisperrirmega,
  title={Whisper-RIR-Mega: A Paired Clean-Reverberant Speech Benchmark for ASR Robustness to Room Acoustics},
  author={Goswami, Mandip},
  journal={arXiv preprint arXiv:2603.02252},
  year={2026}
}