Robust Speech Recognition via Large-Scale Weak Supervision
Paper: arXiv:2212.04356
This repository contains a complete Whisper fine-tuning experiment, including:
```
├── checkpoints/                      # Training checkpoints
│   ├── CKPT+epoch_*/                 # Per-epoch checkpoints
│   ├── CKPT+BEST_WER/                # Best WER checkpoint
│   └── CKPT+FINAL/                   # Final checkpoint
├── final_model/                      # Transformers-compatible model
│   ├── config.json                   # Model configuration
│   ├── model.safetensors             # Model weights
│   ├── preprocessor_config.json
│   ├── tokenizer_config.json
│   └── ...
├── test_results.json                 # Test metrics
├── detailed_metrics.json             # Detailed training history
├── training_history_speechbrain.png  # Training curves
└── training_report_speechbrain.txt   # Summary report
```
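Checkpoint selection above is keyed to word error rate (the `CKPT+BEST_WER` directory). For reference, WER is the word-level Levenshtein distance between reference and hypothesis, normalized by reference length. A minimal sketch (the `wer` function is illustrative, not part of the experiment code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deleting i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # inserting j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, `wer("the cat sat", "the bat sat")` is one substitution over three reference words, i.e. 1/3.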
```python
import torch

# Load the best-WER SpeechBrain checkpoint (map to CPU so no GPU is required)
checkpoint = torch.load("checkpoints/CKPT+BEST_WER/model.ckpt", map_location="cpu")
```
```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the exported Transformers-compatible model and its processor
model = WhisperForConditionalGeneration.from_pretrained("./final_model")
processor = WhisperProcessor.from_pretrained("./final_model")
```
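Once loaded, the model and processor can be combined for inference roughly as follows. This is a hedged sketch: `transcribe` is an illustrative helper (not part of the experiment code), the waveform is assumed to be mono 16 kHz audio as Whisper expects, and audio loading is left to the caller.

```python
import numpy as np
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

def transcribe(model, processor, waveform: np.ndarray,
               sampling_rate: int = 16000) -> str:
    """Decode a single mono waveform with a fine-tuned Whisper model."""
    # Convert raw audio to the log-mel features Whisper expects
    inputs = processor(waveform, sampling_rate=sampling_rate, return_tensors="pt")
    with torch.no_grad():
        predicted_ids = model.generate(inputs.input_features)
    # Decode token ids back to text, dropping special tokens
    return processor.batch_decode(predicted_ids, skip_special_tokens=True)[0]

if __name__ == "__main__":
    # Path assumes the repository layout above; adjust as needed
    model = WhisperForConditionalGeneration.from_pretrained("./final_model")
    processor = WhisperProcessor.from_pretrained("./final_model")
    print(transcribe(model, processor, np.zeros(16000, dtype=np.float32)))
```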
If you use this experiment data, please cite the original Whisper paper:
```bibtex
@article{radford2022robust,
  title={Robust speech recognition via large-scale weak supervision},
  author={Radford, Alec and Kim, Jong Wook and Xu, Tao and Brockman, Greg and McLeavey, Christine and Sutskever, Ilya},
  journal={arXiv preprint arXiv:2212.04356},
  year={2022}
}
```
Base model: openai/whisper-base