This is an fp32 version of the vhdm/whisper-large-fa-v1 model, fine-tuned on the FaVoice dataset, converted to CTranslate2 format for use with Whisper-X.
Fine-tuning details: all model weights were fine-tuned, starting from vhdm/whisper-large-fa-v1, on Byne/farsi-train-set-custom-audios, with the Byne/farsi-molfar-val hard set blended into the training data. Hyperparameters: 15 epochs; learning rate 3e-6 (cosine schedule, 5% warmup); batch size 16; bf16; SpecAugment 0.095; label smoothing 0.12; seed 42. The word error rate (WER) is ~34% on a held-out 10% split of the custom-audios set (benchmark-normalised scoring).
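For readers unfamiliar with the metric, the following is a minimal sketch of how a word error rate can be computed: the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It is not the scoring script used for the figure above. The function name `word_error_rate` is hypothetical, and the sketch tokenises on whitespace only; the benchmark normalisation mentioned above is not specified here and is therefore not reproduced.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: both strings are already normalised; this sketch only
    splits on whitespace and compares words exactly.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous_row[j] = edits needed to turn the first i-1 reference words
    # into the first j hypothesis words.
    previous_row = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current_row = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current_row.append(
                min(
                    previous_row[j] + 1,                      # deletion
                    current_row[j - 1] + 1,                   # insertion
                    previous_row[j - 1] + substitution_cost,  # match or substitution
                )
            )
        previous_row = current_row

    return previous_row[-1] / len(ref_words)
```

For a whole test set, WER is usually reported as total edits over total reference words rather than as the mean of per-utterance scores, so the two can differ.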
Important: when copying the model, ensure that you copy all files in the repo. Missing files can lead to a silent failure.
SHA256 (model binary): a2d401be3b4ea07054af54352de5fdb49974c6dd27ef05267b14b44e11cb91fe
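One way to check a copied model binary against the checksum above is to hash the file locally and compare the digests. The sketch below uses only the Python standard library. The helper names `sha256_of_file` and `verify_model_binary` are hypothetical, and the file name in the usage comment is an assumption; substitute the actual path of the binary in your copy of the repo.

```python
import hashlib

# Checksum published above for the model binary.
EXPECTED_SHA256 = "a2d401be3b4ea07054af54352de5fdb49974c6dd27ef05267b14b44e11cb91fe"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a multi-gigabyte file is never held in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_binary(path, expected=EXPECTED_SHA256):
    """Return True if the file's digest matches the expected checksum."""
    return sha256_of_file(path) == expected.lower()


# Usage (the file name is an assumption, adjust to your local copy):
# if not verify_model_binary("model.bin"):
#     raise SystemExit("Checksum mismatch: the model binary is incomplete or corrupted")
```

A matching checksum only covers the model binary; the other files in the repo still need to be present.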