This is an fp32 version of the vhdm/whisper-large-fa-v1 model, fine-tuned on the FaVoice dataset and converted to CTranslate2 format for use with WhisperX.

Fine-tuning details: the full model was fine-tuned from vhdm/whisper-large-fa-v1 on Byne/farsi-train-set-custom-audios, with the Byne/farsi-molfar-val hard set blended into the training data. Hyperparameters: 15 epochs, learning rate 3e-6 (cosine schedule, 5% warmup), batch size 16, bf16, SpecAugment 0.095, label smoothing 0.12, seed 42. Word Error Rate (WER) is ~34% on a held-out 10% split of the custom-audios set, using benchmark-normalised scoring.
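For readers who want to reproduce a comparable score, here is a minimal sketch of how a word error rate is typically computed: word-level edit distance divided by the number of reference words. The function name `word_error_rate` is hypothetical, and the sketch only splits on whitespace. The benchmark normalisation used for the ~34% figure is not described in this card, so numbers from this code will not necessarily match it.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / reference word count.

    Assumption: plain whitespace tokenisation, no text normalisation.
    """
    ref_words = reference.split()
    hyp_words = hypothesis.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    previous = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,                      # deletion
                current[j - 1] + 1,                   # insertion
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref_words)
```

A corpus-level score is usually the total number of edits over the total number of reference words, not the mean of per-utterance rates.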

Important: when copying the model, ensure that you copy all files in the repo. Missing files can lead to a silent failure.

SHA256 (model binary): a2d401be3b4ea07054af54352de5fdb49974c6dd27ef05267b14b44e11cb91fe
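One way to check a copied model binary against the hash above is sketched here, using only the Python standard library. The helper names and the example path are assumptions for illustration. The card does not state the binary's filename, so substitute the actual path in your copy of the repo.

```python
import hashlib

# Hash published in this model card for the model binary.
EXPECTED_SHA256 = "a2d401be3b4ea07054af54352de5fdb49974c6dd27ef05267b14b44e11cb91fe"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so a large binary is never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model_binary(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 matches the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `verify_model_binary("path/to/model.bin")` (a hypothetical path) returns `False` if the file was truncated or corrupted during the copy. A matching hash covers only the binary itself, so the other files in the repo still need to be present.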
