---
library_name: transformers
license: apache-2.0
base_model: openai/whisper-tiny
tags:
- automatic-speech-recognition
- whisper
- urdu
datasets:
- mozilla-foundation/common_voice_17_0
- HowMannyMore/urdu-audiodataset
metrics:
- wer
- cer
- bleu
- chrf
model-index:
- name: whisper-tiny-urdu
  results:
  - task:
      type: automatic-speech-recognition
      name: Automatic Speech Recognition
    dataset:
      name: Common Voice 17.0 (Urdu)
      type: mozilla-foundation/common_voice_17_0
      config: ur
      split: test
      args: ur
    metrics:
    - name: WER on Common Voice 17.0
      type: wer
      value: 46.908
    - name: CER on Common Voice 17.0
      type: cer
      value: 18.543
    - name: BLEU on Common Voice 17.0
      type: bleu
      value: 32.631
    - name: ChrF on Common Voice 17.0
      type: chrf
      value: 63.988
language:
- ur
pipeline_tag: automatic-speech-recognition
---

# whisper-tiny-urdu

This model is a fine-tuned version of [openai/whisper-tiny](https://huggingface.co/openai/whisper-tiny) on the Urdu (`ur`) configuration of the `mozilla-foundation/common_voice_17_0` dataset.
At the final training step (2500), it achieves the following results on the evaluation set:
- Loss: 0.7225
- WER: 47.8529 (%)

## Quick Usage

```python
from transformers import pipeline

# Load the fine-tuned checkpoint into an automatic-speech-recognition pipeline.
transcriber = pipeline(
    "automatic-speech-recognition",
    model="kingabzpro/whisper-tiny-urdu",
)

# Clear any forced decoder prompt and set the decoding language to Urdu.
transcriber.model.generation_config.forced_decoder_ids = None
transcriber.model.generation_config.language = "ur"

# Transcribe a local audio file and print the result dictionary.
transcription = transcriber("audio2.mp3")
print(transcription)
```

Example output:

```text
{'text': 'دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی ہے'}
```

## Evaluation

| **Dataset**                    | **WER (%)** | **CER (%)** | **BLEU** | **ChrF** |
| ------------------------------ | ----------- | ----------- | -------- | -------- |
| Common Voice 17.0 (Urdu)       | 46.908      | 18.543      | 32.631   | 63.988   |
| HowMannyMore/urdu-audiodataset | 51.405      | 21.830      | 31.475   | 64.204   |

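For readers unfamiliar with the two error-rate metrics, the sketch below shows how WER and CER are defined in terms of edit distance. It is a minimal pure-Python illustration, not the evaluation script behind the numbers in this card; the text normalization used for the reported scores is not documented here, and the function names and the hypothetical "hypothesis" string are assumptions made for the example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (works for lists and strings)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion from the reference
                curr[j - 1] + 1,     # insertion into the reference
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate in percent: word-level edits / number of reference words."""
    ref_words = reference.split()
    return 100.0 * edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate in percent: character-level edits / reference length."""
    return 100.0 * edit_distance(reference, hypothesis) / len(reference)


# Reference: the sample transcription from the Quick Usage section.
reference = "دیکھیے پانی کب تک بہتا اور مچھلی کب تک تیرتی ہے"
# Hypothetical model output: the same sentence with its sixth word dropped.
words = reference.split()
hypothesis = " ".join(words[:5] + words[6:])

print(f"WER: {wer(reference, hypothesis):.2f}%")
print(f"CER: {cer(reference, hypothesis):.2f}%")
```

A single dropped word costs one word-level edit but several character-level edits, which is why the two metrics can diverge as they do in the table.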
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: AdamW (`OptimizerNames.ADAMW_TORCH`) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 200
- training_steps: 2500
- mixed_precision_training: Native AMP

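The hyperparameters above can be sketched as keyword arguments for `Seq2SeqTrainingArguments`, together with the learning-rate curve they imply. This is a reconstruction under assumptions, not the original training script: the mapping of `train_batch_size` to a per-device value assumes a single device without gradient accumulation, `fp16` is assumed from "Native AMP", and `cosine_lr` is a hypothetical helper that mirrors a linear warmup followed by a half-cycle cosine decay to zero.

```python
import math

# Hyperparameters from the list above, keyed by the Seq2SeqTrainingArguments
# parameter names they most plausibly correspond to (an assumption).
training_kwargs = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 32,  # assumes one device, no accumulation
    "per_device_eval_batch_size": 32,
    "seed": 42,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 200,
    "max_steps": 2500,
    "fp16": True,  # assumed from "Native AMP"
}


def cosine_lr(step, base_lr=2e-5, warmup_steps=200, total_steps=2500):
    """Learning rate at `step`: linear warmup, then half-cycle cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the schedule at the start, mid-warmup, peak, decay midpoint, and end.
for step in (0, 100, 200, 1350, 2500):
    print(f"step {step:>4}: lr = {cosine_lr(step):.2e}")
```

With the `transformers` library installed, the dictionary could be passed as `Seq2SeqTrainingArguments(**training_kwargs)` along with an `output_dir` of your choice.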
### Training results

| Training Loss | Epoch  | Step | Validation Loss | WER (%) |
|:-------------:|:------:|:----:|:---------------:|:-------:|
| 0.6808        | 1.6949 | 500  | 0.7403          | 52.6699 |
| 0.3948        | 3.3898 | 1000 | 0.6850          | 47.1247 |
| 0.2873        | 5.0847 | 1500 | 0.6994          | 48.1516 |
| 0.2024        | 6.7797 | 2000 | 0.7169          | 46.7326 |
| 0.183         | 8.4746 | 2500 | 0.7225          | 47.8529 |

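The rows in the training results table can be read programmatically to see which checkpoint scores best on each metric and how many optimizer steps make up one epoch. The snippet below is a small sketch using only values copied from that table; the estimate of the training-set size additionally assumes a batch size of 32 on a single device with no gradient accumulation, so treat it as an upper bound rather than a documented figure.

```python
# (step, epoch, validation_loss, wer_percent) rows copied from the results table.
rows = [
    (500, 1.6949, 0.7403, 52.6699),
    (1000, 3.3898, 0.6850, 47.1247),
    (1500, 5.0847, 0.6994, 48.1516),
    (2000, 6.7797, 0.7169, 46.7326),
    (2500, 8.4746, 0.7225, 47.8529),
]

# Checkpoint with the lowest validation loss and the one with the lowest WER.
best_by_loss = min(rows, key=lambda r: r[2])
best_by_wer = min(rows, key=lambda r: r[3])

# Steps per epoch follow from any row as step / epoch.
steps_per_epoch = round(rows[0][0] / rows[0][1])

# Upper bound on training examples, assuming 32 examples per optimizer step.
approx_train_examples = steps_per_epoch * 32

print(f"lowest validation loss: step {best_by_loss[0]} ({best_by_loss[2]})")
print(f"lowest WER: step {best_by_wer[0]} ({best_by_wer[3]}%)")
print(f"steps per epoch: {steps_per_epoch}")
print(f"training examples (upper bound): {approx_train_examples}")
```

Validation loss is lowest at step 1000 while WER is lowest at step 2000, so the final checkpoint at step 2500 is not the best on either measure among the logged evaluations.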
### Framework versions

- Transformers 4.51.3
- PyTorch 2.6.0+cu124
- Datasets 3.6.0
- Tokenizers 0.21.1