End of training
Files added:
- README.md (+84 -0)
- generation_config.json (+127 -0)
README.md
ADDED
@@ -0,0 +1,84 @@
---
library_name: transformers
language:
- ar
license: apache-2.0
base_model: tarteel-ai/whisper-tiny-ar-quran
tags:
- generated_from_trainer
datasets:
- numan98/synth-incorrect-verses
metrics:
- wer
model-index:
- name: Nextayah Tiny Whisper Finetuned
  results:
  - task:
      name: Automatic Speech Recognition
      type: automatic-speech-recognition
    dataset:
      name: Synthetic Incorrect Verses
      type: numan98/synth-incorrect-verses
      config: default
      split: None
      args: 'split: test'
    metrics:
    - name: Wer
      type: wer
      value: 22.098421541318476
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# Nextayah Tiny Whisper Finetuned

This model is a fine-tuned version of [tarteel-ai/whisper-tiny-ar-quran](https://huggingface.co/tarteel-ai/whisper-tiny-ar-quran) on the Synthetic Incorrect Verses dataset.
It achieves the following results on the evaluation set:
- Loss: 0.1079
- Wer: 22.0984

## Model description

More information needed

## Intended uses & limitations

More information needed

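Since no usage notes were provided, here is a minimal inference sketch using the standard `transformers` ASR pipeline; the repo id and the audio file name are placeholders inferred from this card, not values it states:

```python
from transformers import pipeline

# Hypothetical repo id, inferred from the model name on this card.
asr = pipeline(
    "automatic-speech-recognition",
    model="numan98/nextayah-tiny-whisper-finetuned",
)

# The pipeline decodes and resamples common audio formats to the
# 16 kHz mono input Whisper expects.
print(asr("recitation.wav")["text"])  # placeholder file name
```
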
## Training and evaluation data

More information needed

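The card metadata names `numan98/synth-incorrect-verses` as the dataset. A minimal loading sketch; the split name follows the `'split: test'` hint in the metadata and is otherwise an assumption:

```python
from datasets import load_dataset

# Split name taken from the card metadata's args ('split: test');
# adjust if the repository uses a different split layout.
ds = load_dataset("numan98/synth-incorrect-verses", split="test")
print(ds)
```
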
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training; a rough `Seq2SeqTrainingArguments` equivalent is sketched after the list:
- learning_rate: 5e-06
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 2
- total_train_batch_size: 16
- optimizer: AdamW (torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 1500
- mixed_precision_training: Native AMP

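A minimal reconstruction of these settings, not the original training script; `output_dir` and the 500-step eval cadence (inferred from the results table below) are assumptions:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./nextayah-tiny-whisper",  # hypothetical
    learning_rate=5e-6,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=2,  # 8 x 2 = effective train batch of 16
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=1500,
    fp16=True,                      # "Native AMP" mixed precision
    eval_strategy="steps",          # assumption, matching the eval rows below
    eval_steps=500,                 # assumption
)
```
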
### Training results

| Training Loss | Epoch   | Step | Validation Loss | Wer     |
|:-------------:|:-------:|:----:|:---------------:|:-------:|
| 0.0666        | 8.7788  | 500  | 0.1351          | 27.3909 |
| 0.0073        | 17.5487 | 1000 | 0.1090          | 23.3983 |
| 0.0029        | 26.3186 | 1500 | 0.1079          | 22.0984 |

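WER in this table is reported as a percentage. With the `evaluate` library the metric is computed as below; the strings are illustrative only:

```python
import evaluate

wer_metric = evaluate.load("wer")
score = wer_metric.compute(
    references=["example reference transcription"],
    predictions=["example predicted transcription"],
)
print(100 * score)  # scaled to a percentage, as in the table above
```
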
### Framework versions

- Transformers 4.48.0
- Pytorch 2.5.1+cu121
- Datasets 3.2.0
- Tokenizers 0.21.0
generation_config.json
ADDED
@@ -0,0 +1,127 @@
{
  "begin_suppress_tokens": [
    220,
    50257
  ],
  "bos_token_id": 50257,
  "decoder_start_token_id": 50258,
  "eos_token_id": 50257,
  "lang_to_id": {
    "afrikaans": 68,
    "albanian": 58,
    "amharic": 75,
    "arabic": 13,
    "armenian": 53,
    "assamese": 91,
    "azerbaijani": 45,
    "bashkir": 96,
    "basque": 51,
    "belarusian": 71,
    "bengali": 43,
    "bosnian": 56,
    "breton": 50,
    "bulgarian": 33,
    "burmese": 100,
    "cantonese": 99,
    "castilian": 110,
    "catalan": 11,
    "chinese": 1,
    "croatian": 32,
    "czech": 24,
    "danish": 26,
    "dutch": 12,
    "english": 0,
    "estonian": 48,
    "faroese": 79,
    "finnish": 18,
    "flemish": 102,
    "french": 6,
    "galician": 60,
    "georgian": 70,
    "german": 2,
    "greek": 22,
    "gujarati": 74,
    "haitian": 103,
    "haitian creole": 80,
    "hausa": 95,
    "hawaiian": 93,
    "hebrew": 20,
    "hindi": 17,
    "hungarian": 27,
    "icelandic": 52,
    "indonesian": 16,
    "italian": 15,
    "japanese": 7,
    "javanese": 97,
    "kannada": 47,
    "kazakh": 57,
    "khmer": 64,
    "korean": 5,
    "lao": 77,
    "latin": 35,
    "latvian": 42,
    "letzeburgesch": 104,
    "lingala": 94,
    "lithuanian": 34,
    "luxembourgish": 86,
    "macedonian": 49,
    "malagasy": 90,
    "malay": 23,
    "malayalam": 37,
    "maltese": 84,
    "mandarin": 111,
    "maori": 36,
    "marathi": 61,
    "moldavian": 107,
    "moldovan": 108,
    "mongolian": 55,
    "myanmar": 87,
    "nepali": 54,
    "norwegian": 29,
    "nynorsk": 83,
    "occitan": 69,
    "panjabi": 106,
    "pashto": 81,
    "persian": 41,
    "polish": 10,
    "portuguese": 8,
    "punjabi": 62,
    "pushto": 105,
    "romanian": 25,
    "russian": 4,
    "sanskrit": 85,
    "serbian": 44,
    "shona": 65,
    "sindhi": 73,
    "sinhala": 63,
    "sinhalese": 109,
    "slovak": 39,
    "slovenian": 46,
    "somali": 67,
    "spanish": 3,
    "sundanese": 98,
    "swahili": 59,
    "swedish": 14,
    "tagalog": 89,
    "tajik": 72,
    "tamil": 28,
    "tatar": 92,
    "telugu": 40,
    "thai": 30,
    "tibetan": 88,
    "turkish": 9,
    "turkmen": 82,
    "ukrainian": 21,
    "urdu": 31,
    "uzbek": 78,
    "valencian": 101,
    "vietnamese": 19,
    "welsh": 38,
    "yiddish": 76,
    "yoruba": 66
  },
  "max_length": 448,
  "pad_token_id": 50257,
  "transformers_version": "4.48.0",
  "use_cache": false
}
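
The special tokens follow Whisper's convention (bos, eos, and pad are all 50257; decoder start is 50258), and `lang_to_id` maps language names to offsets in Whisper's language-token table. A small inspection sketch, assuming a hypothetical repo id:

```python
from transformers import GenerationConfig

# Hypothetical repo id; inspect the decoding defaults shipped with the model.
gen_cfg = GenerationConfig.from_pretrained("numan98/nextayah-tiny-whisper-finetuned")
print(gen_cfg.max_length)                            # 448 decoder tokens max
print(gen_cfg.lang_to_id["arabic"])                  # 13: Arabic's language offset
print(gen_cfg.pad_token_id == gen_cfg.eos_token_id)  # True: both 50257
```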