---
license: mit
language:
- kbd
datasets:
- anzorq/kbd_speech
- anzorq/sixuxar_yijiri_mak7
metrics:
- wer
pipeline_tag: automatic-speech-recognition
---
# Circassian (Kabardian) ASR Model

This is a fine-tuned model for Automatic Speech Recognition (ASR) in Kabardian (`kbd`), based on the `facebook/w2v-bert-2.0` model.

The model was trained on a combination of the `anzorq/kbd_speech` dataset (filtered on `country=russia`) and the `anzorq/sixuxar_yijiri_mak7` dataset.

## Model Details

- **Base Model**: facebook/w2v-bert-2.0
- **Language**: Kabardian (`kbd`)
- **Task**: Automatic Speech Recognition (ASR)
- **Datasets**: anzorq/kbd_speech, anzorq/sixuxar_yijiri_mak7
- **Training Steps**: 5000

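## Usage

The snippet below is a minimal inference sketch. The repo id is a placeholder (the checkpoint's exact Hub id is not stated here), and `audio` must be supplied as a 16 kHz mono waveform; `w2v-bert-2.0` fine-tunes are loaded with `Wav2Vec2BertForCTC`:

```python
import torch
from transformers import AutoProcessor, Wav2Vec2BertForCTC

repo_id = "<repo_id>"  # placeholder: substitute the Hub id of this checkpoint
processor = AutoProcessor.from_pretrained(repo_id)
model = Wav2Vec2BertForCTC.from_pretrained(repo_id)
model.eval()

# `audio`: a 1-D float waveform sampled at 16 kHz,
# e.g. loaded with torchaudio or librosa
inputs = processor(audio, sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Greedy CTC decoding
pred_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(pred_ids)[0]
print(transcription)
```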
## Training

The model was fine-tuned with the following training arguments (with gradient accumulation, the effective batch size is 8 × 2 = 16 per device):

```python
TrainingArguments(
    output_dir='output',
    group_by_length=True,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    evaluation_strategy="steps",
    num_train_epochs=10,
    gradient_checkpointing=True,
    fp16=True,
    save_steps=1000,
    eval_steps=500,
    logging_steps=300,
    learning_rate=5e-5,
    warmup_steps=500,
    save_total_limit=2,
    push_to_hub=True,
    report_to="wandb"
)
```

## Performance

The model's performance during training (validation loss was logged as `inf` at every evaluation step, while WER improved steadily):

| Step | Training Loss | Validation Loss | WER      |
|------|---------------|-----------------|----------|
| 500  | 2.859600      | inf             | 0.870362 |
| 1000 | 0.355500      | inf             | 0.703617 |
| 1500 | 0.247100      | inf             | 0.549942 |
| 2000 | 0.196700      | inf             | 0.471762 |
| 2500 | 0.181500      | inf             | 0.361494 |
| 3000 | 0.152200      | inf             | 0.314119 |
| 3500 | 0.135700      | inf             | 0.275146 |
| 4000 | 0.113400      | inf             | 0.252625 |
| 4500 | 0.102900      | inf             | 0.277013 |
| 5000 | 0.078500      | inf             | 0.250175 |