---
title: Urdu ASR SOTA
emoji: 👨🎤
colorFrom: green
colorTo: blue
sdk: gradio
app_file: Gradio/app.py
pinned: true
license: apache-2.0
duplicated_from: kingabzpro/Urdu-ASR-SOTA
---
# Urdu Automatic Speech Recognition: State-of-the-Art Solution

Urdu automatic speech recognition (ASR) using Facebook's `wav2vec2-xls-r-300m` model, fine-tuned on the Urdu subset of the `mozilla-foundation/common_voice_8_0` dataset.
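wav2vec2 checkpoints fine-tuned for ASR use a CTC head (see the CTC reference under References): the model predicts one token per audio frame, and, without a language model, the transcript comes from greedy decoding that collapses repeated predictions and drops the blank token. The sketch below illustrates that step under stated assumptions: the vocabulary is a hypothetical toy one, with `<pad>` acting as the blank at index 0 and `|` as the word delimiter (the `Wav2Vec2CTCTokenizer` convention), not the real vocabulary of this checkpoint.

```python
from itertools import groupby

# Hypothetical toy settings, not the Urdu checkpoint's real vocabulary.
BLANK_ID = 0      # index of the CTC blank (the <pad> token in wav2vec2 tokenizers)
WORD_DELIM = "|"  # word-delimiter token used by Wav2Vec2CTCTokenizer


def ctc_greedy_decode(frame_ids: list[int], id_to_char: dict[int, str]) -> str:
    """Greedy (best-path) CTC decoding of per-frame argmax token IDs."""
    # 1. Merge consecutive duplicates: [a, a, blank, a] -> [a, blank, a]
    collapsed = [token for token, _ in groupby(frame_ids)]
    # 2. Drop blanks; a blank between two equal IDs keeps them as separate characters
    chars = [id_to_char[t] for t in collapsed if t != BLANK_ID]
    # 3. Map the word delimiter to spaces
    return "".join(chars).replace(WORD_DELIM, " ").strip()


vocab = {0: "<pad>", 1: "a", 2: "b", 3: "|"}
print(ctc_greedy_decode([1, 1, 0, 2, 2, 3, 3, 1, 0, 1], vocab))  # prints: ab aa
```

The blank between the last two `1`s is what lets the decoder emit a double letter (`aa`); without it the repeats would collapse into a single `a`.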
## Model Fine-tuning

This model is a fine-tuned version of [facebook/wav2vec2-xls-r-300m](https://huggingface.co/facebook/wav2vec2-xls-r-300m) on the [Common Voice dataset](https://commonvoice.mozilla.org/en/datasets). It achieves the following results on the evaluation set:

- Loss: 0.9889
- WER: 0.5607
- CER: 0.2370
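WER (word error rate) and CER (character error rate) are both normalized Levenshtein edit distances: the minimum number of substitutions, deletions and insertions needed to turn the reference into the hypothesis, divided by the reference length in words or characters. A minimal dependency-free sketch follows. It assumes plain whitespace tokenization with no text normalization, which may differ from what `eval.py` does. It also includes a corpus-level variant (total edits over total reference words), which is how evaluation libraries typically aggregate over a test set rather than averaging per-utterance scores.

```python
def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if equal)
            )
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


def corpus_wer(references: list[str], hypotheses: list[str]) -> float:
    """Total word edits divided by total reference words over a whole test set."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(references, hypotheses))
    total = sum(len(r.split()) for r in references)
    return errors / total


print(wer("the cat sat", "the bat sat"))  # one substituted word out of three
print(cer("the cat sat", "the bat sat"))  # one substituted character out of eleven
```

A WER of 0.5607 therefore means roughly 56 word-level edits per 100 reference words.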
## Quick Prediction

Install all dependencies from the `requirements.txt` file, then run the code below to transcribe a test sample:
```python
from datasets import load_dataset, Audio
from transformers import pipeline

# Local model checkpoint and Common Voice data (see Directory Structure)
model = "Model"
data = load_dataset("Data", "ur", split="test", delimiter="\t")

def path_adjust(batch):
    # Prefix each relative clip name with the local clips folder
    batch["path"] = "Data/ur/clips/" + str(batch["path"])
    return batch

data = data.map(path_adjust)

# Decode the audio and resample it to the 16 kHz rate the model expects
sample_iter = iter(data.cast_column("path", Audio(sampling_rate=16_000)))
sample = next(sample_iter)

asr = pipeline("automatic-speech-recognition", model=model)
# Transcribe in 5 s chunks with 1 s of overlapping context on each side
prediction = asr(sample["path"]["array"], chunk_length_s=5, stride_length_s=1)
print(prediction)
# => {'text': 'اب یہ ونگین لمحاتانکھار دلمیں میںفوث کریلیا اجائ'}
```
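The `chunk_length_s=5, stride_length_s=1` arguments let the pipeline handle long recordings: the audio is cut into 5 s windows that overlap their neighbours by 1 s on each side, and the predictions for those overlapping strides are discarded so that the kept regions tile the recording. The sketch below only computes the window boundaries to illustrate that idea. It is modelled on the documented behaviour, not copied from the `transformers` source, and `chunk_windows` is a hypothetical helper name.

```python
def chunk_windows(num_samples: int, sampling_rate: int = 16_000,
                  chunk_s: float = 5.0, stride_s: float = 1.0):
    """Return (start, end, left_stride, right_stride) sample indices per window.

    Illustrative sketch: consecutive windows start chunk_s - 2 * stride_s apart,
    so each one shares stride_s seconds of context with its neighbours.
    """
    chunk_len = int(chunk_s * sampling_rate)
    stride = int(stride_s * sampling_rate)
    step = chunk_len - 2 * stride
    if step <= 0:
        raise ValueError("chunk_s must be larger than 2 * stride_s")
    windows = []
    for start in range(0, num_samples, step):
        end = min(start + chunk_len, num_samples)
        is_last = start + chunk_len >= num_samples
        # The first window has no left context and the last has no right context
        left = 0 if start == 0 else stride
        right = 0 if is_last else stride
        windows.append((start, end, left, right))
        if is_last:
            break
    return windows


# A 12 s clip at 16 kHz, printed as (start, end) in seconds
print([(s / 16_000, e / 16_000) for s, e, _, _ in chunk_windows(12 * 16_000)])
# prints: [(0.0, 5.0), (3.0, 8.0), (6.0, 11.0), (9.0, 12.0)]
```

After dropping each window's strides, the kept spans are 0 to 4 s, 4 to 7 s, 7 to 10 s and 10 to 12 s, which cover the clip exactly once.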
## Evaluation Commands

To evaluate on the `test` split of `mozilla-foundation/common_voice_8_0`, copy and paste the following command into a terminal:

```bash
python3 eval.py --model_id Model --dataset Data --config ur --split test --chunk_length_s 5.0 --stride_length_s 1.0 --log_outputs
```
**OR** run the provided shell script:

```bash
bash run_eval.sh
```
## Language Model

The language model follows [Boosting Wav2Vec2 with n-grams in 🤗 Transformers](https://huggingface.co/blog/wav2vec2-with-ngram):

- Get suitable Urdu text data for a language model
- Build an n-gram with KenLM
- Combine the n-gram with a fine-tuned Wav2Vec2 checkpoint

Install `kenlm` and `pyctcdecode` before running the notebook:

```bash
pip install https://github.com/kpu/kenlm/archive/master.zip pyctcdecode
```
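Combining the n-gram with the CTC model is a form of shallow fusion: each candidate transcription is scored as the acoustic log-probability plus `alpha` times the LM log-probability plus `beta` times the word count, where `alpha` weights the LM and `beta` is a word-insertion bonus. `pyctcdecode` exposes weights named `alpha` and `beta` for this purpose and applies them inside its beam search over the CTC output. The toy sketch below is not KenLM or pyctcdecode: it trains a tiny add-one-smoothed bigram model on hypothetical romanized sentences and rescores a fixed list of candidates whose acoustic scores are made-up values, just to show how the LM can overturn a small acoustic preference for a misspelling.

```python
import math
from collections import Counter


def train_bigram(sentences: list[str]):
    """Count bigrams and their contexts; returns (contexts, bigrams, vocab_size)."""
    contexts, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        contexts.update(tokens[:-1])  # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return contexts, bigrams, len(vocab)


def lm_logprob(sentence: str, lm) -> float:
    """Add-one smoothed bigram log-probability of a whole sentence."""
    contexts, bigrams, vocab_size = lm
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(tokens, tokens[1:])
    )


def fused_score(candidate: str, acoustic_logprob: float, lm,
                alpha: float = 0.5, beta: float = 1.5) -> float:
    """Shallow fusion: acoustic score + alpha * LM score + beta * word count."""
    return acoustic_logprob + alpha * lm_logprob(candidate, lm) + beta * len(candidate.split())


# Hypothetical toy corpus and candidates with made-up acoustic log-probabilities
corpus = ["main ghar ja raha hoon", "woh ghar ja rahi hai", "main school ja raha hoon"]
lm = train_bigram(corpus)
candidates = {"main ghar ja raha hoon": -4.0, "main gar ja raha hoon": -3.8}

print(max(candidates, key=candidates.get))  # acoustic only: main gar ja raha hoon
print(max(candidates, key=lambda c: fused_score(c, candidates[c], lm)))  # with LM: main ghar ja raha hoon
```

Both candidates have five words, so `beta` cancels here; the LM term favours the in-vocabulary `ghar` by log 5, which at `alpha = 0.5` outweighs the 0.2 acoustic advantage of the misspelling.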
## Eval Results

| Metric | Without LM | With LM |
| ------ | ---------- | ------- |
| WER (%) | 56.21 | 46.37 |
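Assuming both columns report word error rate in percent on the same test split, the gain from adding the language model works out as:

$$
\Delta_{\text{abs}} = 56.21 - 46.37 = 9.84 \ \text{WER points}, \qquad
\Delta_{\text{rel}} = \frac{9.84}{56.21} \approx 17.5\%
$$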
## Directory Structure

```
<root directory>
    |
    .- README.md
    |
    .- Data/
    |
    .- Model/
    |
    .- Images/
    |
    .- Sample/
    |
    .- Gradio/
    |
    .- Eval Results/
    |      |
    |      .- With LM/
    |      |
    |      .- Without LM/
    |      ...
    .- notebook.ipynb
    |
    .- run_eval.sh
    |
    .- eval.py
```
## Gradio App

## SOTA

- [x] Add Language Model
- [x] Webapp/API
- [ ] Denoise Audio
- [ ] Text Processing
- [ ] Spelling Mistakes
- [x] Hyperparameter optimization
- [ ] Training for 300 epochs with batch size 64
- [ ] Improved Language Model
- [ ] Contribute to Urdu ASR Audio Dataset
## Robust Speech Recognition Challenge 2022

This project was the result of the Hugging Face [Robust Speech Recognition Challenge](https://discuss.huggingface.co/t/open-to-the-community-robust-speech-recognition-challenge/13614). I was one of the winners, with four state-of-the-art ASR models. Check out my SOTA checkpoints:

- **[Urdu](https://huggingface.co/kingabzpro/wav2vec2-large-xls-r-300m-Urdu)**
- **[Arabic](https://huggingface.co/kingabzpro/wav2vec2-large-xlsr-300-arabic)**
- **[Punjabi](https://huggingface.co/kingabzpro/wav2vec2-large-xlsr-53-punjabi)**
- **[Irish](https://huggingface.co/kingabzpro/wav2vec2-large-xls-r-1b-Irish)**

## References

- [Common Voice Dataset](https://commonvoice.mozilla.org/en/datasets)
- [Sequence Modeling With CTC](https://distill.pub/2017/ctc/)
- [Fine-tuning XLS-R for Multi-Lingual ASR with 🤗 Transformers](https://huggingface.co/blog/fine-tune-xlsr-wav2vec2)
- [Boosting Wav2Vec2 with n-grams in 🤗 Transformers](https://huggingface.co/blog/wav2vec2-with-ngram)
- [HF Model](https://huggingface.co/kingabzpro/wav2vec2-large-xls-r-300m-Urdu)