Create README.md
README.md
ADDED
@@ -0,0 +1,80 @@
# Model Card for WhisperLiveSubs
This model is a fine-tuned version of OpenAI's Whisper model (`openai/whisper-small`), trained on the Urdu subset of the Mozilla Common Voice dataset for Urdu automatic speech recognition (ASR).

### Model Description
This model is the small variant of Whisper, fine-tuned on the Urdu subset of the Common Voice dataset. It is intended for automatic speech recognition (ASR) tasks and transcribes Urdu speech into text (see Results for the measured WER).
- **Developed by:** codewithdark
- **Model type:** Whisper-based model for ASR
- **Language(s) (NLP):** Urdu (ur)
- **License:** Apache 2.0
- **Finetuned from model:** openai/whisper-small

## Uses
### Direct Use
This model can be used directly to transcribe Urdu audio into text. It is suitable for applications such as:
- Voice-to-text transcription services
- Captioning Urdu-language videos
- Speech analytics in Urdu

### Out-of-Scope Use
The model may not perform well on:
- Non-Urdu languages
- Extremely noisy environments
- Very long audio sequences without segmentation
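Whisper models process audio in 30-second windows, so long recordings are usually split into segments before transcription. The sketch below is one minimal way to compute fixed-length segment boundaries. The function name `chunk_boundaries` and its default values are illustrative assumptions, not part of this model's code, and cutting at fixed offsets can split a word in half, so a real splitter would more likely cut at silences or use overlapping windows.

```python
def chunk_boundaries(num_samples, sample_rate=16_000, chunk_seconds=30.0):
    """Return (start, end) sample indices that cover the audio in fixed-size chunks.

    Hypothetical helper for illustration; the last chunk may be shorter.
    """
    if num_samples < 0 or sample_rate <= 0 or chunk_seconds <= 0:
        raise ValueError("num_samples must be >= 0; rate and duration must be > 0")
    chunk_len = max(1, int(sample_rate * chunk_seconds))
    return [
        (start, min(start + chunk_len, num_samples))
        for start in range(0, num_samples, chunk_len)
    ]


# A 65-second clip at 16 kHz splits into two full chunks and one 5-second tail.
segments = chunk_boundaries(65 * 16_000)
for start, end in segments:
    print(f"samples {start}..{end} ({(end - start) / 16_000:.1f} s)")
```

Each `(start, end)` pair can then be used to slice the waveform and transcribe the pieces one by one. The Transformers `automatic-speech-recognition` pipeline also accepts a `chunk_length_s` argument that handles chunking internally, which may be simpler in practice.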
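
## How to Get Started with the Model
Use the code below to load the model and transcribe an audio file.

```python
import librosa  # assumed audio loader; any loader that yields 16 kHz mono works
from transformers import WhisperProcessor, WhisperForConditionalGeneration

processor = WhisperProcessor.from_pretrained("codewithdark/WhisperLiveSubs")
model = WhisperForConditionalGeneration.from_pretrained("codewithdark/WhisperLiveSubs")

# "audio.wav" is a placeholder path; Whisper expects 16 kHz mono input.
audio, sampling_rate = librosa.load("audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=sampling_rate, return_tensors="pt")
predicted_ids = model.generate(inputs.input_features)
print(processor.batch_decode(predicted_ids, skip_special_tokens=True)[0])
```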

### Training Data
The model was fine-tuned on the Urdu subset of the Mozilla Common Voice dataset. The dataset consists of approximately [number of hours] of transcribed Urdu speech.

#### Preprocessing
The audio was resampled to 16 kHz, and the text was tokenized using the Whisper tokenizer configured for Urdu.
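To make the resampling step concrete, here is a deliberately naive sketch that converts a list of samples to a new rate by linear interpolation. It is not the method used to train this model: `resample_linear` is a hypothetical helper, and it applies no anti-aliasing filter, so real preprocessing would more likely rely on a library resampler such as the ones in `librosa`, `torchaudio`, or the `datasets` audio feature.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono signal by linear interpolation (illustrative only).

    No low-pass filtering is applied, so downsampling may alias.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    if not samples:
        return []
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # distance between output samples, in input samples
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        if left >= len(samples) - 1:
            # Past the last interval: hold the final sample.
            out.append(float(samples[-1]))
            continue
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out


# Going from 48 kHz to 16 kHz keeps roughly one value per three input samples.
downsampled = resample_linear([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], 48_000, 16_000)
print(downsampled)
```

The 48 kHz source rate in the usage line is only an example value; the point is that the output holds one third as many samples as the input.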

#### Training Hyperparameters
- **Training regime:** Mixed precision (fp16)
- **Batch size:** 8
- **Gradient accumulation steps:** 2
- **Learning rate:** 1e-5
- **Max steps:** 4000
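The batch size and gradient accumulation steps combine into an effective batch size per optimizer update. The sketch below works through that arithmetic with the values listed above. The dictionary keys mirror field names from the Transformers `TrainingArguments` class, but that the model was trained with that class, and on a single GPU, are assumptions made for illustration.

```python
# Values copied from the hyperparameter list; key names are an assumed mapping.
hyperparameters = {
    "fp16": True,
    "per_device_train_batch_size": 8,
    "gradient_accumulation_steps": 2,
    "learning_rate": 1e-5,
    "max_steps": 4000,
}


def effective_batch_size(hp, num_devices=1):
    """Examples consumed per optimizer update (num_devices=1 is an assumption)."""
    return (
        hp["per_device_train_batch_size"]
        * hp["gradient_accumulation_steps"]
        * num_devices
    )


effective = effective_batch_size(hyperparameters)
examples_seen = effective * hyperparameters["max_steps"]
print(f"effective batch size: {effective}")
print(f"training examples processed over all steps: {examples_seen}")
```

Under these assumptions each update averages gradients over 16 examples, and the full run processes 64,000 examples (counting repeats across epochs).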
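
#### Metrics
Word Error Rate (WER) was the primary metric used to evaluate the model's performance. WER is the number of word substitutions, deletions, and insertions needed to turn the model's output into the reference transcript, divided by the number of words in the reference; lower is better.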
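For readers unfamiliar with the metric, the following is a minimal sketch of how WER can be computed with a word-level edit distance. It is not the evaluation script used for this model; `word_error_rate` is a hypothetical helper, and it splits on whitespace without any text normalization, which a real evaluation would likely add.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("b" -> "x") and one deletion ("d") over four reference words.
score = word_error_rate("a b c d", "a x c")
print(f"WER: {score:.2f} ({score * 100:.2f}%)")
```

Assuming the WER reported in this card is expressed as a percentage, a value of 51.06 would correspond to roughly one error for every two reference words.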

### Results

- **Training Loss:** 0.2005
- **Validation Loss:** 0.5342
- **WER:** 51.06

*This is my first time fine-tuning this model, so treat the current results as a baseline; further work can improve the model's accuracy and reduce the WER.*

- **Hardware Type:** P100 GPU
- **Hours used:** 10
- **Cloud Provider:** Kaggle
- **Compute Region:** PK

### Model Architecture and Objective
The Whisper-UR-Small model is based on the Whisper architecture, an encoder-decoder Transformer designed for automatic speech recognition.

#### Software
- **Framework:** PyTorch
- **Transformers Version:**

#### Summary
The model provides a baseline for Urdu transcription, but its WER of 51.06 leaves substantial room for improvement, especially in noisy conditions or with diverse accents.

## Model Card Contact
For inquiries, please contact codewithdark90@gmail.com

@Codewithdark. (2024). WhisperLiveSubs: An Urdu Automatic Speech Recognition Model. Retrieved from https://huggingface.co/codewithdark/WhisperLiveSubs