Commit
·
fb730d7
1
Parent(s):
b3c405c
update readme
README.md
CHANGED
|
@@ -11,4 +11,81 @@ widget:
|
|
| 11 |
- example_title: sample 3
|
| 12 |
src: https://huggingface.co/bangla-speech-processing/BanglaASR/resolve/main/mp3/common_voice_bn_31617644.mp3
|
| 13 |
pipeline_tag: automatic-speech-recognition
|
| 14 |
-
---
| 11 |
- example_title: sample 3
|
| 12 |
src: https://huggingface.co/bangla-speech-processing/BanglaASR/resolve/main/mp3/common_voice_bn_31617644.mp3
|
| 13 |
pipeline_tag: automatic-speech-recognition
|
| 14 |
+
---
|
| 15 |
+
|
| 16 |
+
BanglaASR (Whisper BanglaASR) is a Bangla automatic speech recognition (ASR) model, obtained by fine-tuning Whisper on the Bangla Mozilla Common Voice dataset.
|
| 17 |
+
The model was trained on about 40k training samples and 7k validation samples, roughly 400 hours of audio in total. After 12,000 training steps,
|
| 18 |
+
it reaches a word error rate (WER) of 4.58%.
|
| 19 |
+
|
| 20 |
+
|
| 21 |
+
```py
|
| 22 |
+
|
| 23 |
+
import os
|
| 24 |
+
import librosa
|
| 25 |
+
import torch
|
| 26 |
+
import torchaudio
|
| 27 |
+
import numpy as np
|
| 28 |
+
|
| 29 |
+
from transformers import WhisperTokenizer
|
| 30 |
+
from transformers import WhisperProcessor
|
| 31 |
+
from transformers import WhisperFeatureExtractor
|
| 32 |
+
from transformers import WhisperForConditionalGeneration
|
| 33 |
+
|
| 34 |
+
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
|
| 35 |
+
|
| 36 |
+
mp3_path = "https://huggingface.co/bangla-speech-processing/BanglaASR/resolve/main/mp3/common_voice_bn_31515636.mp3"
|
| 37 |
+
|
| 38 |
+
model_path = "bangla-speech-processing/BanglaASR"
|
| 39 |
+
|
| 40 |
+
|
| 41 |
+
feature_extractor = WhisperFeatureExtractor.from_pretrained(model_path)
|
| 42 |
+
tokenizer = WhisperTokenizer.from_pretrained(model_path)
|
| 43 |
+
processor = WhisperProcessor.from_pretrained(model_path)
|
| 44 |
+
model = WhisperForConditionalGeneration.from_pretrained(model_path).to(device)
|
| 45 |
+
|
| 46 |
+
|
| 47 |
+
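# Load the audio: returns a (channels, samples) tensor and the native sample rate.
speech_array, sampling_rate = torchaudio.load(mp3_path, format="mp3")
|
| 48 |
+
# Keep only the first channel (mono) as a NumPy array.
speech_array = speech_array[0].numpy()
|
| 49 |
+
# Resample to 16 kHz, the rate passed to the feature extractor below.
speech_array = librosa.resample(np.asarray(speech_array), orig_sr=sampling_rate, target_sr=16000)
|
| 50 |
+
# Convert the waveform into the model's input features.
input_features = feature_extractor(speech_array, sampling_rate=16000, return_tensors="pt").input_features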
|
| 51 |
+
|
| 52 |
+
# batch = processor.feature_extractor.pad(input_features, return_tensors="pt")
|
| 53 |
+
predicted_ids = model.generate(inputs=input_features.to(device))[0]
|
| 54 |
+
|
| 55 |
+
|
| 56 |
+
transcription = processor.decode(predicted_ids, skip_special_tokens=True)
|
| 57 |
+
|
| 58 |
+
print(transcription)
|
| 59 |
+
|
| 60 |
+
```
|
| 61 |
+
|
| 62 |
+
|
| 63 |
+
# Dataset
|
| 64 |
+
The model uses the Mozilla Common Voice dataset for Bangla: about 400 hours of audio, split into 40k training and 7k validation MP3 samples.
|
| 65 |
+
For more information about the dataset, see the [Mozilla Common Voice Bangla datasets page](https://commonvoice.mozilla.org/bn/datasets).
|
| 66 |
+
|
| 67 |
+
# Training Model Information
|
| 68 |
+
|
| 69 |
+
|
| 70 |
+
| Size   | Layers | Width | Heads | Parameters | Bangla-only | Training Status |
|
| 71 |
+
| ------ | ------ | ----- | ----- | ---------- | ----------- | --------------- |
|
| 72 |
+
| tiny   | 4      | 384   | 6     | 39 M       | X           | X               |
|
| 73 |
+
| base   | 6      | 512   | 8     | 74 M       | X           | X               |
|
| 74 |
+
| small  | 12     | 768   | 12    | 244 M      | ✓           | ✓               |
|
| 75 |
+
| medium | 24     | 1024  | 16    | 769 M      | X           | X               |
|
| 76 |
+
| large  | 32     | 1280  | 20    | 1550 M     | X           | X               |
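
As a rough sanity check on the Parameters column, the counts can be approximated from the Layers and Width values alone. The sketch below is only an estimate under stated assumptions: it assumes the Layers value applies to both the encoder and the decoder, that the feed-forward hidden size is 4x the width, and that the vocabulary has 51865 tokens (the multilingual Whisper vocabulary). It ignores biases, layer norms, the convolutional stem, and positional embeddings, so it lands a few percent below the table values. The function name `estimate_parameters` is hypothetical and not part of any library.

```python
# Rough parameter estimate for a Whisper-style encoder-decoder transformer.
# Assumption: vocabulary size of the multilingual Whisper tokenizer.
VOCAB_SIZE = 51865


def estimate_parameters(layers: int, width: int, vocab_size: int = VOCAB_SIZE) -> int:
    """Approximate the weight count from the depth and width of the model."""
    d2 = width * width
    # Encoder layer: self-attention (4 * d^2) + feed-forward (8 * d^2).
    encoder_layer = 4 * d2 + 8 * d2
    # Decoder layer: self-attention + cross-attention (8 * d^2) + feed-forward (8 * d^2).
    decoder_layer = 8 * d2 + 8 * d2
    # Token embedding matrix (assumed shared with the output projection).
    embedding = vocab_size * width
    return layers * (encoder_layer + decoder_layer) + embedding


# (layers, width) pairs copied from the table above.
SIZES = {
    "tiny": (4, 384),
    "base": (6, 512),
    "small": (12, 768),
    "medium": (24, 1024),
    "large": (32, 1280),
}

for name, (layers, width) in SIZES.items():
    print(f"{name:>6}: ~{estimate_parameters(layers, width) / 1e6:.0f} M")
```

For the `small` configuration used here, the estimate comes out close to, but slightly under, the 244 M listed in the table, which is what one would expect given the terms the sketch leaves out.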
|
| 77 |
+
|
| 78 |
+
# Evaluation
|
| 79 |
+
|
| 80 |
+
Word error rate (WER): 4.58%
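
For readers unfamiliar with the metric, word error rate is the word-level edit distance (substitutions, deletions, and insertions) between the reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that definition in plain Python; it is not the evaluation script used for this model, and the function name `word_error_rate` and whitespace tokenization are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()  # assumption: words are separated by whitespace
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# One substitution and one deletion against four reference words.
print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Under this definition, a WER of 4.58% corresponds to fewer than five word errors per hundred reference words on average.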
|
| 81 |
+
|
| 82 |
+
For more details, see the [GitHub repository](https://github.com/saiful9379/BanglaASR/tree/main).
|
| 83 |
+
|
| 84 |
+
```
|
| 85 |
+
@misc{BanglaASR,
|
| 86 |
+
title={Transformer Based Whisper Bangla ASR Model},
|
| 87 |
+
author={Md Saiful Islam},
|
| 88 |
+
howpublished={},
|
| 89 |
+
year={2023}
|
| 90 |
+
}
|
| 91 |
+
```
|