---
language:
- bn
tags:
- whisper
- automatic-speech-recognition
- bengali
license: apache-2.0
metrics:
- wer
pipeline_tag: automatic-speech-recognition
---

# Whisper Small Bengali

This is a fine-tuned Whisper Small model for Bengali (Bangla) speech recognition.

## Model Details

- **Base Model**: openai/whisper-small
- **Language**: Bengali (bn)
- **Training Steps**: 2000
- **Final Training Loss**: N/A

## Usage

```python
import torch
from transformers import pipeline

# Use the first GPU if one is available, otherwise fall back to CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Create the speech recognition pipeline; long audio is split into 30-second chunks
asr = pipeline(
    "automatic-speech-recognition",
    model="vivasoft/whisper-small-bn",
    chunk_length_s=30,
    device=device,
)

# Force the decoder to transcribe in Bengali instead of auto-detecting the language
asr.model.config.forced_decoder_ids = asr.tokenizer.get_decoder_prompt_ids(
    language="bn",
    task="transcribe",
)

# Path to your audio file (must be a decodable format, e.g. WAV or MP3)
audio_file = "/content/yt-3.mp3"

# Run transcription and print the recognized text
result = asr(audio_file)
print("Transcription:", result["text"])
```

## Training Details

- **Training Data**: openslr37
- **Language**: Bengali (bn)
- **Training Steps**: 2000
- **Batch Size**: 4
- **Learning Rate**: 1e-05
- **Optimizer**: AdamW
- **Evaluation WER (`eval_wer`)**: 0.3080158337456705 (about 30.8%)

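
For readers unfamiliar with the metric, the sketch below shows how a word error rate is conventionally computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the model output, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions; this card does not state which tool or text normalization produced the reported `eval_wer`, so the sketch is not a reproduction of that evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (not part of this model or of transformers).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion of a reference word
                    curr[j - 1] + 1,     # insertion of a hypothesis word
                    prev[j - 1] + cost,  # substitution or exact match
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word out of four reference words
reference = "আমি বাংলায় গান গাই"
hypothesis = "আমি বাংলা গান গাই"
print(word_error_rate(reference, hypothesis))  # 0.25
```

Under this definition, a WER of roughly 0.308 corresponds to about 31 word-level errors per 100 reference words.
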
## Limitations

- Optimized for Bengali speech only
- Works best with clear audio at a 16 kHz sampling rate
- May not perform well on heavily accented or noisy audio

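
As a quick sanity check for the sampling-rate point, the sketch below reads the sample rate from a WAV file header using only the Python standard library. The helper names (`wav_sample_rate`, `is_native_rate`) and the file path are hypothetical, and the approach assumes an uncompressed PCM WAV file; other formats such as MP3 would need a different reader.

```python
import wave

# Whisper's feature extractor operates on 16 kHz audio
TARGET_SAMPLE_RATE = 16_000


def wav_sample_rate(path: str) -> int:
    """Return the sample rate in Hz stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_native_rate(path: str, target: int = TARGET_SAMPLE_RATE) -> bool:
    """Return True if the WAV file is already at the target sample rate."""
    return wav_sample_rate(path) == target


if __name__ == "__main__":
    audio_path = "sample.wav"  # hypothetical path, replace with your own file
    try:
        rate = wav_sample_rate(audio_path)
    except (FileNotFoundError, wave.Error) as error:
        print(f"Could not read {audio_path}: {error}")
    else:
        if is_native_rate(audio_path):
            print(f"{audio_path} is already at {rate} Hz")
        else:
            print(f"{audio_path} is at {rate} Hz and differs from {TARGET_SAMPLE_RATE} Hz")
```

A file recorded at a different rate has to be resampled to 16 kHz at some point before it reaches the model, so checking the rate up front helps explain unexpected quality differences between recordings.
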
## Acknowledgments

Based on OpenAI's Whisper model: https://github.com/openai/whisper