---
language:
- bn
tags:
- whisper
- automatic-speech-recognition
- bengali
license: apache-2.0
metrics:
- wer
pipeline_tag: automatic-speech-recognition
---

# Whisper Small Bengali

This is a fine-tuned Whisper Small model for Bengali (Bangla) speech recognition.

## Model Details

- **Base Model**: openai/whisper-small
- **Language**: Bengali (bn)
- **Training Steps**: 2000
- **Final Training Loss**: Not reported

## Usage

```python
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import librosa

# Load the model and its processor (feature extractor + tokenizer)
model = WhisperForConditionalGeneration.from_pretrained("Noobbbbb/whisper-small-bn")
processor = WhisperProcessor.from_pretrained("Noobbbbb/whisper-small-bn")

# Load mono audio, resampled to the 16 kHz rate Whisper expects
audio, sr = librosa.load("audio.wav", sr=16000)

# Extract log-Mel input features
input_features = processor.feature_extractor(
    audio,
    sampling_rate=sr,
    return_tensors="pt",
).input_features

# Generate token IDs without tracking gradients
with torch.no_grad():
    generated_ids = model.generate(input_features, max_length=448)

# Decode token IDs to text, dropping special tokens
transcription = processor.tokenizer.decode(generated_ids[0], skip_special_tokens=True)
print(transcription)
```

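The snippet above transcribes a single clip: Whisper's feature extractor pads or truncates every input to 30 seconds (480,000 samples at 16 kHz), so anything after the first 30 seconds of `audio.wav` is dropped. A minimal sketch of splitting a longer recording into 30-second windows follows, assuming mono audio already resampled to 16 kHz. `split_into_windows`, `SAMPLE_RATE`, and `WINDOW_SECONDS` are hypothetical names for illustration, not part of transformers or this model.

```python
import numpy as np

SAMPLE_RATE = 16_000  # hypothetical constant: the rate the usage example resamples to
WINDOW_SECONDS = 30   # hypothetical constant: Whisper's fixed input length


def split_into_windows(audio: np.ndarray, sr: int = SAMPLE_RATE,
                       window_s: float = WINDOW_SECONDS,
                       overlap_s: float = 0.0) -> list[np.ndarray]:
    """Hypothetical helper: cut a 1-D waveform into windows of at most window_s seconds.

    Consecutive windows share overlap_s seconds, so speech cut at one
    boundary also appears in the neighbouring window.
    """
    if audio.ndim != 1:
        raise ValueError("expected a mono, 1-D waveform")
    window = int(window_s * sr)
    step = window - int(overlap_s * sr)
    if step <= 0:
        raise ValueError("overlap_s must be shorter than window_s")
    windows = []
    start = 0
    while start < len(audio):
        windows.append(audio[start:start + window])
        if start + window >= len(audio):
            break  # this window already reaches the end of the audio
        start += step
    return windows


# 75 s of silence stands in for a real recording
audio = np.zeros(75 * SAMPLE_RATE, dtype=np.float32)
print([len(w) / SAMPLE_RATE for w in split_into_windows(audio)])               # [30.0, 30.0, 15.0]
print([len(w) / SAMPLE_RATE for w in split_into_windows(audio, overlap_s=5)])  # [30.0, 30.0, 25.0]
```

Each window could then go through the same `processor.feature_extractor(...)` and `model.generate(...)` steps, with the decoded strings joined in order. Fixed cut points can split a word; a non-zero overlap keeps it whole in one window but leaves duplicated text at the seams, which this sketch does not merge. The transformers `pipeline("automatic-speech-recognition", ...)` accepts a `chunk_length_s` argument that handles chunking and merging itself, and is likely the simpler route in practice.
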
## Training Details

- **Training Data**: openslr37
- **Language**: Bengali (bn)
- **Training Steps**: 2000
- **Batch Size**: 4
- **Learning Rate**: 1e-5
- **Optimizer**: AdamW
- **Evaluation WER**: 0.3080158337456705 (about 30.8%)

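Word error rate (WER) is the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words: (S + D + I) / N for substitutions, deletions, and insertions. An evaluation WER of about 0.308 therefore corresponds to roughly 31 word errors per 100 reference words. The card does not state which evaluation split or text normalization produced that figure, so the minimal pure-Python sketch below (a plain Levenshtein computation, assuming whitespace tokenization and no normalization; `word_error_rate` is a hypothetical name) illustrates the metric rather than reproducing the reported score.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()  # assumes whitespace tokenization, no normalization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words handled so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # substitution, or a match when cost == 0
            )
        prev = curr
    return prev[-1] / len(ref)


# Reference "I eat rice"; the hypothesis replaces the verb with "ate": 1 error / 3 words
print(word_error_rate("আমি ভাত খাই", "আমি ভাত খেলাম"))  # 0.3333333333333333
```

Because insertions count as errors, WER can exceed 1.0. Libraries such as jiwer, which the Hugging Face `evaluate` WER metric uses, compute the same quantity with additional text-normalization options, so scores may differ from this sketch depending on preprocessing.
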
## Limitations

- Fine-tuned for Bengali speech only
- Expects 16 kHz audio; recordings at other sampling rates must be resampled first (`librosa.load(..., sr=16000)` in the usage example does this)
- Works best with clear recordings; accuracy may drop on heavily accented or noisy audio

## Acknowledgments

Based on OpenAI's Whisper model: https://github.com/openai/whisper
|