Whisper Bengali ASR
This model is a fine-tuned version of OpenAI Whisper Small for Bengali Automatic Speech Recognition (ASR).
Model Details
- Base Model: openai/whisper-small
- Language: Bengali (bn)
- Task: Speech-to-Text
- Framework: PyTorch
- Training: Fine-tuned on a synthetic Bengali dataset (sine-wave audio paired with Bengali text, not real speech)
- Hardware: Kaggle GPU
Training Info
- Epochs: 3
- Batch size: 4
- Optimizer: AdamW
- Learning rate: 1e-4
- Mixed precision: No
- Gradient checkpointing: Disabled (for stability)
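The settings above could be expressed as Hugging Face `Seq2SeqTrainingArguments`, as in the minimal sketch below. This assumes the `transformers` Trainer API was used, which the card does not state; `build_training_args` and the `output_dir` value are hypothetical names chosen for illustration.

```python
# Hyperparameters taken from the "Training Info" list above.
TRAINING_KWARGS = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 4,
    "learning_rate": 1e-4,
    "optim": "adamw_torch",          # AdamW
    "fp16": False,                   # mixed precision: no
    "gradient_checkpointing": False, # disabled for stability
}


def build_training_args(output_dir="whisper-bengali-out"):
    """Build Seq2SeqTrainingArguments from the settings above.

    The import is local so the settings can be inspected without
    loading transformers. `output_dir` is a hypothetical path.
    """
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
```

Calling `build_training_args()` requires `transformers` and PyTorch to be installed; the resulting object would be passed to a `Seq2SeqTrainer`.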
Dataset
This model was trained on a synthetic Bengali dataset generated using sine-wave audio and Bengali text samples.
Note: this dataset is for demonstration purposes only; it contains no real speech.
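To illustrate what a sine-wave sample paired with Bengali text might look like, here is a minimal sketch using only the Python standard library. The tone frequency, duration, amplitude, example sentence, and the `make_example` helper are all assumptions; the card does not describe how the actual dataset was generated.

```python
import math

SAMPLE_RATE = 16_000  # Whisper models expect 16 kHz audio


def sine_wave(freq_hz, duration_s, sample_rate=SAMPLE_RATE, amplitude=0.5):
    """Return a pure tone as a list of floats in [-amplitude, amplitude]."""
    n_samples = int(duration_s * sample_rate)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n_samples)
    ]


def make_example(text, freq_hz=440.0, duration_s=1.0):
    """Pair a synthetic tone with a transcript (hypothetical record layout)."""
    return {
        "audio": sine_wave(freq_hz, duration_s),
        "sampling_rate": SAMPLE_RATE,
        "text": text,
    }


# Hypothetical sample sentence ("I eat rice"), not taken from the real dataset.
example = make_example("আমি ভাত খাই")
```

Because the tone carries no information about the text, a model trained on such pairs cannot learn a real mapping from speech to transcript, which is why the dataset is suitable only for testing the pipeline.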
Usage
from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import soundfile as sf

model = WhisperForConditionalGeneration.from_pretrained("Tmanna/whisper-bengali-final")
processor = WhisperProcessor.from_pretrained("Tmanna/whisper-bengali-final")

# Whisper expects 16 kHz mono audio. Pass the file's actual sampling rate
# so a mismatch is reported rather than silently ignored.
audio, sr = sf.read("audio.wav")
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # down-mix multi-channel audio to mono
inputs = processor(audio, sampling_rate=sr, return_tensors="pt").input_features

with torch.no_grad():
    ids = model.generate(inputs)
print(processor.batch_decode(ids, skip_special_tokens=True)[0])
Limitations
- Trained on synthetic sine-wave audio, not real speech
- Accuracy on real Bengali speech is limited
- Not production-ready
- Intended for research and demonstration use only
Future Work
- Train on real Bengali speech dataset
- Use Whisper Medium / Large
- Add speaker diarization
- Reduce word error rate (WER)
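For reference, WER is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a plain-Python illustration of that definition; `word_error_rate` is a hypothetical helper, not the evaluation code used for this model, and it splits on whitespace without any text normalization.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, keeping only the previous DP row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` counts one substitution and one deletion against four reference words. In practice an established library would normally be used, along with normalization suited to Bengali script.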
Author
Tapan Manna
Machine Learning / Speech AI
If you like this model, consider starring the repo!