🎤 Whisper Bengali ASR

This model is a fine-tuned version of OpenAI Whisper Small for Bengali Automatic Speech Recognition (ASR).

📌 Model Details

  • Base Model: openai/whisper-small
  • Language: Bengali (bn)
  • Task: Speech-to-Text
  • Framework: PyTorch
  • Training: Fine-tuned on a synthetic Bengali dataset (sine-wave audio paired with Bengali text, not real speech)
  • Hardware: Kaggle GPU

📊 Training Info

  • Epochs: 3
  • Batch size: 4
  • Optimizer: AdamW
  • Learning rate: 1e-4
  • Mixed precision: No
  • Gradient checkpointing: Disabled (for stability)
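As a rough sketch, the settings above could be expressed as keyword arguments for `transformers.Seq2SeqTrainingArguments`. The original training script is not part of this card, so the mapping is an assumption: in particular, `output_dir` is a placeholder, and treating the batch size as per-device and the optimizer as `adamw_torch` are guesses.

```python
# Hyperparameters copied from the Training Info list. The keyword names are
# those of transformers.Seq2SeqTrainingArguments; how the author actually
# configured training is not documented, so treat this as an illustration.
training_kwargs = {
    "output_dir": "./whisper-bengali",   # placeholder path (assumption)
    "num_train_epochs": 3,
    "per_device_train_batch_size": 4,    # assumes "batch size" means per device
    "learning_rate": 1e-4,
    "optim": "adamw_torch",              # AdamW; exact variant is an assumption
    "fp16": False,                       # mixed precision: No
    "gradient_checkpointing": False,     # disabled for stability
}


def build_training_args():
    """Build the arguments object; imported lazily so the dict is usable alone."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**training_kwargs)
```

Calling `build_training_args()` requires `transformers` and PyTorch to be installed; the resulting object can be passed to a `Seq2SeqTrainer`.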

📂 Dataset

This model was trained on a synthetic Bengali dataset generated using sine-wave audio and Bengali text samples.

⚠️ This dataset is for demonstration purposes only; it contains no real speech.
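To illustrate what such a dataset might look like, here is a minimal sketch that generates sine-wave clips and pairs them with Bengali text. The actual generation script is not included in this card, so the frequencies, clip length, example sentences, and the helper names `sine_wave` and `write_wav` are all hypothetical.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # Whisper models expect 16 kHz input


def sine_wave(freq_hz, duration_s, amplitude=0.5, sample_rate=SAMPLE_RATE):
    """Return a list of float samples in [-amplitude, amplitude]."""
    n = int(duration_s * sample_rate)
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n)
    ]


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write float samples to disk as 16-bit mono PCM."""
    pcm = b"".join(struct.pack("<h", int(s * 32767)) for s in samples)
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sample_rate)
        f.writeframes(pcm)


# Hypothetical pairing: each tone is arbitrarily matched with a sentence.
examples = [(440.0, "আমি ভাত খাই"), (660.0, "তুমি কেমন আছো")]
dataset = [{"audio": sine_wave(f, 1.0), "text": t} for f, t in examples]
```

Because a pure tone carries no phonetic information, a model trained this way can at best memorize tone-to-text pairs, which is why the data is suitable only for testing the training pipeline.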

🚀 Usage

from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import librosa  # used here for loading with resampling

model = WhisperForConditionalGeneration.from_pretrained("Tmanna/whisper-bengali-final")
processor = WhisperProcessor.from_pretrained("Tmanna/whisper-bengali-final")

# Whisper expects 16 kHz mono audio; librosa resamples and downmixes on load.
audio, sr = librosa.load("audio.wav", sr=16000)
inputs = processor(audio, sampling_rate=sr, return_tensors="pt").input_features

# Inference only, so gradients are not needed.
with torch.no_grad():
    ids = model.generate(inputs)

# Decode token IDs back to text, dropping special tokens.
print(processor.batch_decode(ids, skip_special_tokens=True)[0])

📈 Limitations

  • Trained on synthetic audio (not real speech)
  • Accuracy on real Bengali speech is likely to be limited, since no real speech was used in training
  • Not production-ready
  • Intended for research and demo use only

🔮 Future Work

  • Train on a real Bengali speech dataset
  • Use Whisper Medium / Large
  • Add speaker diarization
  • Reduce word error rate (WER)
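Tracking WER requires a way to measure it. This card reports no evaluation, so the following is only a minimal sketch of the standard definition (word-level edit distance divided by the number of reference words); the function name `wer` is illustrative, and a maintained library would be a reasonable alternative in practice.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For example, `wer("a b c d", "a x c")` counts one substitution and one deletion against four reference words. Whitespace splitting is a simplification; Bengali text may need normalization (punctuation, Unicode forms) before scoring.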

👤 Author

Tapan Manna
Machine Learning / Speech AI


⭐ If you like this model, consider starring the repo!
