🎤 Whisper Bengali ASR

This model is a fine-tuned version of OpenAI Whisper Small for Bengali Automatic Speech Recognition (ASR).

📌 Model Details

  • Base Model: openai/whisper-small
  • Language: Bengali (bn)
  • Task: Speech-to-Text
  • Framework: PyTorch
  • Training: Fine-tuned on a synthetic Bengali dataset (sine-wave audio paired with Bengali text, not real speech)
  • Hardware: Kaggle GPU

📊 Training Info

  • Epochs: 3
  • Batch size: 4
  • Optimizer: AdamW
  • Learning rate: 1e-4
  • Mixed precision: No
  • Gradient checkpointing: Disabled (for stability)
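
The hyperparameters above could be expressed as keyword arguments for Hugging Face `Seq2SeqTrainingArguments`, roughly as in the sketch below. This is not the author's training script: the `output_dir` value and the helper name `build_training_args` are hypothetical, `optim="adamw_torch"` is an assumption about which AdamW implementation was used, and treating the batch size as a per-device value is also an assumption.

```python
# Hyperparameters from the list above, written as keyword arguments for
# transformers.Seq2SeqTrainingArguments. A sketch, not the author's script.
TRAINING_KWARGS = {
    "output_dir": "./whisper-bengali-out",  # hypothetical path
    "num_train_epochs": 3,
    "per_device_train_batch_size": 4,       # assumption: batch size is per device
    "learning_rate": 1e-4,
    "optim": "adamw_torch",                 # assumption: PyTorch AdamW
    "fp16": False,                          # mixed precision: No
    "gradient_checkpointing": False,        # disabled (for stability)
}


def build_training_args():
    """Build the arguments object (requires `transformers` and `torch`)."""
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**TRAINING_KWARGS)
```

The import is kept inside the helper so the settings can be inspected without the training libraries installed.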

📂 Dataset

This model was trained on a synthetic Bengali dataset generated using sine-wave audio and Bengali text samples.
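
The generation script itself is not included here. As a rough illustration of what pairing sine-wave audio with Bengali text might look like, here is a minimal standard-library sketch. The tone frequency, duration, amplitude, the helper names `sine_wave` and `write_wav`, and the example transcript are all assumptions made for illustration.

```python
import math
import struct
import wave

SAMPLE_RATE = 16000  # Hz; the rate Whisper's feature extractor expects


def sine_wave(freq_hz, duration_s, amplitude=0.5, sample_rate=SAMPLE_RATE):
    """Return a list of float samples in [-1, 1] for a pure tone."""
    n = int(duration_s * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono 16-bit PCM audio to `path`."""
    pcm = struct.pack("<%dh" % len(samples),
                      *(int(max(-1.0, min(1.0, s)) * 32767) for s in samples))
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)
        f.setframerate(sample_rate)
        f.writeframes(pcm)


# Hypothetical training pair: a one-second 440 Hz tone and a Bengali sentence.
example = {"audio": sine_wave(440.0, 1.0), "text": "আমি বাংলায় কথা বলি"}
```

Calling `write_wav("sample.wav", example["audio"])` would save the tone as a 16 kHz mono file that the usage snippet can read.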

⚠️ This dataset is for demonstration purposes only; it does not contain real speech.

🚀 Usage

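from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import soundfile as sf

model = WhisperForConditionalGeneration.from_pretrained("Tmanna/whisper-bengali-final")
processor = WhisperProcessor.from_pretrained("Tmanna/whisper-bengali-final")
model.eval()

# Whisper expects 16 kHz mono audio; resample the file beforehand if it differs.
audio, sr = sf.read("audio.wav", dtype="float32")
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # downmix multi-channel audio to mono
if sr != 16000:
    raise ValueError(f"Expected 16 kHz audio, got {sr} Hz")

# Convert the waveform to log-mel input features
inputs = processor(audio, sampling_rate=sr, return_tensors="pt").input_features

with torch.no_grad():
    ids = model.generate(inputs)

# Decode the generated token ids to text
print(processor.batch_decode(ids, skip_special_tokens=True)[0])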
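
Whisper's feature extractor pads or truncates each input to a 30-second window, so the snippet above only covers the first 30 seconds of a longer file. One simple way around this is to split the waveform into fixed windows and transcribe each one separately. The helper name `split_into_chunks` is hypothetical, and this naive split may cut a word at a chunk boundary.

```python
SAMPLE_RATE = 16000   # Hz, as in the usage snippet
CHUNK_SECONDS = 30    # length of Whisper's input window


def split_into_chunks(samples, sample_rate=SAMPLE_RATE, chunk_s=CHUNK_SECONDS):
    """Split a 1-D sample sequence into consecutive windows of at most chunk_s seconds."""
    size = sample_rate * chunk_s
    return [samples[i:i + size] for i in range(0, len(samples), size)]
```

Each chunk can then be passed through `processor` and `model.generate` as shown above, and the decoded strings joined with spaces.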

📈 Limitations

  • Trained on synthetic audio (not real speech)
  • Accuracy on real Bengali speech is expected to be poor, since fine-tuning used no real speech
  • Not production-ready
  • For research/demo use

🔮 Future Work

  • Train on a real Bengali speech dataset
  • Use Whisper Medium / Large
  • Add speaker diarization
  • Reduce word error rate (WER)
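
WER is the number of word-level substitutions, deletions, and insertions needed to turn the model's output into the reference transcript, divided by the number of words in the reference. No WER figures are reported for this model; the sketch below only shows how the metric could be computed on a future evaluation set. The function name `word_error_rate` is hypothetical, and splitting on whitespace is a simplifying assumption for Bengali text.

```python
def word_error_rate(reference, hypothesis):
    """Return WER = (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "a b c d" with the hypothesis "a x c" involves one substitution and one deletion over four reference words, giving a WER of 0.5.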

👤 Author

Tapan Manna
Machine Learning / Speech AI


⭐ If you like this model, consider starring the repo!
