🎤 Whisper Bengali ASR

This model is a fine-tuned version of OpenAI Whisper Small for Bengali Automatic Speech Recognition (ASR).

📌 Model Details

Base Model: openai/whisper-small
Language: Bengali (bn)
Task: Speech-to-Text
Framework: PyTorch
Training: Fine-tuned on a synthetic Bengali dataset (sine-wave audio, not real speech)
Hardware: Kaggle GPU

📊 Training Info

Epochs: 3
Batch size: 4
Optimizer: AdamW
Learning rate: 1e-4
Mixed precision: No
Gradient checkpointing: Disabled (for stability)

📂 Dataset

This model was trained on a synthetic Bengali dataset built by pairing generated sine-wave audio with Bengali text samples.

⚠️ This dataset is for demonstration purposes only; it contains no real speech.
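
The generation script is not part of this card, so the following is only a minimal sketch of how one sine-wave sample of this kind could be produced with the Python standard library. The function name `write_sine_wav`, the 440 Hz tone, the 16 kHz sample rate, and the example transcript are all assumptions made for illustration, not the author's actual settings.

```python
import math
import struct
import wave


def write_sine_wav(path, freq_hz=440.0, duration_s=1.0,
                   sample_rate=16000, amplitude=0.5):
    """Write a mono 16-bit PCM WAV file containing a single sine tone.

    Returns the number of samples written. All defaults are illustrative.
    """
    n_samples = int(duration_s * sample_rate)
    frames = bytearray()
    for n in range(n_samples):
        value = amplitude * math.sin(2.0 * math.pi * freq_hz * n / sample_rate)
        # Scale the [-1, 1] float to a signed 16-bit integer, little-endian.
        frames += struct.pack("<h", int(value * 32767))
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)       # mono
        wav.setsampwidth(2)       # 2 bytes = 16-bit samples
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return n_samples


if __name__ == "__main__":
    # Hypothetical pairing of a generated tone with a Bengali text sample.
    write_sine_wav("sample_0.wav")
    sample = {"audio": "sample_0.wav", "text": "আমি বাংলায় কথা বলি"}
    print(sample)
```

A tone like this carries no phonetic information, so a model trained on such pairs cannot learn to map sounds to Bengali words, which is why the dataset only suits pipeline demonstrations.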

🚀 Usage

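from transformers import WhisperProcessor, WhisperForConditionalGeneration
import torch
import soundfile as sf

model = WhisperForConditionalGeneration.from_pretrained("Tmanna/whisper-bengali-final")
processor = WhisperProcessor.from_pretrained("Tmanna/whisper-bengali-final")

# Whisper expects 16 kHz mono audio.
audio, sr = sf.read("audio.wav", dtype="float32")
if audio.ndim > 1:
    audio = audio.mean(axis=1)  # downmix multi-channel audio to mono

# Pass the file's real sampling rate so the processor raises an error
# if the audio is not 16 kHz, instead of silently mis-reading it.
inputs = processor(audio, sampling_rate=sr, return_tensors="pt").input_features

# Inference only, so gradient tracking is disabled.
with torch.no_grad():
    ids = model.generate(inputs)

print(processor.batch_decode(ids, skip_special_tokens=True)[0])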

📈 Limitations

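Trained on synthetic sine-wave audio, not real speech
Accuracy on real Bengali speech is limited, since the model saw no real speech during fine-tuning
Not production-ready
Intended for research and demo use only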

🔮 Future Work

Train on a real Bengali speech dataset
Use Whisper Medium / Large
Add speaker diarization
Reduce word error rate (WER)
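
Tracking WER needs a way to measure it. The sketch below computes word-level WER as the edit distance (substitutions, deletions, insertions) between a reference transcript and a model output, divided by the number of reference words. The function name `word_error_rate` is hypothetical, and splitting on whitespace is an assumed, simplistic tokenization. A real evaluation would likely normalize Bengali text first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level WER: edit distance between word lists / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a 4-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

WER can exceed 1.0 when the hypothesis contains many extra words, so it is an error rate and not a percentage accuracy.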

👤 Author

Tapan Manna
Machine Learning / Speech AI

⭐ If you like this model, consider starring the repo!

Model size: 0.2B params
Tensor type: F32 (Safetensors)