Whisper Small Bengali

This is a fine-tuned Whisper Small model for Bengali (Bangla) speech recognition.

Model Details

Base Model: openai/whisper-small
Language: Bengali (bn)
Training Steps: 2000
Final Training Loss: N/A

Usage

import torch
from transformers import pipeline

# choose device
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# create pipeline
asr = pipeline(
    "automatic-speech-recognition",
    model="vivasoft/whisper-small-bn",
    chunk_length_s=30,
    device=device
)

# force Bengali transcription (no language auto-detection, no translation)
asr.model.config.forced_decoder_ids = asr.tokenizer.get_decoder_prompt_ids(
    language="bn",
    task="transcribe"
)

# path to your audio file (a format the pipeline can decode, e.g. WAV or MP3)
audio_file = "/content/yt-3.mp3"
# run transcription
result = asr(audio_file)
print("Transcription:", result["text"])
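
To transcribe several recordings, the same pipeline object can be called once per file. The following is a minimal sketch and not part of the original model card: `transcribe_files` is a hypothetical helper, and it assumes only that `asr` is a callable that takes a file path and returns a dict with a "text" key, as the pipeline above does.

```python
from typing import Callable, Dict, Iterable


def transcribe_files(asr: Callable[[str], dict], paths: Iterable[str]) -> Dict[str, str]:
    """Run `asr` on each path and collect the transcripts.

    `asr` is assumed to behave like the pipeline created above:
    called with a file path, it returns a dict with a "text" key.
    """
    transcripts = {}
    for path in paths:
        result = asr(path)
        # strip the leading/trailing whitespace from the decoded text
        transcripts[path] = result["text"].strip()
    return transcripts
```

With the pipeline from the usage example, a call would look like `transcribe_files(asr, ["/content/yt-3.mp3"])`, which returns a dict mapping each path to its transcript.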

Training Details

Training Data: OpenSLR SLR37
Language: Bengali (bn)
Training Steps: 2000
Batch Size: 4
Learning Rate: 1e-05
Optimizer: AdamW
Evaluation WER (eval_wer): 0.308
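
The `eval_wer` figure is a word error rate: the number of word-level substitutions, deletions and insertions needed to turn the model output into the reference transcript, divided by the number of reference words. Assuming the value is reported as a fraction, it corresponds to roughly 31 errors per 100 reference words. The sketch below illustrates the calculation in plain Python. It assumes whitespace tokenization and no text normalization, which may differ from how the reported score was computed, and `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # reference word missing from hypothesis (deletion)
                curr[j - 1] + 1,     # extra hypothesis word (insertion)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# one of three reference words is missing from the hypothesis
print(word_error_rate("আমি ভাত খাই", "আমি খাই"))
```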

Limitations

Optimized for Bengali speech only
Works best with clear audio at 16kHz sampling rate
May not perform well on heavily accented or noisy audio
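
Because 16 kHz audio is recommended, it can help to check a recording's sampling rate before transcription. The following is a minimal sketch using Python's standard `wave` module, which reads only uncompressed PCM WAV files. The helper names are hypothetical and are not part of this model or of `transformers`.

```python
import wave

TARGET_SAMPLE_RATE = 16_000  # Hz, the rate named in the limitations above


def wav_sample_rate(path: str) -> int:
    """Return the sampling rate (in Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate()


def is_16khz(path: str) -> bool:
    """True if the WAV file at `path` is sampled at 16 kHz."""
    return wav_sample_rate(path) == TARGET_SAMPLE_RATE
```

A file for which `is_16khz` returns False is a candidate for resampling with an audio tool of your choice before it is passed to the model.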

Acknowledgments

Based on OpenAI's Whisper model: https://github.com/openai/whisper

Safetensors

Model size: 0.2B params
Tensor type: F32