---
language:
- bn
tags:
- whisper
- automatic-speech-recognition
- bengali
license: apache-2.0
metrics:
- wer
pipeline_tag: automatic-speech-recognition
---

# Whisper Small Bengali

This is a fine-tuned Whisper Small model for Bengali (Bangla) speech recognition.

## Model Details

- **Base Model**: openai/whisper-small
- **Language**: Bengali (bn)
- **Training Steps**: 2000
- **Final Training Loss**: N/A

## Usage

```python
import torch
from transformers import pipeline

# Use the first GPU if one is available, otherwise fall back to CPU
device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Create the ASR pipeline; long audio is processed in 30-second chunks
asr = pipeline(
    "automatic-speech-recognition",
    model="vivasoft/whisper-small-bn",
    chunk_length_s=30,
    device=device,
)

# Force the decoder to transcribe in Bengali rather than auto-detect the language
asr.model.config.forced_decoder_ids = asr.tokenizer.get_decoder_prompt_ids(
    language="bn", task="transcribe"
)

# Path to your audio file (must be in a supported format, e.g., WAV or MP3)
audio_file = "/content/yt-3.mp3"

# Run transcription
result = asr(audio_file)
print("Transcription:", result["text"])
```

## Training Details

- **Training Data**: openslr37
- **Language**: Bengali (bn)
- **Training Steps**: 2000
- **Batch Size**: 4
- **Learning Rate**: 1e-5
- **Optimizer**: AdamW
- **Evaluation WER** (`eval_wer`): 0.3080158337456705 (about 30.8%)

## Limitations

- Optimized for Bengali speech only
- Works best with clear audio at a 16 kHz sampling rate
- May not perform well on heavily accented or noisy audio

## Acknowledgments

Based on OpenAI's Whisper model: https://github.com/openai/whisper
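
## Interpreting the WER Metric

The `eval_wer` figure reported for this model is a word error rate: the number of word-level substitutions, insertions, and deletions needed to turn the model output into the reference transcript, divided by the number of reference words. The snippet below is a minimal sketch of that definition, not the evaluation script used for this model. The function name `word_error_rate` and the example sentences are hypothetical, and it assumes simple whitespace tokenization with no text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(
                min(
                    prev[j] + 1,         # deletion
                    curr[j - 1] + 1,     # insertion
                    prev[j - 1] + cost,  # substitution or match
                )
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative (made-up) sentences: the hypothesis has one extra word
reference = "আমি ভাত খাই"
hypothesis = "আমি ভাত খাই না"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Evaluation pipelines usually aggregate edits and reference words over the whole test set and may normalize punctuation or Unicode forms before scoring, so values from this per-utterance sketch are not directly comparable to the reported number.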