---
language:
- bn
tags:
- whisper
- automatic-speech-recognition
- bengali
license: apache-2.0
metrics:
- wer
pipeline_tag: automatic-speech-recognition
---

# Whisper Small Bengali

This is a Whisper Small model fine-tuned for Bengali (Bangla) speech recognition.

## Model Details

- **Base Model**: openai/whisper-small
- **Language**: Bengali (bn)
- **Training Steps**: 2000
- **Final Training Loss**: N/A

## Usage

```python
import librosa
import torch
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Load the processor (feature extractor + tokenizer) and the model
processor = WhisperProcessor.from_pretrained("Noobbbbb/whisper-small-bn")
model = WhisperForConditionalGeneration.from_pretrained("Noobbbbb/whisper-small-bn")

# Load audio as mono and resample to the 16 kHz rate Whisper expects
audio, sr = librosa.load("audio.wav", sr=16000)

# Convert the waveform to log-Mel input features.
# Note: the feature extractor pads or truncates input to 30-second windows,
# so only the first 30 seconds of a longer file are transcribed here.
input_features = processor(
    audio, sampling_rate=16000, return_tensors="pt"
).input_features

# Generate token IDs (448 is Whisper's maximum decoder length)
with torch.no_grad():
    generated_ids = model.generate(input_features, max_length=448)

# Decode token IDs to text, dropping special tokens
transcription = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(transcription)
```

## Training Details

- **Training Data**: openslr37
- **Language**: Bengali (bn)
- **Training Steps**: 2000
- **Batch Size**: 4
- **Learning Rate**: 1e-05
- **Optimizer**: AdamW
- **Evaluation WER**: 0.308 (30.8%)

## Limitations

- Trained for Bengali speech only
- Expects 16 kHz audio and works best on clear recordings
- May not perform well on heavily accented or noisy audio

## Acknowledgments

Based on OpenAI's Whisper model: https://github.com/openai/whisper
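## Evaluation Metric

Word error rate (WER) is the word-level edit distance (substitutions + deletions + insertions) between a transcription and its reference, divided by the number of words in the reference. A WER of 0.308 therefore means roughly 31 word errors per 100 reference words. The sketch below implements that standard definition; it assumes plain whitespace tokenization and no text normalization, which may differ from the script used to produce the reported figure. The function name `word_error_rate` is hypothetical and only for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / number of reference words."""
    ref = reference.split()  # assumes whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    reference = "আমি বাংলায় গান গাই"
    hypothesis = "আমি বাংলা গান গাই"  # one substituted word out of four
    print(word_error_rate(reference, hypothesis))  # 0.25
```

Tools such as `jiwer` compute the same quantity and usually add normalization options; the loop above is kept dependency-free so the arithmetic is easy to follow.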