vivasoft/whisper-small-bn

Automatic Speech Recognition

	---
	language:
	- bn
	tags:
	- whisper
	- automatic-speech-recognition
	- bengali
	license: apache-2.0
	metrics:
	- wer
	pipeline_tag: automatic-speech-recognition
	---

	# Whisper Small Bengali

	This is a fine-tuned Whisper Small model for Bengali (Bangla) speech recognition.

	## Model Details

	- Base Model: openai/whisper-small
	- Language: Bengali (bn)
	- Training Steps: 2000
	- Final Training Loss: N/A

	## Usage

	```python
	import torch
	from transformers import pipeline

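	# Use the first GPU if one is available, otherwise fall back to CPU
	device = "cuda:0" if torch.cuda.is_available() else "cpu"

	# Build the ASR pipeline; long recordings are processed in 30-second chunks
	asr = pipeline(
	    "automatic-speech-recognition",
	    model="vivasoft/whisper-small-bn",
	    chunk_length_s=30,
	    device=device,
	)

	# Force Bengali transcription so the model neither auto-detects the
	# language nor translates the audio
	asr.model.config.forced_decoder_ids = asr.tokenizer.get_decoder_prompt_ids(
	    language="bn",
	    task="transcribe",
	)

	# Path to your audio file (a format ffmpeg can decode, e.g. WAV or MP3)
	audio_file = "/content/yt-3.mp3"

	# Run transcription and print the recognized text
	result = asr(audio_file)
	print("Transcription:", result["text"])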
	```

	## Training Details

	- Training Data: openslr37
	- Language: Bengali (bn)
	- Training Steps: 2000
	- Batch Size: 4
	- Learning Rate: 1e-05
	- Optimizer: AdamW
	- Evaluation WER (`eval_wer`): 0.3080158337456705 (about 30.8%)
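
For readers unfamiliar with the metric, the reported `eval_wer` is a word error rate: the number of word-level substitutions, deletions, and insertions needed to turn the model output into the reference transcript, divided by the number of reference words. The card does not say which implementation or text normalization produced the figure, so the following is only a minimal pure-Python sketch of the standard definition. The function name `word_error_rate` and the sample strings are illustrative assumptions, not part of this model's evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word and one dropped word
# out of four reference words.
print(word_error_rate("a b c d", "a x c"))
```

Under this definition, a WER of roughly 0.308 corresponds to about 31 word errors per 100 reference words. Scores from different evaluations are only comparable when the same text normalization is applied to both transcripts.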

	## Limitations

	- Optimized for Bengali speech only
	- Works best with clear audio sampled at 16 kHz
	- May not perform well on heavily accented or noisy audio
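
The 16 kHz note above matters mainly when raw sample arrays are fed to the model; when a file path is passed, the `transformers` pipeline decodes it through ffmpeg at the rate the model expects. As a rough illustration of what a rate conversion involves, here is a minimal sketch that resamples a mono signal by linear interpolation. The function `resample_linear` is a hypothetical helper written for this example and is not part of the model or of any library. It applies no low-pass filter, so downsampling real recordings this way can introduce aliasing; a dedicated resampler (for example from `librosa` or `torchaudio`) is the better choice in practice.

```python
def resample_linear(samples, src_rate, dst_rate=16_000):
    """Resample a mono signal to dst_rate using linear interpolation."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sampling rates must be positive")
    if src_rate == dst_rate or len(samples) < 2:
        return [float(s) for s in samples]

    n_out = int(round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(samples) - 1

    out = []
    for k in range(n_out):
        pos = k * step
        i = int(pos)
        if i >= last:
            # Past the final interval: hold the last sample.
            out.append(float(samples[last]))
        else:
            frac = pos - i
            out.append(samples[i] * (1.0 - frac) + samples[i + 1] * frac)
    return out


# One second of (silent) 48 kHz audio becomes 16,000 samples.
one_second = [0.0] * 48_000
print(len(resample_linear(one_second, src_rate=48_000)))
```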


	## Acknowledgments

	Based on OpenAI's Whisper model: https://github.com/openai/whisper