	---
	language:
	- ml
	tags:
	- audio
	- automatic-speech-recognition
	- vegam
	license: mit
	datasets:
	- google/fleurs
	- thennal/IMaSC
	- mozilla-foundation/common_voice_11_0
	library_name: ctranslate2
	---

	> Note: Model file size is 3.06 GB

	# vegam-whisper-medium-ml (വേഗം)

	This is a conversion of [thennal/whisper-medium-ml](https://huggingface.co/thennal/whisper-medium-ml) to the [CTranslate2](https://github.com/OpenNMT/CTranslate2) model format.

	This model can be used in CTranslate2 or projects based on CTranslate2 such as [faster-whisper](https://github.com/guillaumekln/faster-whisper).

	## Installation

	- Install [faster-whisper](https://github.com/guillaumekln/faster-whisper). See the [faster-whisper installation notes](https://github.com/guillaumekln/faster-whisper/tree/master#installation) for more details.

	```
	pip install faster-whisper
	```

	- Install [git-lfs](https://git-lfs.com/), which is needed only to download the model from Hugging Face. The command below is for Debian-based systems; for other systems, see the [git-lfs installation instructions](https://github.com/git-lfs/git-lfs?utm_source=gitlfs_site&utm_medium=installation_link&utm_campaign=gitlfs#installing).

	```
	apt-get install git-lfs
	```

	- Download the model weights

	```
	git lfs install
	git clone https://huggingface.co/kurianbenoy/vegam-whisper-medium-ml
	```
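
If `git lfs install` was skipped, the clone can succeed while the weight file is only a small Git LFS pointer stub. The following is a minimal sketch of a sanity check for the cloned directory. The helper name `check_model_dir` and the default file list are assumptions made for illustration (`model.bin` is the weight file CTranslate2 conversions produce); extend the tuple to match what the repository actually contains.

```python
from pathlib import Path

# Assumption: files expected in the cloned directory. Adjust as needed.
EXPECTED_FILES = ("model.bin",)

# Git LFS pointer files are short text stubs that start with this line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/"


def check_model_dir(model_dir, expected=EXPECTED_FILES):
    """Return a list of problems found in model_dir (empty if none)."""
    root = Path(model_dir)
    problems = []
    for name in expected:
        path = root / name
        if not path.is_file():
            problems.append(f"missing file: {name}")
            continue
        # Read only the first bytes; the real weights are several GB.
        with path.open("rb") as handle:
            head = handle.read(len(LFS_POINTER_PREFIX))
        if head == LFS_POINTER_PREFIX:
            problems.append(f"{name} is a Git LFS pointer, not the real file")
    return problems
```

Calling `check_model_dir("vegam-whisper-medium-ml")` after cloning should return an empty list when the weights were downloaded in full.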

	## Usage

	```
	from faster_whisper import WhisperModel

	model_path = "vegam-whisper-medium-ml"

	# Run on GPU with FP16
	model = WhisperModel(model_path, device="cuda", compute_type="float16")

	# or run on GPU with INT8
	# model = WhisperModel(model_path, device="cuda", compute_type="int8_float16")
	# or run on CPU with INT8
	# model = WhisperModel(model_path, device="cpu", compute_type="int8")

	segments, info = model.transcribe("audio.mp3", beam_size=5)

	print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

	for segment in segments:
	    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
	```
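
The loop above prints each segment's `start`, `end` and `text`. As one possible next step, the sketch below turns the same three attributes into SRT subtitle text. The helpers `format_timestamp` and `segments_to_srt` are hypothetical names, not part of faster-whisper, and `FakeSegment` is only a stand-in so the example runs without loading a model.

```python
from collections import namedtuple


def format_timestamp(seconds):
    """Convert seconds (float) to an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render objects with .start, .end and .text as SRT subtitle text."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(segment.start)} --> {format_timestamp(segment.end)}\n"
            f"{segment.text.strip()}\n"
        )
    # A blank line separates consecutive subtitle entries.
    return "\n".join(blocks)


# Stand-in for faster-whisper segments (assumption: same attribute names).
FakeSegment = namedtuple("FakeSegment", "start end text")
demo = [FakeSegment(0.0, 4.74, " first line"), FakeSegment(4.74, 9.5, " second line")]
print(segments_to_srt(demo))
```

With a real model, the `segments` value returned by `model.transcribe(...)` can be passed to `segments_to_srt` directly.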

	## Example

	```
	from faster_whisper import WhisperModel

	model_path = "vegam-whisper-medium-ml"

	model = WhisperModel(model_path, device="cuda", compute_type="float16")


	segments, info = model.transcribe("00b38e80-80b8-4f70-babf-566e848879fc.webm", beam_size=5)

	print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

	for segment in segments:
	    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
	```

	> Detected language 'ta' with probability 0.353516

	> [0.00s -> 4.74s] പാലം കടുക്കുവോളം നാരായണ പാലം കടന്നാലൊ കൂരായണ

	Note: The audio file [00b38e80-80b8-4f70-babf-566e848879fc.webm](https://huggingface.co/kurianbenoy/vegam-whisper-medium-ml/blob/main/00b38e80-80b8-4f70-babf-566e848879fc.webm) is from the [Malayalam Speech Corpus](https://blog.smc.org.in/malayalam-speech-corpus/) and is stored alongside the model weights.
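
In the output above, automatic detection reports `ta` (Tamil) with low probability even though the transcription is in Malayalam. faster-whisper's `transcribe` accepts a `language` argument, so a caller who already knows the audio is Malayalam can skip detection. A minimal sketch follows; the wrapper name `transcribe_malayalam` is hypothetical, and whether fixing the language changes the transcript for a given file is not something this card measures.

```python
def transcribe_malayalam(model, audio_path, beam_size=5):
    """Transcribe audio_path with the language fixed to Malayalam ("ml").

    `model` is expected to be a faster_whisper.WhisperModel. Passing
    language="ml" skips automatic language detection.
    """
    segments, info = model.transcribe(audio_path, language="ml", beam_size=beam_size)
    # Iterate the segments here so the caller gets a plain list of tuples.
    lines = [(segment.start, segment.end, segment.text) for segment in segments]
    return lines, info


# Usage, assuming the model directory cloned in the Installation section:
# from faster_whisper import WhisperModel
# model = WhisperModel("vegam-whisper-medium-ml", device="cuda", compute_type="float16")
# lines, info = transcribe_malayalam(model, "00b38e80-80b8-4f70-babf-566e848879fc.webm")
```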

	## Conversion Details

	This conversion was made possible by the wonderful [CTranslate2 library](https://github.com/OpenNMT/CTranslate2), using its [Transformers converter for OpenAI Whisper](https://opennmt.net/CTranslate2/guides/transformers.html#whisper). The original model was converted with the following command:

	```
	ct2-transformers-converter --model thennal/whisper-medium-ml --output_dir vegam-whisper-medium-ml
	```
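
For anyone scripting the conversion, the same command can be assembled as an argument list in Python. This is a sketch under stated assumptions: `build_converter_command` is a hypothetical helper, and the optional `--quantization` and `--copy_files` flags belong to the converter CLI but were not used for this model, which was converted with the plain command shown above.

```python
import shlex


def build_converter_command(model, output_dir, quantization=None, copy_files=()):
    """Build the argv list for ct2-transformers-converter."""
    argv = ["ct2-transformers-converter", "--model", model, "--output_dir", output_dir]
    if copy_files:
        # Extra files to copy from the source model into the output directory.
        argv += ["--copy_files", *copy_files]
    if quantization:
        # For example "float16" or "int8"; omitted in the original conversion.
        argv += ["--quantization", quantization]
    return argv


command = build_converter_command("thennal/whisper-medium-ml", "vegam-whisper-medium-ml")
print(shlex.join(command))
# To run the conversion: subprocess.run(command, check=True)
```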

	## Many Thanks to

	- Creators of CTranslate2 and faster-whisper
	- Thennal D K
	- Santhosh Thottingal