---
language:
- ml
tags:
- audio
- automatic-speech-recognition
- vegam
license: mit
datasets:
- google/fleurs
- thennal/IMaSC
- mozilla-foundation/common_voice_11_0
library_name: ctranslate2
---

> Note: Model file size is 3.06 GB
|
|
# vegam-whisper-medium-ml (വേഗം)
|
|
This is a conversion of [thennal/whisper-medium-ml](https://huggingface.co/thennal/whisper-medium-ml) to the [CTranslate2](https://github.com/OpenNMT/CTranslate2) model format.


This model can be used with CTranslate2 or with projects built on CTranslate2, such as [faster-whisper](https://github.com/guillaumekln/faster-whisper).
|
|
## Installation


- Install [faster-whisper](https://github.com/guillaumekln/faster-whisper). See the [faster-whisper installation instructions](https://github.com/guillaumekln/faster-whisper/tree/master#installation) for more details.


```
pip install faster-whisper
```
|
|
- Install [git-lfs](https://git-lfs.com/). For systems that are not Debian-based, see the [other git-lfs installation options](https://github.com/git-lfs/git-lfs?utm_source=gitlfs_site&utm_medium=installation_link&utm_campaign=gitlfs#installing).


Note that git-lfs is only needed to download the model from Hugging Face.


```
apt-get install git-lfs
```
|
|
- Download the model weights


```
git lfs install
git clone https://huggingface.co/kurianbenoy/vegam-whisper-medium-ml
```
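
If git-lfs is not set up before cloning, Git checks out small text pointer files in place of the large weight files, and loading the model then fails. A minimal sketch for detecting this, assuming the weights are stored in `model.bin` (the file name CTranslate2 normally writes) inside the cloned directory; `looks_like_lfs_pointer` is a hypothetical helper, not part of git-lfs or faster-whisper:

```python
from pathlib import Path

# First line of every Git LFS (spec v1) pointer file.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def looks_like_lfs_pointer(path):
    """Return True if `path` looks like a Git LFS pointer stub instead of real data.

    Hypothetical helper. Assumption: pointer files are tiny text stubs, so
    anything 1 KiB or larger (such as a multi-GB model.bin) is treated as real data.
    """
    path = Path(path)
    if path.stat().st_size >= 1024:
        return False
    with path.open("rb") as f:
        # Compare only the leading bytes against the pointer-file header.
        return f.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX
```

For example, if `looks_like_lfs_pointer("vegam-whisper-medium-ml/model.bin")` returns `True`, running `git lfs install` and then `git lfs pull` inside the cloned directory should fetch the actual weights.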
|
|
## Usage


```
from faster_whisper import WhisperModel

# Directory created by the `git clone` step above
model_path = "vegam-whisper-medium-ml"

# Run on GPU with FP16
model = WhisperModel(model_path, device="cuda", compute_type="float16")

# or run on GPU with INT8
# model = WhisperModel(model_path, device="cuda", compute_type="int8_float16")
# or run on CPU with INT8
# model = WhisperModel(model_path, device="cpu", compute_type="int8")

segments, info = model.transcribe("audio.mp3", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```
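
In faster-whisper, `transcribe` returns `segments` as a generator: decoding only happens while you iterate over it, and the generator can be consumed only once. A minimal sketch that runs the transcription to completion and keeps the formatted lines, using the same `[start -> end] text` format as the loop above; `collect_segments` is a hypothetical helper that only assumes each segment has `start`, `end`, and `text` attributes:

```python
def collect_segments(segments):
    """Consume a faster-whisper segments generator and return formatted lines.

    Hypothetical helper: it only reads `start`, `end` and `text` from each
    segment, so stand-in objects work as well.
    """
    lines = []
    for segment in segments:  # iterating is what actually runs the decoding
        lines.append("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
    return lines
```

For example, call `lines = collect_segments(segments)` right after `model.transcribe(...)`; unlike the generator, the resulting list can be printed or saved as many times as needed.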
|
|
## Example


```
from faster_whisper import WhisperModel

model_path = "vegam-whisper-medium-ml"

model = WhisperModel(model_path, device="cuda", compute_type="float16")

segments, info = model.transcribe("00b38e80-80b8-4f70-babf-566e848879fc.webm", beam_size=5)

print("Detected language '%s' with probability %f" % (info.language, info.language_probability))

for segment in segments:
    print("[%.2fs -> %.2fs] %s" % (segment.start, segment.end, segment.text))
```


> Detected language 'ta' with probability 0.353516


> [0.00s -> 4.74s] പാലം കടുക്കുവോളം നാരായണ പാലം കടന്നാലൊ കൂരായണ
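
In the output above, language detection chose `'ta'` (Tamil) with a probability of only about 0.35, although the clip is Malayalam. The `transcribe` method in faster-whisper accepts a `language` argument, and passing `"ml"` skips detection so the audio is decoded as Malayalam. A minimal sketch, where `transcribe_malayalam` is a hypothetical wrapper taking an already loaded `WhisperModel` (or any object with a compatible `transcribe` method):

```python
def transcribe_malayalam(model, audio_path, beam_size=5):
    """Transcribe `audio_path` as Malayalam without automatic language detection.

    Hypothetical wrapper: `model` is expected to be a faster_whisper.WhisperModel
    or any object exposing a compatible `transcribe` method.
    """
    # language="ml" fixes the decoding language instead of detecting it.
    segments, info = model.transcribe(audio_path, beam_size=beam_size, language="ml")
    # Materialize the lazy segments so the full transcription has run on return.
    return list(segments), info
```

For example, with `model` loaded as in the example above: `segments, info = transcribe_malayalam(model, "00b38e80-80b8-4f70-babf-566e848879fc.webm")`.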
|
|
Note: The audio file [00b38e80-80b8-4f70-babf-566e848879fc.webm](https://huggingface.co/kurianbenoy/vegam-whisper-medium-ml/blob/main/00b38e80-80b8-4f70-babf-566e848879fc.webm) is from the [Malayalam Speech Corpus](https://blog.smc.org.in/malayalam-speech-corpus/) and is stored alongside the model weights.

## Conversion Details
|
|
This conversion was made possible by the excellent [CTranslate2 library](https://github.com/OpenNMT/CTranslate2), using its [Transformers converter for OpenAI Whisper](https://opennmt.net/CTranslate2/guides/transformers.html#whisper). The original model was converted with the following command:


```
ct2-transformers-converter --model thennal/whisper-medium-ml --output_dir vegam-whisper-medium-ml
```
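
The same conversion can also be run from Python through CTranslate2's converter API. A minimal sketch, assuming `ctranslate2` and `transformers` are installed and the Hugging Face checkpoint can be downloaded; `convert_whisper` is a hypothetical wrapper around `ctranslate2.converters.TransformersConverter`, with `quantization` left unset as in the command above:

```python
def convert_whisper(model_name="thennal/whisper-medium-ml",
                    output_dir="vegam-whisper-medium-ml",
                    quantization=None, force=False):
    """Convert a Transformers Whisper checkpoint to the CTranslate2 format.

    Hypothetical wrapper around ctranslate2.converters.TransformersConverter.
    The import lives inside the function so this file loads without ctranslate2.
    """
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(model_name)
    # quantization=None keeps the checkpoint's weight type (like omitting --quantization);
    # force=True would overwrite an existing output_dir instead of raising an error.
    return converter.convert(output_dir, quantization=quantization, force=force)
```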
|
|
## Many Thanks to


- Creators of CTranslate2 and faster-whisper
- Thennal D K
- Santhosh Thottingal
|
|