
Whisper Medium Malayalam - GGML Format

This is a GGML-converted version of thennal/whisper-medium-ml optimized for use with whisper.cpp.

Key Features:

  • 🚀 Multiple quantized versions (Q4, Q5, Q8) for different use cases
  • 📱 Optimized for on-device, offline inference
  • ⚡ Up to 85% size reduction with quantization
  • 🎯 Malayalam language specialization
  • 💻 Cross-platform support (CPU, Metal, CUDA, etc.)

Model Details

  • Base Model: OpenAI Whisper Medium
  • Language: Malayalam
  • Task: Automatic Speech Recognition (ASR)
  • Format: GGML (converted from PyTorch)
  • Source: Fine-tuned on Common Voice 11.0 dataset

Available Model Variants

This repository provides multiple quantized versions optimized for different use cases:

| Model | Size | Use Case | Quality |
|---|---|---|---|
| ggml-model.bin | 1.4 GB | Original conversion (F16) | Highest quality |
| ggml-model-q8_0.bin | 785 MB | High quality, smaller size | Very high quality |
| ggml-model-q5_0.bin | 514 MB | Recommended: balanced quality/size | Good quality |
| ggml-model-q4_0.bin | 424 MB | Smallest size, fastest inference | Acceptable quality |

Recommendation: For most users, ggml-model-q5_0.bin offers the best balance between quality and file size.

Performance (from source model)

  • Word Error Rate (WER): 38.62% (without normalization)
  • Character Error Rate (CER): 7.33%
  • WER with normalization: 11.49%

Note: Whisper's built-in text normalization has known issues for the Malayalam language, so the normalized WER should be read with caution.
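The WER and CER figures above are standard edit-distance metrics. As a minimal illustration of how they are computed (in practice a library such as jiwer or evaluate would be used), here is a self-contained sketch:

```python
# Minimal sketch of WER/CER: Levenshtein edit distance over words
# (WER) or characters (CER), divided by the reference length.

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution
        prev = cur
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)
```

A "WER with normalization" figure applies a text normalizer to both reference and hypothesis before this computation, which is exactly where Whisper's normalizer misbehaves for Malayalam script.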

Usage with whisper.cpp

Prerequisites

git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
make

Download the model

Download one of the model files from this repository and place it in the models directory of whisper.cpp:

  • Recommended: ggml-model-q5_0.bin (514 MB)
  • Smallest: ggml-model-q4_0.bin (424 MB)
  • Highest quality: ggml-model-q8_0.bin (785 MB)
  • Original: ggml-model.bin (1.4 GB)
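One way to fetch a variant from the command line, assuming this repository follows the standard Hugging Face `resolve` URL layout (adjust the filename for another variant; a gated repo also requires an access token):

```shell
# Download the recommended Q5_0 variant into whisper.cpp's models/ directory.
# Add -H "Authorization: Bearer $HF_TOKEN" if the repository is gated.
curl -L -o models/ggml-model-q5_0.bin \
  "https://huggingface.co/sujithatz/ggml-whisper-medium-ml/resolve/main/ggml-model-q5_0.bin"
```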

Run inference

# Using the recommended Q5_0 model
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml

# Or using any other variant
./build/bin/whisper-cli -m models/ggml-model-q4_0.bin -f audio.wav -l ml

Where:

  • -m specifies the model file
  • -f specifies the input audio file (must be 16-bit WAV)
  • -l ml sets the language to Malayalam
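whisper.cpp expects 16 kHz, 16-bit PCM WAV input. If your audio is in another format or sample rate, a typical ffmpeg conversion looks like this (input filename is a placeholder):

```shell
# Resample to 16 kHz mono, 16-bit signed PCM, as whisper.cpp expects.
ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav
```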

Additional options

# Translate to English
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -tr

# Output in different formats
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -osrt  # SubRip subtitles
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -ovtt  # WebVTT subtitles
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -otxt  # Plain text

Conversion Details

This model was converted from the HuggingFace transformers format to GGML using the convert-h5-to-ggml.py script from whisper.cpp.
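A sketch of that conversion step, assuming the argument order used by whisper.cpp's convert-h5-to-ggml.py (HF model directory, a checkout of the openai/whisper repo for its mel filter assets, and an output directory; verify against your whisper.cpp version):

```shell
# Fetch the fine-tuned HF model and the openai/whisper repo, then convert.
git clone https://huggingface.co/thennal/whisper-medium-ml
git clone https://github.com/openai/whisper
python3 models/convert-h5-to-ggml.py ./whisper-medium-ml ./whisper ./models
```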

Quantization Details

The quantized models were created using whisper.cpp's quantization tool:

  • Q8_0: 8-bit quantization, retains ~99% of original quality
  • Q5_0: 5-bit quantization, excellent quality/size balance (~73% size reduction)
  • Q4_0: 4-bit quantization, maximum compression (~85% size reduction)

All quantized models maintain the full model architecture and can be used as drop-in replacements.
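For reference, the quantized variants can be reproduced with whisper.cpp's quantize tool roughly as follows (the binary's path and name may differ across whisper.cpp versions):

```shell
# Produce each quantized variant from the F16 conversion.
./build/bin/quantize models/ggml-model.bin models/ggml-model-q5_0.bin q5_0
./build/bin/quantize models/ggml-model.bin models/ggml-model-q4_0.bin q4_0
./build/bin/quantize models/ggml-model.bin models/ggml-model-q8_0.bin q8_0
```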

Training Data

The source model was fine-tuned on multiple Malayalam speech datasets:

  • Common Voice 11.0 (Mozilla Foundation)
  • FLEURS (Google)
  • IMaSC, ULCA, MSC, and Indic TTS Malayalam

Citation

If you use this model, please cite:

@misc{whisper-medium-ml-ggml,
  author = {Thennal D K},
  title = {Whisper Medium Malayalam - GGML Format},
  year = {2024},
  publisher = {HuggingFace},
  journal = {HuggingFace Model Hub},
  howpublished = {\url{https://huggingface.co/thennal/whisper-medium-ml}},
  note = {GGML conversion with quantization}
}

@misc{radford2022whisper,
  title={Robust Speech Recognition via Large-Scale Weak Supervision},
  author={Alec Radford and Jong Wook Kim and Tao Xu and Greg Brockman and Christine McLeavey and Ilya Sutskever},
  year={2022},
  eprint={2212.04356},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}

License

Apache 2.0 - Same as the original Whisper model and fine-tuned version.

Acknowledgments

This model builds upon the work of many contributors:

Original Model & Framework

  • OpenAI Whisper Team - For the groundbreaking Whisper ASR model (paper, code)
  • Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever - Whisper authors

Malayalam Fine-tuning

  • Thennal D K - For fine-tuning Whisper Medium on Malayalam (thennal/whisper-medium-ml)

Datasets

  • Mozilla Foundation - Common Voice 11.0 Malayalam dataset
  • Google - FLEURS multilingual dataset
  • Community contributors - IMaSC, ULCA, MSC, and Indic TTS Malayalam datasets

GGML Implementation

  • whisper.cpp team - For the efficient C/C++ implementation and GGML format
  • ggml-org - For the GGML machine learning library

Tools & Frameworks

  • HuggingFace Transformers - Model training and inference framework
  • PyTorch - Deep learning framework

Special Thanks: This conversion makes the Malayalam Whisper model accessible for on-device, offline inference on various platforms including mobile devices, embedded systems, and resource-constrained environments.
