
Whisper Medium Malayalam - GGML Format

This is a GGML-converted version of thennal/whisper-medium-ml optimized for use with whisper.cpp.

Key Features:

  • 🚀 Multiple quantized versions (Q4, Q5, Q8) for different use cases
  • 📱 Optimized for on-device, offline inference
  • ⚡ Up to 85% size reduction with quantization
  • 🎯 Malayalam language specialization
  • 💻 Cross-platform support (CPU, Metal, CUDA, etc.)

Model Details

  • Base Model: OpenAI Whisper Medium
  • Language: Malayalam
  • Task: Automatic Speech Recognition (ASR)
  • Format: GGML (converted from PyTorch)
  • Source: Fine-tuned on Common Voice 11.0 dataset

Available Model Variants

This repository provides multiple quantized versions optimized for different use cases:

| Model | Size | Use Case | Quality |
|---|---|---|---|
| ggml-model.bin | 1.4 GB | Original conversion (F16) | Highest quality |
| ggml-model-q8_0.bin | 785 MB | High quality, smaller size | Very high quality |
| ggml-model-q5_0.bin | 514 MB | Recommended: balanced quality/size | Good quality |
| ggml-model-q4_0.bin | 424 MB | Smallest size, fastest inference | Acceptable quality |

Recommendation: For most users, ggml-model-q5_0.bin offers the best balance between quality and file size.

Performance (from source model)

  • Word Error Rate (WER): 38.62% (without normalization)
  • Character Error Rate (CER): 7.33%
  • WER with normalization: 11.49%

Note: Whisper's built-in text normalization has known issues for the Malayalam language, so the normalized WER should be read with caution.
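The WER and CER figures above are standard edit-distance metrics. As a minimal illustration of how they are computed (in practice a library such as jiwer or evaluate would be used), here is a self-contained sketch:

```python
# Minimal sketch of WER/CER: Levenshtein edit distance over words
# (WER) or characters (CER), divided by the reference length.

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution
        prev = cur
    return prev[-1]

def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edits / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / len(ref)

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate: character-level edits / reference length."""
    return edit_distance(reference, hypothesis) / len(reference)
```

A "WER with normalization" figure applies a text normalizer to both reference and hypothesis before this computation, which is exactly where Whisper's normalizer misbehaves for Malayalam script.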

Usage with whisper.cpp

Prerequisites

git clone https://github.com/ggml-org/whisper.cpp
cd whisper.cpp
make

Download the model

Download one of the model files from this repository and place it in the models directory of whisper.cpp:

  • Recommended: ggml-model-q5_0.bin (514 MB)
  • Smallest: ggml-model-q4_0.bin (424 MB)
  • Highest quality: ggml-model-q8_0.bin (785 MB)
  • Original: ggml-model.bin (1.4 GB)
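One way to fetch a variant from the command line, assuming this repository follows the standard Hugging Face `resolve` URL layout (adjust the filename for another variant; a gated repo also requires an access token):

```shell
# Download the recommended Q5_0 variant into whisper.cpp's models/ directory.
# Add -H "Authorization: Bearer $HF_TOKEN" if the repository is gated.
curl -L -o models/ggml-model-q5_0.bin \
  "https://huggingface.co/sujithatz/ggml-whisper-medium-ml/resolve/main/ggml-model-q5_0.bin"
```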

Run inference

# Using the recommended Q5_0 model
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml

# Or using any other variant
./build/bin/whisper-cli -m models/ggml-model-q4_0.bin -f audio.wav -l ml

Where:

  • -m specifies the model file
  • -f specifies the input audio file (must be 16-bit WAV)
  • -l ml sets the language to Malayalam
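whisper.cpp expects 16 kHz, 16-bit PCM WAV input. If your audio is in another format or sample rate, a typical ffmpeg conversion looks like this (input filename is a placeholder):

```shell
# Resample to 16 kHz mono, 16-bit signed PCM, as whisper.cpp expects.
ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le audio.wav
```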

Additional options

# Translate to English
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -tr

# Output in different formats
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -osrt  # SubRip subtitles
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -ovtt  # WebVTT subtitles
./build/bin/whisper-cli -m models/ggml-model-q5_0.bin -f audio.wav -l ml -otxt  # Plain text

Conversion Details

This model was converted from the HuggingFace transformers format to GGML using the convert-h5-to-ggml.py script from whisper.cpp.
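A sketch of that conversion step, assuming the argument order used by whisper.cpp's convert-h5-to-ggml.py (HF model directory, a checkout of the openai/whisper repo for its mel filter assets, and an output directory; verify against your whisper.cpp version):

```shell
# Fetch the fine-tuned HF model and the openai/whisper repo, then convert.
git clone https://huggingface.co/thennal/whisper-medium-ml
git clone https://github.com/openai/whisper
python3 models/convert-h5-to-ggml.py ./whisper-medium-ml ./whisper ./models
```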

Quantization Details

The quantized models were created using whisper.cpp's quantization tool:

  • Q8_0: 8-bit quantization, retains ~99% of original quality
  • Q5_0: 5-bit quantization, excellent quality/size balance (~73% size reduction)
  • Q4_0: 4-bit quantization, maximum compression (~85% size reduction)

All quantized models maintain the full model architecture and can be used as drop-in replacements.
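For reference, the quantized variants can be reproduced with whisper.cpp's quantize tool roughly as follows (the binary's path and name may differ across whisper.cpp versions):

```shell
# Produce each quantized variant from the F16 conversion.
./build/bin/quantize models/ggml-model.bin models/ggml-model-q5_0.bin q5_0
./build/bin/quantize models/ggml-model.bin models/ggml-model-q4_0.bin q4_0
./build/bin/quantize models/ggml-model.bin models/ggml-model-q8_0.bin q8_0
```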

Training Data

The source model was fine-tuned on multiple Malayalam speech datasets:

  • Common Voice 11.0 (Mozilla Foundation)
  • FLEURS (Google)
  • IMaSC, ULCA, MSC, and Indic TTS Malayalam

Citation

If you use this model, please cite:

@misc{whisper-medium-ml-ggml,
  author = {Thennal D K},
  title = {Whisper Medium Malayalam - GGML Format},
  year = {2024},
  publisher = {HuggingFace},
  journal = {HuggingFace Model Hub},
  howpublished = {\url{https://huggingface.co/thennal/whisper-medium-ml}},
  note = {GGML conversion with quantization}
}

@misc{radford2022whisper,
  title={Robust Speech Recognition via Large-Scale Weak Supervision},
  author={Alec Radford and Jong Wook Kim and Tao Xu and Greg Brockman and Christine McLeavey and Ilya Sutskever},
  year={2022},
  eprint={2212.04356},
  archivePrefix={arXiv},
  primaryClass={eess.AS}
}

License

Apache 2.0 - Same as the original Whisper model and fine-tuned version.

Acknowledgments

This model builds upon the work of many contributors:

Original Model & Framework

  • OpenAI Whisper Team - For the groundbreaking Whisper ASR model (paper, code)
  • Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, Christine McLeavey, Ilya Sutskever - Whisper authors

Malayalam Fine-tuning

  • Thennal D K - For fine-tuning Whisper Medium on Malayalam (thennal/whisper-medium-ml)

Datasets

  • Mozilla Foundation - Common Voice 11.0 Malayalam dataset
  • Google - FLEURS multilingual dataset
  • Community contributors - IMaSC, ULCA, MSC, and Indic TTS Malayalam datasets

GGML Implementation

  • whisper.cpp team - For the efficient C/C++ implementation and GGML format
  • ggml-org - For the GGML machine learning library

Tools & Frameworks

  • HuggingFace Transformers - Model training and inference framework
  • PyTorch - Deep learning framework

Special Thanks: This conversion makes the Malayalam Whisper model accessible for on-device, offline inference on various platforms including mobile devices, embedded systems, and resource-constrained environments.
