Malayalam Whisper to FUTO Keyboard: Full Pipeline

This repository contains an end-to-end Jupyter Notebook pipeline designed to prepare Hugging Face Whisper models for use with the FUTO Keyboard on Android.

The pipeline specifically focuses on adapting Malayalam Whisper models (such as whisper-small-malayalam) by applying Audio Context Fine-Tuning (ACFT), converting the standard Hugging Face weights into the GGML format, and quantizing the model for efficient mobile inference.

Features

Audio Context Fine-Tuning (ACFT): Trains the target model against a frozen reference model so that it handles variable-length audio contexts shorter than Whisper's 30-second window without looping or repeating output.
GGML Conversion: Automatically clones the OpenAI Whisper and whisper.cpp repositories and converts the fine-tuned .safetensors model into a GGML .bin file.
Mobile Quantization: Compiles the whisper-quantize tool and generates optimized, quantized versions (e.g., q5_0) of the model tailored for smartphone hardware constraints.
Fast Dependency Management: Utilizes uv for rapid package installation and environment setup within the Colab runtime.
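
To make the "dynamic audio context" idea concrete, here is a minimal sketch of how a clip duration could map to an encoder context length. It relies on Whisper's standard geometry (a 30-second window covered by 1500 encoder positions, so 50 positions per second). The function name and the padding parameter are illustrative assumptions and are not taken from the notebook.

```python
import math

# Whisper's encoder covers a fixed 30-second window with 1500 positions
# (16 kHz audio, 10 ms mel frames, then a stride-2 convolution).
MAX_AUDIO_SECONDS = 30.0
MAX_AUDIO_CTX = 1500


def audio_ctx_for_duration(duration_s: float, padding: int = 0) -> int:
    """Return the encoder context length needed for a clip of `duration_s` seconds.

    `padding` is a hypothetical safety margin (in encoder positions) that a
    caller might add; the result is clamped to the range [1, MAX_AUDIO_CTX].
    """
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    positions_per_second = MAX_AUDIO_CTX / MAX_AUDIO_SECONDS  # 50.0
    ctx = math.ceil(duration_s * positions_per_second) + padding
    return max(1, min(ctx, MAX_AUDIO_CTX))


if __name__ == "__main__":
    for seconds in (2.5, 5.0, 30.0):
        print(f"{seconds:>5.1f} s -> audio_ctx = {audio_ctx_for_duration(seconds)}")
```

Under these assumptions a 5-second dictation clip uses only 250 of the 1500 positions, which is the kind of shortened context ACFT is meant to make the model stable on.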

Requirements

This notebook is designed to be executed in Google Colab to leverage cloud GPU acceleration and avoid local hardware memory constraints.

Environment: Google Colab
Hardware: T4 GPU (required for the ACFT training loop to complete in a reasonable time)
Storage: Access to Google Drive (if loading local datasets like Mozilla Common Voice .tar.gz archives or saving the final output directly to Drive).
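
Before starting a long run, it can help to confirm that the runtime meets these requirements. The sketch below uses only the standard library. The function name `preflight` is hypothetical, and `/content/drive` is assumed to be the Drive mount point (the usual Colab convention), so adjust it if your notebook mounts Drive elsewhere.

```python
import shutil
from pathlib import Path


def preflight(drive_root: str = "/content/drive") -> dict:
    """Collect two simple facts about the current runtime.

    `drive_root` is an assumed mount point for Google Drive, not a value
    taken from the notebook.
    """
    return {
        # nvidia-smi is normally on PATH only when an NVIDIA driver is present.
        "nvidia_smi_found": shutil.which("nvidia-smi") is not None,
        # The mount directory normally exists only after Drive is mounted.
        "drive_mounted": Path(drive_root).is_dir(),
    }


if __name__ == "__main__":
    for name, ok in preflight().items():
        print(f"{name}: {'yes' if ok else 'no'}")
```

If `nvidia_smi_found` is reported as "no", the runtime is probably CPU-only and the runtime type should be changed before training.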

Usage Instructions

1. Environment Setup

Upload the malayalam_whisper_full_pipeline.ipynb notebook to your Google Colab environment.
Navigate to Runtime > Change runtime type and select T4 GPU.
Run the first cell to install the required dependencies (torch, transformers, datasets, librosa, etc.) via uv.
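
As a rough illustration of what the installation cell might do, the sketch below builds a `uv pip install` command for the packages named above. It assumes uv is already installed in the runtime (for example via pip), and the helper names are hypothetical. The package list covers only the four names this README mentions; the notebook's actual list may be longer.

```python
import subprocess

# Packages named in this README; the notebook's full list may be longer.
PACKAGES = ["torch", "transformers", "datasets", "librosa"]


def build_install_command(packages: list[str]) -> list[str]:
    """Build a `uv pip install` command targeting the system interpreter.

    `--system` tells uv to install into the active Python environment
    instead of requiring a virtual environment.
    """
    if not packages:
        raise ValueError("at least one package is required")
    return ["uv", "pip", "install", "--system", *packages]


def install(packages: list[str], dry_run: bool = True) -> list[str]:
    """Print the command and, unless `dry_run` is set, execute it."""
    cmd = build_install_command(packages)
    print(" ".join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    install(PACKAGES, dry_run=True)
```

The dry-run default only prints the command, so it can be inspected before anything is installed.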

2. Execution

Execute the notebook cells sequentially. The pipeline handles:

Downloading the target Whisper model and the Common Voice dataset.
Running the 500-step ACFT training loop, which minimizes the MSE loss between the outputs of the target and reference models.
Merging the updated weights and saving the fine-tuned model in PyTorch (.safetensors) format.
Running convert-h5-to-ggml.py to generate the base ggml-model.bin file.
Building whisper.cpp with its Makefile and generating the quantized .bin files.
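
The last two steps can be sketched as a pair of command builders. The converter's argument order (model directory, OpenAI Whisper checkout, output directory) and the quantizer's argument order (input, output, type) follow the whisper.cpp tools. The helper names and every concrete path in the usage section are hypothetical placeholders, including the location of the quantize binary, which depends on how whisper.cpp was built.

```python
from pathlib import Path


def convert_command(hf_model_dir: str, openai_whisper_dir: str, out_dir: str,
                    whisper_cpp_dir: str = "whisper.cpp") -> list[str]:
    """Command for whisper.cpp's Hugging Face -> GGML converter.

    The script takes the model directory, a checkout of the OpenAI Whisper
    repository, and an output directory, and writes ggml-model.bin there.
    """
    script = Path(whisper_cpp_dir) / "models" / "convert-h5-to-ggml.py"
    return ["python3", script.as_posix(), hf_model_dir, openai_whisper_dir, out_dir]


def quantize_command(quantize_bin: str, src: str, dst: str,
                     qtype: str = "q5_0") -> list[str]:
    """Command for the whisper.cpp quantizer: <binary> <input> <output> <type>."""
    return [quantize_bin, src, dst, qtype]


if __name__ == "__main__":
    # All paths below are placeholders, not values taken from the notebook.
    convert = convert_command("/content/acft-model", "whisper", "/content/output")
    quantize = quantize_command(
        "./whisper.cpp/quantize",  # hypothetical binary location
        "/content/output/ggml-model.bin",
        "/content/output/malayalam-futo-q5_0.bin",
    )
    print(" ".join(convert))
    print(" ".join(quantize))
```

Returning argument lists rather than shell strings keeps the commands easy to inspect and to pass to `subprocess.run` without quoting problems.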

3. Deployment

Once the notebook finishes executing all quantization steps, the final models will be available in the designated /content/output/ directory (or your mounted Google Drive).
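
To see which files were produced and how large they are, a small listing helper such as the following may be useful. The function name is hypothetical, and the default path is the /content/output/ directory mentioned above; pass your Drive folder instead if the notebook wrote there.

```python
from pathlib import Path


def list_models(output_dir: str = "/content/output") -> list[tuple[str, float]]:
    """Return (file name, size in MiB) for every .bin file, smallest first."""
    root = Path(output_dir)
    if not root.is_dir():
        return []
    models = [(p.name, p.stat().st_size / 2**20) for p in root.glob("*.bin")]
    return sorted(models, key=lambda item: item[1])


if __name__ == "__main__":
    for name, size_mib in list_models():
        print(f"{name}: {size_mib:.1f} MiB")
```

Sorting by size puts the most compact quantized variants first, which is usually what matters when choosing a file for a phone.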

Download malayalam-futo-q5_0.bin, the recommended quantized model.
Transfer the .bin file to your Android device's internal storage.
Open the FUTO Keyboard settings, navigate to the Voice Input section, and import the downloaded model.

Tools & References


Model tree for Athulkrishna/thennal-whisper-medium-ml-acft

Base model: openai/whisper-medium
Finetuned: thennal/whisper-medium-ml
Finetuned: this model