Malayalam Whisper to FUTO Keyboard: Full Pipeline

This repository contains an end-to-end Jupyter Notebook pipeline designed to prepare Hugging Face Whisper models for use with the FUTO Keyboard on Android.

The pipeline specifically focuses on adapting Malayalam Whisper models (such as whisper-small-malayalam) by applying Audio Context Fine-Tuning (ACFT), converting the standard Hugging Face weights into the GGML format, and quantizing the model for efficient mobile inference.

Features

  • Audio Context Fine-Tuning (ACFT): Trains the target model to handle dynamic audio contexts (under 30 seconds) without endless looping or repetition, using a frozen reference model.
  • GGML Conversion: Automatically clones the required OpenAI and whisper.cpp repositories to convert the fine-tuned .safetensors model into a standard .bin file.
  • Mobile Quantization: Compiles the whisper-quantize tool and generates optimized, quantized versions (e.g., q5_0) of the model tailored for smartphone hardware constraints.
  • Fast Dependency Management: Utilizes uv for rapid package installation and environment setup within the Colab runtime.
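The core idea behind ACFT can be illustrated with a toy sketch: train a target model on shortened-context inputs to match a frozen reference model that sees the full context. Everything below (the linear models, the 16-dimensional features, the learning rate) is a hypothetical stand-in for the real Whisper encoder states; this is a conceptual sketch, not the pipeline's actual training code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins: the real pipeline compares Whisper model outputs,
# not 16-dimensional linear projections.
W_ref = rng.normal(size=(16, 16))   # frozen reference (sees full 30 s context)
W_tgt = rng.normal(size=(16, 16))   # trainable target (sees shortened context)

x_full = rng.normal(size=(8, 16))                   # full-context features
x_short = x_full + 0.01 * rng.normal(size=(8, 16))  # shortened-context features

lr, losses = 1e-3, []
for step in range(300):
    y_ref = x_full @ W_ref            # reference output, never updated
    diff = x_short @ W_tgt - y_ref
    losses.append(float(np.mean(diff ** 2)))
    W_tgt -= lr * (2 * x_short.T @ diff / diff.size)  # gradient of the MSE
```

Driving the MSE toward zero teaches the target to behave on short clips the way the reference behaves on full-length audio, which is what suppresses the looping and repetition on short utterances.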

Requirements

This notebook is designed to be executed in Google Colab to leverage cloud GPU acceleration and avoid local hardware memory constraints.

  • Environment: Google Colab
  • Hardware: T4 GPU (Required for the ACFT training loop to complete in a reasonable timeframe)
  • Storage: Access to Google Drive (if loading local datasets like Mozilla Common Voice .tar.gz archives or saving the final output directly to Drive).

Usage Instructions

1. Environment Setup

  1. Upload the malayalam_whisper_full_pipeline.ipynb notebook to your Google Colab environment.
  2. Navigate to Runtime > Change runtime type and select T4 GPU.
  3. Run the first execution cell to install the required dependencies (torch, transformers, datasets, librosa, etc.) via uv.
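Before running the heavy cells, it is worth confirming that the runtime actually has the GPU attached (a common Colab pitfall when the runtime type silently resets). A minimal check, assuming torch is already installed by the setup cell:

```python
import torch

# Confirm the T4 is visible to PyTorch before starting ACFT training
if torch.cuda.is_available():
    print("GPU detected:", torch.cuda.get_device_name(0))
else:
    print("No GPU detected - set Runtime > Change runtime type to T4 GPU")
```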

2. Execution

Execute the notebook cells sequentially. The pipeline handles:

  • Downloading the target Whisper model and the Common Voice dataset.
  • Running the 1500-step ACFT training loop to minimize the MSE between the target model's outputs and the frozen reference model's outputs.
  • Merging the updated weights and saving the PyTorch structure.
  • Running convert-h5-to-ggml.py to generate the base ggml-model.bin file.
  • Executing the whisper.cpp Makefile and generating quantized .bin files.

Sample output from the ACFT training loop:

    Training ACFT (1500 steps)...
    Step 0    | Loss: 1.133623
    Step 50   | Loss: 0.156094
    Step 100  | Loss: 0.239148
    Step 150  | Loss: 0.100868
    Step 200  | Loss: 0.100326
    Step 250  | Loss: 0.082330
    Step 300  | Loss: 0.065249
    Step 350  | Loss: 0.133438
    Step 400  | Loss: 0.105161
    Step 450  | Loss: 0.083460
    Step 500  | Loss: 0.185798
    Step 550  | Loss: 0.116877
    Step 600  | Loss: 0.069572
    Step 650  | Loss: 0.139821
    Step 700  | Loss: 0.291859
    Step 750  | Loss: 0.053645
    Step 800  | Loss: 0.068235
    Step 850  | Loss: 0.041750
    Step 900  | Loss: 0.049185
    Step 950  | Loss: 0.106350
    Step 1000 | Loss: 0.154282
    Step 1050 | Loss: 0.124018
    Step 1100 | Loss: 0.120467
    Step 1150 | Loss: 0.046497
    Step 1200 | Loss: 0.032196
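The quantized outputs shrink the model by storing weights as a few bits per value plus a per-block scale. The snippet below illustrates the underlying idea with simplified block-wise 5-bit quantization; it is not the actual GGML q5_0 bit layout, just the round-trip concept.

```python
import numpy as np

def quantize_block(block, bits=5):
    """Quantize one block of floats to signed integers with a shared scale.
    Simplified sketch -- real GGML q5_0 packs bits and scales differently."""
    qmax = 2 ** (bits - 1) - 1                       # 15 for 5 bits
    scale = float(np.abs(block).max()) / qmax or 1.0
    q = np.clip(np.round(block / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize_block(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
weights = rng.normal(scale=0.02, size=32).astype(np.float32)  # one 32-value block
q, scale = quantize_block(weights)
error = np.abs(weights - dequantize_block(q, scale)).max()    # bounded by scale / 2
```

Storing 5 bits per weight instead of 32 cuts the weight data to roughly 5/32 of its float32 size (ignoring scale overhead), at a small reconstruction error per block, which is why the q5_0 file fits comfortably on a phone.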

3. Deployment

Once the notebook finishes executing all quantization steps, the final models will be available in the designated /content/output/ directory (or your mounted Google Drive).

  1. Download the recommended malayalam-futo-q5_0.bin file (quantized for mobile use).
  2. Transfer the .bin file to your Android device's internal storage.
  3. Open the FUTO Keyboard settings, navigate to the Voice Input section, and import the downloaded model.
