How to use AIOnTheEdge/acft-whisper-small.da with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("automatic-speech-recognition", model="AIOnTheEdge/acft-whisper-small.da")

# Load model directly (AutoModel returns the bare WhisperModel without the
# language-modeling head; use WhisperForConditionalGeneration to generate text)
from transformers import AutoModel

model = AutoModel.from_pretrained("AIOnTheEdge/acft-whisper-small.da", dtype="auto")
```
Danish Whisper (FUTO / ACFT) — selected components
This document covers four pieces of the pipeline, in order: training (`train_acft.py`), the best small checkpoint (`small_244M_danish_whisper_acft_futo_best`), Hugging Face → GGML conversion (`convert-to-android.sh`), and Android-ready binaries (`android-futo-ready`).
Model Card
Fine-tuned OpenAI Whisper checkpoints that are robust to a dynamic audio context, trained with audio-context fine-tuning (ACFT). They keep working when the encoder is given a shortened context, which suits short or streaming audio. Method and upstream models: FUTO whisper-acft. This repo trains Danish variants on CoRal v3.
| Field | Value |
|---|---|
| Developed by | AIOnTheEdge |
| Finetuned from | OpenAI Whisper small |
| Model license | Apache-2.0 |
| Dataset license | OpenRAIL-D (commercial use with restrictions, e.g. speech synthesis and biometric identification) |
Uses. The checkpoints are intended for runtimes that expose a variable audio context, e.g. whisper.cpp with `-ac` / `--audio-ctx`, or FUTO Keyboard on Android. With Whisper's default full-context settings they behave like ordinary Whisper. Pre-converted GGML models and usage notes are in the FUTO whisper-acft README.
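Whisper turns a 30 s window into 3000 log-mel frames at a 10 ms hop. The encoder's stride-2 convolution reduces these to 1500 positions, so one second of audio corresponds to 50 encoder positions.

The helper below is a minimal sketch for choosing an audio-context size from the clip length. The name `audio_ctx_for_duration` and its default safety margin of 64 positions are hypothetical choices. They are not values used by whisper.cpp or FUTO Keyboard.

```python
import math

# Whisper constants: a 30 s input window maps to 1500 encoder positions
# (3000 mel frames at a 10 ms hop, downsampled 2x by the second conv layer).
WINDOW_SECONDS = 30.0
FULL_AUDIO_CTX = 1500
POSITIONS_PER_SECOND = FULL_AUDIO_CTX / WINDOW_SECONDS  # 50.0


def audio_ctx_for_duration(seconds: float, margin: int = 64) -> int:
    """Return an encoder audio-context size covering `seconds` of audio.

    `margin` (hypothetical default) adds a few extra positions so trailing
    speech is not cut off; the result is clamped to [1, 1500].
    """
    if seconds < 0:
        raise ValueError("duration must be non-negative")
    needed = math.ceil(seconds * POSITIONS_PER_SECOND) + margin
    return max(1, min(FULL_AUDIO_CTX, needed))


if __name__ == "__main__":
    for clip in (1.5, 4.0, 12.0, 30.0):
        print(f"{clip:>5.1f} s -> audio_ctx {audio_ctx_for_duration(clip)}")
```

For a 4 s dictation clip this gives 264 positions instead of 1500. That reduced-context regime is exactly what the ACFT checkpoints are fine-tuned to handle.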
Overview
train_acft.py (PyTorch) → small_244M_danish_whisper_acft_futo_best/ (HF weights + tokenizer) → convert-to-android.sh (conversion) → android-futo-ready/ (GGML .bin, renamed from ggml-model.bin)
| Step | Artifact | Role |
|---|---|---|
| 1 | `train_acft.py` | Distill Whisper small toward FUTO-style partial-context behavior on Danish CoRal audio |
| 2 | `small_244M_danish_whisper_acft_futo_best/` | Best small (244M) Hugging Face checkpoint (~1.1 GB `model.safetensors`) |
| 3 | `convert-to-android.sh` | Convert HF folders to `ggml-model.bin` for whisper.cpp / FUTO Keyboard |
| 4 | `android-futo-ready/` | Renamed `.bin` files ready to drop into the Android voice-input stack |
Prerequisites
- uv and Python ≥ 3.13 (see `pyproject.toml`)
- CUDA GPU recommended for training
- Git submodules / clones used by conversion:
  - `whisper/`: OpenAI reference repo (`convert-h5-to-ggml.py` reads assets such as the mel filter bank from it)
  - `whisper.cpp/`: contains `models/convert-h5-to-ggml.py`
- Hugging Face token for the CoRal dataset (create `.env`):

  ```
  HF_TOKEN=hf_...
  ```
Install dependencies:

```bash
uv sync
```
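A quick preflight check can catch the most common setup problems before a long training or conversion run. The sketch below is hypothetical: `preflight` and `read_env_file` are illustrative names, not part of this repository.

It checks three things from the prerequisites above:

- the Python version;
- the `whisper/` and `whisper.cpp` paths;
- that `.env` defines an `HF_TOKEN` starting with `hf_`.

It does not validate the token against the Hugging Face Hub, and its `.env` parser ignores quoting.

```python
import sys
from pathlib import Path

# Paths the conversion step expects, relative to the repository root
# (taken from the prerequisites list above).
REQUIRED_PATHS = (
    "whisper",
    "whisper.cpp/models/convert-h5-to-ggml.py",
)


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file (no quoting or `export` support)."""
    values: dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def preflight(root: Path, min_python: tuple[int, int] = (3, 13)) -> list[str]:
    """Return human-readable problems; an empty list means all checks passed."""
    problems: list[str] = []
    if sys.version_info[:2] < min_python:
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ required")
    for rel in REQUIRED_PATHS:
        if not (root / rel).exists():
            problems.append(f"missing {rel}")
    token = read_env_file(root / ".env").get("HF_TOKEN", "")
    if not token.startswith("hf_"):
        problems.append("HF_TOKEN missing or malformed in .env")
    return problems


if __name__ == "__main__":
    issues = preflight(Path.cwd())
    print("\n".join(issues) if issues else "All prerequisite checks passed.")
```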
train_acft.py
FUTO-style ACFT (audio-context fine-tuning) distillation for Danish ASR. The student model is trained to match the teacher's decoder hidden states while its encoder sees only a variable-length partial mel context, simulating short streaming chunks. The frozen teacher always uses the full 1500-position encoder context (30 s of audio).
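The core mechanism can be sketched in plain Python to make the shape arithmetic explicit. Each encoder position corresponds to two mel frames, so a partial context of `n_ctx` positions means feeding the student `2 * n_ctx` mel frames. The decoder hidden states keep the same shape for student and teacher, which is what makes an MSE between them well defined.

This is a hypothetical illustration, not code from `train_acft.py`. The following are all assumptions:

- the sampling rule for `n_ctx` (clip length plus uniform jitter);
- zero-padding of short clips;
- the function names.

```python
import random

FULL_AUDIO_CTX = 1500        # encoder positions for a full 30 s window
SECONDS_PER_WINDOW = 30.0
MEL_FRAMES_PER_POSITION = 2  # the stride-2 conv maps 2 mel frames to 1 position


def sample_n_ctx(duration_s: float, jitter: int, rng: random.Random) -> int:
    """Hypothetical sampler: clip length in encoder positions plus uniform jitter."""
    base = round(duration_s * FULL_AUDIO_CTX / SECONDS_PER_WINDOW)
    n_ctx = base + rng.randint(-jitter, jitter)
    return max(1, min(FULL_AUDIO_CTX, n_ctx))


def crop_mel(mel: list[list[float]], n_ctx: int) -> list[list[float]]:
    """Truncate or zero-pad each mel-bin row (indexed [bin][frame]) to 2 * n_ctx frames.

    The student encoder then produces n_ctx positions instead of the
    teacher's 1500. Zero-padding is a simplifying assumption.
    """
    target = MEL_FRAMES_PER_POSITION * n_ctx
    return [row[:target] + [0.0] * max(0, target - len(row)) for row in mel]


def mse(student: list[list[float]], teacher: list[list[float]]) -> float:
    """Mean squared error between [token][hidden] decoder hidden-state matrices.

    Both models decode the same tokens, so the shapes match even though the
    student's encoder saw fewer positions.
    """
    if len(student) != len(teacher) or any(len(s) != len(t) for s, t in zip(student, teacher)):
        raise ValueError("student and teacher hidden states must have the same shape")
    total = sum((s - t) ** 2 for rs, rt in zip(student, teacher) for s, t in zip(rs, rt))
    return total / sum(len(row) for row in student)
```

In an actual training step these would be tensors. The cropped mel would go to the student encoder, and the loss would update only the student, since the teacher is frozen.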
What it does
- Loads `openai/whisper-small` as both student (`model_train`) and teacher (`model_base`).
- Streams CoRal v3 `read_aloud` at 16 kHz and skips clips longer than 29 s.
- For each sample, picks a random partial encoder length `n_ctx` (with jitter), computes an MSE loss between student and teacher decoder hidden states, and optimizes with AdamW (batch size 1).
- Logs to TensorBoard. Every 500 steps it saves a checkpoint if the EMA loss has improved. It early-stops after 20 evaluations without improvement or at 20 000 steps.
- On exit (normal completion, early stop, max steps, or Ctrl+C), always writes a latest checkpoint directory.
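The checkpoint and early-stopping schedule above can be modeled as a small state machine. This is a hypothetical sketch, not the code in `train_acft.py`. The EMA decay of 0.99 is an assumed value. The 500-step interval, patience of 20 evaluations and 20 000-step cap come from this README.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EmaEarlyStopper:
    """Hypothetical controller for the save / early-stop schedule described above."""

    decay: float = 0.99       # assumed EMA decay, not taken from train_acft.py
    eval_every: int = 500     # eval / checkpoint interval
    patience: int = 20        # evaluations without improvement before stopping
    max_steps: int = 20_000   # hard step cap
    ema: Optional[float] = None
    best: float = float("inf")
    bad_evals: int = 0

    def update(self, step: int, loss: float) -> tuple[bool, bool]:
        """Record one step's loss; return (save_best_checkpoint, stop_training)."""
        if self.ema is None:
            self.ema = loss
        else:
            self.ema = self.decay * self.ema + (1.0 - self.decay) * loss
        save = stop = False
        if step % self.eval_every == 0:
            if self.ema < self.best:
                # Improvement: remember it and reset the patience counter.
                self.best, self.bad_evals, save = self.ema, 0, True
            else:
                self.bad_evals += 1
                stop = self.bad_evals >= self.patience
        return save, stop or step >= self.max_steps
```

A training loop would write the best checkpoint when the first flag is set and break when the second is. Writing the latest checkpoint in a `finally:` block would also cover Ctrl+C, matching the exit behavior described above.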
Usage
```bash
uv run python train_acft.py --size small
```
Output directories
Training writes Hugging Face–format folders (config, tokenizer, model.safetensors). Naming pattern from the script:
- Best (lowest EMA loss): `small_244M_danish_whisper_acft_futo_best`
- Latest (always saved at end): `small_244M_danish_whisper_acft_futo_latest`
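Scripts that post-process training output can build or validate folder names against the pattern shown above. The pattern `<size>_<params>_danish_whisper_acft_futo_<best|latest>` is inferred from the two example names only. The helper names are hypothetical.

```python
import re
from typing import Optional

# Pattern inferred from the example folder names above.
NAME_RE = re.compile(
    r"^(?P<size>[a-z0-9.-]+)_(?P<params>\d+[MB])"
    r"_danish_whisper_acft_futo_(?P<kind>best|latest)$"
)


def checkpoint_name(size: str, params: str, kind: str) -> str:
    """Build a folder name in the README's naming pattern."""
    if kind not in ("best", "latest"):
        raise ValueError("kind must be 'best' or 'latest'")
    return f"{size}_{params}_danish_whisper_acft_futo_{kind}"


def parse_checkpoint_name(name: str) -> Optional[dict[str, str]]:
    """Split a folder name into size, params and kind; None if it does not match."""
    match = NAME_RE.match(name)
    return match.groupdict() if match else None
```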
Hyperparameters (in script)
| Setting | Value |
|---|---|
| Learning rate | 1e-6 |
| Weight decay | 0.1 |
| Max steps | 20_000 |
| Eval / checkpoint interval | 500 |
| Early-stop patience | 20 evals |
| Processor | openai/whisper-small, language Danish, task transcribe |
Best run (reference)
From training logs for the bundled checkpoint:
```text
[Step 24000] New best loss: 0.0313. Saved checkpoint to small_244M_danish_whisper_acft_futo_best
```
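If the script's console output is captured to a text file, the most recent best checkpoint can be recovered from it. The regular expression is inferred from the single reference line above. Other messages the script may print are not covered, and the names are hypothetical.

```python
import re
from typing import Iterable, NamedTuple, Optional

# Matches lines shaped like the reference log entry above.
BEST_RE = re.compile(
    r"\[Step (?P<step>\d+)\] New best loss: (?P<loss>\d+(?:\.\d+)?)\. "
    r"Saved checkpoint to (?P<path>\S+)"
)


class BestCheckpoint(NamedTuple):
    step: int
    loss: float
    path: str


def last_best(log_lines: Iterable[str]) -> Optional[BestCheckpoint]:
    """Return the most recent 'New best loss' entry, or None if there is none."""
    best = None
    for line in log_lines:
        match = BEST_RE.search(line)
        if match:
            best = BestCheckpoint(int(match["step"]), float(match["loss"]), match["path"])
    return best
```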
small_244M_danish_whisper_acft_futo_best
Hugging Face–format Whisper small checkpoint (244M parameters) after ACFT distillation. Use this folder as input to convert-to-android.sh or load directly with Transformers.
Contents
| File | Description |
|---|---|
| `model.safetensors` | Fine-tuned weights (~1.1 GB) |
| `config.json`, `generation_config.json` | Whisper-small architecture + generation defaults |
| `tokenizer.json`, `tokenizer_config.json`, `processor_config.json` | Danish transcribe tokenizer/processor |
| `vocab.json`, `added_tokens.json` | Vocabulary (also refreshed by `convert-to-android.sh` from upstream `openai/whisper-small`) |
Load in Python (example)
```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Local folder written by train_acft.py (best checkpoint)
model_dir = "small_244M_danish_whisper_acft_futo_best"
model = WhisperForConditionalGeneration.from_pretrained(model_dir)
processor = WhisperProcessor.from_pretrained(model_dir)
```
Architecture matches OpenAI whisper-small (d_model=768, 12 encoder/decoder layers).
convert-to-android.sh
Batch-converts trained HF model directories into GGML binaries for whisper.cpp / FUTO Keyboard.
Behavior
- Discovers folders (default: any directory in the project root whose name contains `danish_whisper_acft_futo`).
- Downloads `added_tokens.json` and `vocab.json` from `openai/whisper-small` into each folder (ensures tokenizer compatibility with the converter).
- Runs `uv run whisper.cpp/models/convert-h5-to-ggml.py <folder>/ ./whisper/ <folder>/`.
- Moves `ggml-model.bin` → `android-futo-ready/<folder>.bin`.
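The folder selection and file naming steps above can be mirrored in Python as a dry run. The sketch computes the converter command and the source and destination paths for each folder, but runs nothing.

It is an illustration of the listed behavior, not the script's actual code. The vocab download step is omitted, and `discover` and `plan` are hypothetical names.

```python
import shlex
from pathlib import Path

MARKER = "danish_whisper_acft_futo"
OUT_DIR = "android-futo-ready"


def discover(root: Path) -> list[str]:
    """Directory names in `root` containing the marker (the default selection)."""
    return sorted(p.name for p in root.iterdir() if p.is_dir() and MARKER in p.name)


def plan(folder: str) -> tuple[str, str, str]:
    """Return (converter command, file it produces, final destination) for one folder."""
    cmd = shlex.join([
        "uv", "run", "whisper.cpp/models/convert-h5-to-ggml.py",
        f"{folder}/", "./whisper/", f"{folder}/",
    ])
    return cmd, f"{folder}/ggml-model.bin", f"{OUT_DIR}/{folder}.bin"
```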
Usage
```bash
# Convert all danish_whisper_acft_futo* folders in the current directory
./convert-to-android.sh

# Convert one or more folders explicitly (space-separated)
./convert-to-android.sh small_244M_danish_whisper_acft_futo_best
```
Make the script executable if needed:
```bash
chmod +x convert-to-android.sh
```
Requirements
- Run from the repository root.
- `whisper/` and `whisper.cpp/` must be present.
- Each source folder must contain a full HF checkpoint (including `model.safetensors` or equivalent weights the converter accepts).
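A folder can be checked for completeness before it is handed to the converter. The file list is an assumption: the config and vocabulary files the converter reads, plus one weights file that `from_pretrained` can load. Sharded checkpoints, which ship an index file, are not handled.

Because `convert-to-android.sh` re-downloads `vocab.json` and `added_tokens.json`, those two matter mainly when the converter is invoked by hand. The helper name is hypothetical.

```python
from pathlib import Path

# Assumed inputs: config/vocabulary files read from the model folder,
# plus at least one single-file weights format.
REQUIRED_FILES = ("config.json", "vocab.json", "added_tokens.json")
WEIGHT_FILES = ("model.safetensors", "pytorch_model.bin")


def missing_for_conversion(folder: Path) -> list[str]:
    """List what `folder` lacks before it can be passed to the converter."""
    missing = [name for name in REQUIRED_FILES if not (folder / name).is_file()]
    if not any((folder / name).is_file() for name in WEIGHT_FILES):
        missing.append(" or ".join(WEIGHT_FILES))
    return missing
```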
android-futo-ready
Output directory for the GGML model consumed on-device (e.g. FUTO Keyboard Android voice input).
| File | Source | Approx. size |
|---|---|---|
| `small_244M_danish_whisper_acft_futo_best.bin` | `small_244M_danish_whisper_acft_futo_best/` | ~488 MB |
Regenerate after retraining or updating the HF folder:
```bash
./convert-to-android.sh small_244M_danish_whisper_acft_futo_best
```
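The ~488 MB listed for the `.bin` is consistent with storing the nominal 244M parameters at 2 bytes each (float16). The exact byte count also depends on which tensors the converter keeps at higher precision and on header overhead, so treat this estimate as a sanity check only.

```python
def approx_size_mb(n_params: float, bytes_per_param: float) -> float:
    """Rough weight-file size in MB (10**6 bytes), ignoring headers and metadata."""
    return n_params * bytes_per_param / 1e6


# Nominal whisper-small parameter count, as used in the folder names.
WHISPER_SMALL_PARAMS = 244e6

# float16 stores 2 bytes per parameter.
print(f"fp16 estimate: ~{approx_size_mb(WHISPER_SMALL_PARAMS, 2):.0f} MB")
```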
End-to-end
```bash
# 1. Train (or skip if using the bundled best checkpoint)
uv run python train_acft.py --size small

# 2. Convert best small checkpoint to GGML
./convert-to-android.sh small_244M_danish_whisper_acft_futo_best

# 3. Deploy
# Copy android-futo-ready/small_244M_danish_whisper_acft_futo_best.bin into your Android / FUTO build.
```
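The first two end-to-end steps can also be driven from Python, stopping at the first failing command. In this hypothetical sketch (`run_pipeline` is an illustrative name), the default `dry_run=True` only lists the commands. The deploy step is project-specific and left manual.

```python
import subprocess

BEST = "small_244M_danish_whisper_acft_futo_best"

# Commands from steps 1 and 2 above; step 3 (copying the .bin into an
# Android / FUTO build) is left manual.
STEPS = [
    ["uv", "run", "python", "train_acft.py", "--size", "small"],
    ["./convert-to-android.sh", BEST],
]


def run_pipeline(skip_training: bool = False, dry_run: bool = True) -> list[str]:
    """Run (or, with dry_run, just list) the pipeline commands in order."""
    executed = []
    for cmd in (STEPS[1:] if skip_training else STEPS):
        executed.append(" ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError, stopping at the first failure.
            subprocess.run(cmd, check=True)
    return executed
```

Passing `skip_training=True` corresponds to using the bundled best checkpoint instead of retraining.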
Troubleshooting
| Issue | Likely cause / fix |
|---|---|
| Dataset load fails | Missing or invalid `HF_TOKEN` in `.env` |
| CUDA OOM during training | Batch size is already 1, so use a GPU with more VRAM. Clips longer than 29 s are skipped, but the student plus the frozen teacher still need substantial memory |
| `convert-h5-to-ggml.py` not found | Clone or init `whisper.cpp` under the repo root |
| Converter vocab errors | Ensure the `whisper/` clone exists; the script re-fetches `vocab.json` / `added_tokens.json` |
| Empty `android-futo-ready` | Run `convert-to-android.sh` after a successful HF checkpoint exists |