Danish Whisper (FUTO / ACFT) — selected components

This document covers four pieces of the pipeline, in order: training (train_acft.py), the best small checkpoint (small_244M_danish_whisper_acft_futo_best), Hugging Face → GGML conversion (convert-to-android.sh), and Android-ready binaries (android-futo-ready).

Model Card

OpenAI Whisper checkpoints fine-tuned for robustness to dynamic audio context (audio-context fine-tuning, ACFT), so that shortened encoder contexts still work well on short or streaming audio. The method and upstream models come from FUTO whisper-acft. This repo trains Danish variants on CoRal v3.


Developed by	AIOnTheEdge
Finetuned from	OpenAI Whisper small
Model license	Apache-2.0
Dataset license	OpenRAIL-D (commercial use permitted with restrictions, e.g. on speech synthesis and biometric identification)

Uses. Checkpoints are meant for runtimes that expose variable audio context—e.g. whisper.cpp with --audio-context, or FUTO Keyboard on Android. Under default full-context Whisper settings they behave like ordinary Whisper. Pre-converted GGML models and usage notes: whisper-acft README.
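
As a rough illustration of the whisper.cpp route, the sketch below derives an audio context from a clip duration and assembles a command line. It relies on Whisper's encoder producing 1500 positions for a 30 s window (50 per second). The 64-position safety margin, the `whisper-cli` binary name, and the `clip.wav` path are assumptions for illustration, not values taken from this repo.

```python
import math

ENCODER_POSITIONS = 1500   # Whisper encoder positions for a full 30 s window
WINDOW_SECONDS = 30.0


def audio_context_for(duration_s: float, margin: int = 64) -> int:
    """Encoder positions covered by the clip, plus an assumed safety margin."""
    needed = math.ceil(duration_s * ENCODER_POSITIONS / WINDOW_SECONDS)
    return max(1, min(ENCODER_POSITIONS, needed + margin))


def whisper_cpp_command(model_path: str, wav_path: str, duration_s: float) -> list[str]:
    """Build an argv list for whisper.cpp (the binary name is an assumption)."""
    return [
        "whisper-cli",
        "-m", model_path,
        "-f", wav_path,
        "-l", "da",
        "--audio-context", str(audio_context_for(duration_s)),
    ]


cmd = whisper_cpp_command(
    "android-futo-ready/small_244M_danish_whisper_acft_futo_best.bin",
    "clip.wav",  # hypothetical input file
    duration_s=5.0,
)
print(" ".join(cmd))
```

A larger margin trades some speed for tolerance to clips that run slightly longer than estimated; the right value is best found empirically.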

Overview

train_acft.py  →  small_244M_danish_whisper_acft_futo_best/  →  convert-to-android.sh  →  android-futo-ready/
   (PyTorch)              (HF weights + tokenizer)              (convert)           (ggml-model.bin)

Step	Artifact	Role
1	`train_acft.py`	Distill Whisper small toward FUTO-style partial-context behavior on Danish CoRal audio
2	`small_244M_danish_whisper_acft_futo_best/`	Best small (244M) Hugging Face checkpoint (~1.1 GB `model.safetensors`)
3	`convert-to-android.sh`	Convert HF folders to `ggml-model.bin` for whisper.cpp / FUTO Keyboard
4	`android-futo-ready/`	Renamed `.bin` files ready to drop into the Android voice-input stack

Prerequisites

uv and Python ≥ 3.13 (see pyproject.toml)
CUDA GPU recommended for training
Git submodules / clones used by conversion:
- whisper/ — OpenAI reference repo (vocab layout for convert-h5-to-ggml.py)
- whisper.cpp/ — contains models/convert-h5-to-ggml.py
Hugging Face token for the CoRal dataset (create .env):

HF_TOKEN=hf_...

Install dependencies:

uv sync

`train_acft.py`

FUTO-style ACFT (audio-context fine-tuning) distillation for Danish ASR. The student model is trained to match the teacher’s decoder hidden states when the encoder only sees a variable-length partial mel context (simulating short streaming chunks), while the frozen teacher always uses the full 1500-frame encoder context.

What it does

Loads openai/whisper-small as both student (model_train) and teacher (model_base).
Streams CoRal v3 read_aloud at 16 kHz; skips clips longer than 29 s.
For each sample, picks a random partial encoder length n_ctx (with jitter), computes the MSE loss between student and teacher decoder hidden states, and takes an AdamW step (batch size 1).
Logs to TensorBoard; every 500 steps, saves a checkpoint if the EMA loss has improved; stops early after 20 evals without improvement, or at 20 000 steps.
On exit (normal, early stop, max steps, or Ctrl+C), always writes a latest checkpoint directory.
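
The partial-context selection described above can be sketched roughly as follows. This is not the code from train_acft.py: the function name `pick_partial_ctx`, the jitter bound of 64 positions, and the one-third cap are assumptions chosen for illustration. Only the 16 kHz sample rate and the 1500-position full context come from the text.

```python
import random

SAMPLE_RATE = 16_000     # Hz, as used for CoRal audio
FULL_CTX = 1500          # encoder positions for a full 30 s window
WINDOW_SECONDS = 30


def pick_partial_ctx(num_samples: int, rng: random.Random, max_jitter: int = 64) -> int:
    """Pick a jittered encoder length for a clip of num_samples audio samples."""
    # Encoder positions the clip actually covers (50 per second of audio).
    base = round(FULL_CTX * num_samples / (SAMPLE_RATE * WINDOW_SECONDS))
    base = max(1, min(FULL_CTX, base))
    # Assumed jitter rule: at most max_jitter, and at most a third of base,
    # so that very short clips are not perturbed out of proportion.
    bound = min(max_jitter, base // 3)
    jitter = rng.randint(-bound, bound)
    return max(1, min(FULL_CTX, base + jitter))


rng = random.Random(0)
clip = 4 * SAMPLE_RATE  # a 4-second clip
print([pick_partial_ctx(clip, rng) for _ in range(5)])
```

The student would then encode only the first n_ctx positions, while the teacher keeps the full context.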

Usage

uv run python train_acft.py --size small

Output directories

Training writes Hugging Face–format folders (config, tokenizer, model.safetensors). Naming pattern from the script:

Best (lowest EMA loss): small_244M_danish_whisper_acft_futo_best
Latest (always saved at end): small_244M_danish_whisper_acft_futo_latest

Hyperparameters (in script)

Setting	Value
Learning rate	`1e-6`
Weight decay	`0.1`
Max steps	`20_000`
Eval / checkpoint interval	`500`
Early-stop patience	`20` evals
Processor	`openai/whisper-small`, language Danish, task transcribe
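
To illustrate how the eval interval and the patience interact, here is a minimal sketch of EMA-based checkpointing with early stopping. The class name `EmaEarlyStopper` and the smoothing factor 0.99 are hypothetical; this README does not state how the script smooths the loss.

```python
class EmaEarlyStopper:
    """Track an exponential moving average of the loss and a patience counter."""

    def __init__(self, patience: int = 20, decay: float = 0.99):
        self.patience = patience
        self.decay = decay          # assumed smoothing factor
        self.ema = None
        self.best = float("inf")
        self.bad_evals = 0

    def update(self, loss: float) -> None:
        """Fold one training-step loss into the running average."""
        if self.ema is None:
            self.ema = loss
        else:
            self.ema = self.decay * self.ema + (1 - self.decay) * loss

    def evaluate(self) -> tuple[bool, bool]:
        """Call once per eval interval. Returns (save_best, stop)."""
        if self.ema is None:
            raise ValueError("update() must be called before evaluate()")
        if self.ema < self.best:
            self.best = self.ema
            self.bad_evals = 0
            return True, False
        self.bad_evals += 1
        return False, self.bad_evals >= self.patience


EVAL_INTERVAL, MAX_STEPS = 500, 20_000
stopper = EmaEarlyStopper(patience=20)
for step in range(1, MAX_STEPS + 1):
    loss = 1.0 / step  # stand-in for the real distillation loss
    stopper.update(loss)
    if step % EVAL_INTERVAL == 0:
        save_best, stop = stopper.evaluate()
        if save_best:
            pass  # the training script would write the *_best folder here
        if stop:
            break
```

With an interval of 500 and a patience of 20, training stops after 10 000 consecutive steps without a new best EMA loss.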

Best run (reference)

From training logs for the bundled checkpoint:

[Step 24000] New best loss: 0.0313. Saved checkpoint to small_244M_danish_whisper_acft_futo_best

`small_244M_danish_whisper_acft_futo_best`

Hugging Face–format Whisper small checkpoint (244M parameters) after ACFT distillation. Use this folder as input to convert-to-android.sh or load directly with Transformers.

Contents

File	Description
`model.safetensors`	Fine-tuned weights (~1.1 GB)
`config.json`, `generation_config.json`	Whisper-small architecture + generation defaults
`tokenizer.json`, `tokenizer_config.json`, `processor_config.json`	Danish transcribe tokenizer/processor
`vocab.json`, `added_tokens.json`	Vocabulary (also refreshed by `convert-to-android.sh` from upstream whisper-small)

Load in Python (example)

from transformers import WhisperForConditionalGeneration, WhisperProcessor

model_dir = "small_244M_danish_whisper_acft_futo_best"
model = WhisperForConditionalGeneration.from_pretrained(model_dir)
processor = WhisperProcessor.from_pretrained(model_dir)

Architecture matches OpenAI whisper-small (d_model=768, 12 encoder/decoder layers).

`convert-to-android.sh`

Batch-converts trained HF model directories into GGML binaries for whisper.cpp / FUTO Keyboard.

Behavior

Discovers folders (default: any directory name containing danish_whisper_acft_futo in the project root).
Downloads added_tokens.json and vocab.json from openai/whisper-small into each folder (ensures tokenizer compatibility with the converter).
Runs uv run whisper.cpp/models/convert-h5-to-ggml.py <folder>/ ./whisper/ <folder>/
Moves ggml-model.bin → android-futo-ready/<folder>.bin
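
The discovery and renaming steps can be mirrored in a few lines of Python, which may help when checking what the script would pick up before running it. This is an illustrative re-implementation, not the shell script itself; the helper name `plan_conversions` is hypothetical.

```python
from pathlib import Path

PATTERN = "danish_whisper_acft_futo"
OUTPUT_DIR = "android-futo-ready"


def plan_conversions(root: Path) -> dict[Path, Path]:
    """Map each matching checkpoint folder to the .bin path it would produce."""
    folders = sorted(p for p in root.iterdir() if p.is_dir() and PATTERN in p.name)
    return {folder: root / OUTPUT_DIR / f"{folder.name}.bin" for folder in folders}


# Dry run from the repository root: list source folders and their targets.
for src, dst in plan_conversions(Path(".")).items():
    print(f"{src}/ -> {dst}")
```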

Usage

# Convert all danish_whisper_acft_futo* folders in the current directory
./convert-to-android.sh

# Convert one or more folders explicitly (space-separated)
./convert-to-android.sh small_244M_danish_whisper_acft_futo_best

Make the script executable if needed:

chmod +x convert-to-android.sh

Requirements

Run from the repository root.
whisper/ and whisper.cpp/ must be present.
Each source folder must contain a full HF checkpoint (including model.safetensors or equivalent weights the converter accepts).
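
A quick pre-flight check along these lines can catch missing pieces before conversion starts. The function name `missing_requirements` is hypothetical, and the file list is an assumption based on the layout described in this README: it looks only for `config.json` and `model.safetensors`, not for other weight formats the converter may accept.

```python
from pathlib import Path


def missing_requirements(root: Path, checkpoint: str) -> list[str]:
    """Return the required paths (relative to root) that do not exist."""
    required = [
        "whisper",
        "whisper.cpp/models/convert-h5-to-ggml.py",
        f"{checkpoint}/config.json",
        f"{checkpoint}/model.safetensors",
    ]
    return [rel for rel in required if not (root / rel).exists()]


problems = missing_requirements(Path("."), "small_244M_danish_whisper_acft_futo_best")
print("ready to convert" if not problems else f"missing: {problems}")
```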

`android-futo-ready`

Output directory for the GGML model consumed on-device (e.g. FUTO Keyboard Android voice input).

File	Source	Approx. size
`small_244M_danish_whisper_acft_futo_best.bin`	`small_244M_danish_whisper_acft_futo_best/`	~488 MB
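
The ~488 MB figure is about what one would expect if most weights are stored as 16-bit floats, which appears to be the default for whisper.cpp's convert-h5-to-ggml.py. A back-of-the-envelope check, treating the parameter count as exactly 244M (an approximation taken from the checkpoint name) and ignoring the vocabulary, mel filters, and any tensors kept in 32-bit:

```python
PARAMS = 244_000_000  # nominal parameter count from the checkpoint name


def size_mb(params: int, bytes_per_param: int) -> float:
    """Approximate file size in decimal megabytes."""
    return params * bytes_per_param / 1_000_000


print(f"fp16 GGML estimate: {size_mb(PARAMS, 2):.0f} MB")  # prints 488 MB
```

A converted file that deviates far from this estimate may point to an incomplete conversion or a different precision setting.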

Regenerate after retraining or updating the HF folder:

./convert-to-android.sh small_244M_danish_whisper_acft_futo_best

End-to-end

# 1. Train (or skip if using the bundled best checkpoint)
uv run python train_acft.py --size small

# 2. Convert best small checkpoint to GGML
./convert-to-android.sh small_244M_danish_whisper_acft_futo_best

# 3. Deploy
# Copy android-futo-ready/small_244M_danish_whisper_acft_futo_best.bin into your Android / FUTO build.

Troubleshooting

Issue	Likely cause
Dataset load fails	Missing or invalid `HF_TOKEN` in `.env`
CUDA OOM during training	Use a GPU with more VRAM; batch size is already 1 and clips longer than 29 s are already skipped, but small still needs substantial memory
`convert-h5-to-ggml.py` not found	Clone or init `whisper.cpp` under the repo root
Converter vocab errors	Ensure `whisper/` clone exists; script re-fetches `vocab.json` / `added_tokens.json`
Empty `android-futo-ready`	Run `convert-to-android.sh` after a successful HF checkpoint exists
