Orbination Whisper AI

Quantization-aware compression of whisper-large-v3-turbo to a compact 368 MB, multilingual, CPU/GPU speech-to-text model (GGUF / whisper.cpp).

These are quantized GGUF checkpoints of a fine-tuned whisper-large-v3-turbo, produced with Q3_K-matched quantization-aware training (QAT) so that accuracy survives 3-bit quantization. A companion Go runtime (CPU/GPU hybrid, no PyTorch at runtime) is on GitHub.

➡️ Code, Go runtime & prebuilt binaries: https://github.com/amichail-1/Orbination-Whisper-AI

Files

File	Size	Role
`ggml-large-v3-turbo-q3_k.bin`	368 MB	smallest
`ggml-large-v3-turbo-q4_k.bin`	474 MB	balanced
`ggml-large-v3-turbo-q5_k.bin`	574 MB	best accuracy

Results — WER on held-out FLEURS (real speech), beam search

Model	Size	English	Spanish	French	Greek
Q3_K	368 MB	0.065	0.050	0.065	0.148
Q4_K	474 MB	0.062	0.048	0.063	0.124
Q5_K	574 MB	0.061	0.047	0.061	0.110
FP16 (upper bound)	1.6 GB	0.061	0.046	0.060	0.108

High-resource languages stay essentially flat across precisions. The custom kernel's largest gains appear on quantization-sensitive content: Greek WER falls from 0.285 to 0.148 at equal size.
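For readers who want to sanity-check figures like those in the table, WER is the word-level edit distance between hypothesis and reference, divided by the number of reference words. The sketch below is a minimal illustration that assumes lowercase, whitespace-split tokens. The text normalization actually used for the FLEURS evaluation is not stated here, and the function name `wer` is hypothetical.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the") over six reference words.
    print(wer("the cat sat on the mat", "the cat sit on mat"))
```

A corpus-level score is usually the total edit count over the total reference word count, not the mean of per-utterance values.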

Method (short)

whisper-large-v3-turbo has a shallow 4-layer decoder, so naive ≤3-bit quantization collapses it. We train with the exact ggml Q3_K quantizer in the forward pass (with a straight-through estimator on the backward pass) plus teacher distillation from the FP16 model. Because the quantizer used in training is the same one used at deployment, the exported standard Q3_K GGUF runs at the trained error rate, with no train/inference gap. Decoding uses beam search (size 5), which removes the repetition loops that inflate greedy WER.
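The idea of quantizing in the forward pass while updating full-precision weights through a straight-through estimator can be sketched on a toy problem. Everything below is an illustrative assumption, not the training code. `fake_quant_3bit` is a simplified one-scale-per-block rounder standing in for ggml's Q3_K, the "model" is a single dot product, and the distillation loss is a plain squared error against a hypothetical teacher output.

```python
def fake_quant_3bit(block):
    """Round one block of weights to 8 signed levels (-4..3) sharing one scale.

    Simplified stand-in: real ggml Q3_K uses 256-weight super-blocks with
    per-sub-block scales, which this sketch does not reproduce.
    """
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return [0.0 for _ in block]
    scale = max_abs / 4.0
    return [scale * max(-4, min(3, round(v / scale))) for v in block]


def ste_step(latent, x, teacher_out, lr):
    """One SGD step on loss = (student_out - teacher_out)^2 using an STE."""
    quantized = fake_quant_3bit(latent)  # the forward pass sees quantized weights
    student_out = sum(q * xi for q, xi in zip(quantized, x))
    err = student_out - teacher_out
    grad_wrt_quantized = [2.0 * err * xi for xi in x]
    # Straight-through estimator: treat d(quantized)/d(latent) as 1, so the
    # gradient is applied directly to the full-precision latent weights.
    return [w - lr * g for w, g in zip(latent, grad_wrt_quantized)]


if __name__ == "__main__":
    teacher = [0.9, -1.7, 0.3, 0.05]  # hypothetical full-precision teacher weights
    latent = list(teacher)            # the student starts from the teacher
    x = [1.0, 0.5, -2.0, 4.0]         # hypothetical input
    target = sum(t * xi for t, xi in zip(teacher, x))
    for _ in range(50):
        latent = ste_step(latent, x, target, lr=0.01)
    print(fake_quant_3bit(latent))
```

The point of the pattern is that the loss is always measured on the quantized weights, so whatever the optimizer converges to is already expressed in the deployed format.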

The 368 MB floor is set by the token-embedding quantization (whisper.cpp compresses the 253 MB embedding to 3-bit); use Q4_K/Q5_K to give it more bits and lower WER further.
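To make the embedding arithmetic concrete, the sketch below estimates the token-embedding tensor size at several precisions. The vocabulary size (51866) and hidden width (1280) are assumptions taken from the public whisper-large-v3 configuration. The bytes-per-block figures are those of ggml's k-quant layouts (256 weights per super-block). The helper name `embedding_bytes` is hypothetical.

```python
N_VOCAB = 51866  # assumption: whisper-large-v3 multilingual vocabulary size
D_MODEL = 1280   # assumption: whisper-large-v3 hidden width

BLOCK_WEIGHTS = 256  # weights per k-quant super-block
# Bytes per super-block for each k-quant format (assumed from ggml's block layouts).
BLOCK_BYTES = {"Q3_K": 110, "Q4_K": 144, "Q5_K": 176}


def embedding_bytes(fmt: str) -> int:
    """Approximate storage for the token-embedding matrix in the given format."""
    n_weights = N_VOCAB * D_MODEL
    if fmt == "F32":
        return n_weights * 4
    if fmt == "F16":
        return n_weights * 2
    # D_MODEL is a multiple of 256, so the rows split into whole super-blocks.
    return n_weights // BLOCK_WEIGHTS * BLOCK_BYTES[fmt]


if __name__ == "__main__":
    for fmt in ("F32", "F16", "Q5_K", "Q4_K", "Q3_K"):
        size = embedding_bytes(fmt)
        print(f"{fmt}: {size / 2**20:.1f} MiB ({8 * size / (N_VOCAB * D_MODEL):.4f} bits/weight)")
```

Under these assumptions the 32-bit figure comes out at about 253 MiB, which matches the embedding size quoted above.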

Usage (whisper.cpp)

# download a model
huggingface-cli download antoniosmich/Orbination-Whisper-AI ggml-large-v3-turbo-q3_k.bin --local-dir .

# run with whisper.cpp (16 kHz mono WAV)
./whisper-cli -m ggml-large-v3-turbo-q3_k.bin -bs 5 -l en audio.wav

Or use the Orbination Go runtime (CPU/GPU hybrid, CLI + HTTP server) from the GitHub repo.
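The whisper.cpp command above expects 16 kHz mono WAV input. A small pre-flight check can catch a wrong sample rate or channel count before transcription starts. The following is a sketch using only the Python standard library. The helper names are hypothetical, the 16-bit PCM check is an assumption about whisper.cpp's usual input requirements, and the command list simply mirrors the flags shown in the usage lines.

```python
import subprocess
import wave


def is_whisper_ready(path: str) -> bool:
    """Return True if the WAV file is 16 kHz, mono, 16-bit PCM."""
    with wave.open(path, "rb") as wav:
        return (
            wav.getframerate() == 16000
            and wav.getnchannels() == 1
            and wav.getsampwidth() == 2  # assumption: 16-bit samples
        )


def build_whisper_cmd(model: str, audio: str, lang: str = "en", beam: int = 5) -> list[str]:
    """Build the same whisper-cli invocation as in the usage lines."""
    return ["./whisper-cli", "-m", model, "-bs", str(beam), "-l", lang, audio]


def transcribe(model: str, audio: str, lang: str = "en") -> str:
    """Run whisper-cli and return its standard output as text."""
    if not is_whisper_ready(audio):
        raise ValueError(f"{audio} is not 16 kHz mono 16-bit WAV; convert it first")
    result = subprocess.run(
        build_whisper_cmd(model, audio, lang),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(transcribe("ggml-large-v3-turbo-q3_k.bin", "audio.wav"))
```

Failing early with a clear message is usually preferable to letting the CLI reject the file or silently produce poor output from mismatched audio.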

License & attribution
