Whisper Large v3 Turbo · OpenASR
Fast multilingual Whisper built from pruned large-v3
Native speech-to-text in the OpenASR runtime, engineered for peak performance on CPU & GPU, with no Python at inference time.
Highlights
- Turbo decoder: prunes Whisper large-v3's decoder from 32 layers to 4 for much faster generation
- Multilingual ASR: transcribes many languages and can translate speech to English
- Zero-shot robustness: inherits Whisper's large-scale weak-supervision training across noisy domains
- Native in OpenASR: .oasr packs run with no Python at inference, engineered for peak performance on CPU & GPU
Quickstart
# 1. Install the OpenASR CLI · https://openasr.org
# 2. Pull a build (pick a quant; see the table below)
openasr pull whisper-large-v3-turbo:q8
# 3. Transcribe
openasr transcribe audio.wav --model whisper-large-v3-turbo
All builds for this model:
openasr pull whisper-large-v3-turbo:fp16
openasr pull whisper-large-v3-turbo:q8
openasr pull whisper-large-v3-turbo:q4
Available builds
| Quant | File (.oasr) | Size | RAM peak | RTF · M1 CPU | RTF · M1 GPU | JFK ΔWER vs fp16 |
|---|---|---|---|---|---|---|
| fp16 | whisper-large-v3-turbo-fp16.oasr | 1.62 GB | 3.62 GB | 0.52× | 0.39× | 0.0% |
| q8_0 | whisper-large-v3-turbo-q8_0.oasr | 931 MB | 2.28 GB | 0.52× | 0.35× | 0.0% |
| q4_k | whisper-large-v3-turbo-q4_k.oasr | 564 MB | 1.54 GB | 0.51× | 0.25× | 0.0% |
RTF = real-time factor on the fixed 11 s JFK clip (lower is faster); RAM peak is measured per pack in an isolated subprocess. JFK ΔWER compares each quantized build's JFK transcript to this model's fp16 JFK transcript, so it measures quantization drift rather than absolute recognition accuracy. q8_0 is the recommended default: near-reference quality at a fraction of the footprint.
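The two metrics in the table can be reproduced offline with a few lines of Python. The sketch below is a minimal illustration, not OpenASR's benchmark harness: `real_time_factor` and `word_error_rate` are hypothetical helper names, RTF is taken in its usual sense of processing time divided by audio duration, the text normalization (lowercasing, dropping punctuation) is an assumption, and the sample strings are invented for illustration rather than taken from these packs.

```python
import re


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent transcribing / duration of the audio (lower is faster)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def normalize(text: str) -> list:
    """Lowercase and drop punctuation so only word identity is compared (assumed scheme)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = normalize(reference), normalize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # 5.72 s is back-calculated from the table (0.52 x 11 s); it is not a measured value.
    print(real_time_factor(5.72, 11.0))
    # Invented transcripts: the fp16 output acts as the reference for the delta-WER column.
    fp16_text = "Ask what you can do for your country."
    quant_text = "ask what you can do for your country"
    print(word_error_rate(fp16_text, quant_text))
```

Because the reference here is the fp16 transcript rather than a human transcript, a result of 0.0 only says that the quantized build reproduced the fp16 output on that clip.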
About Whisper Large v3 Turbo
Whisper Large v3 Turbo is OpenAI's faster variant of Whisper large-v3: it keeps the same
Whisper architecture and multilingual speech-recognition/translation interface, but reduces
the decoder depth from 32 layers to 4. The upstream card describes the result as much faster
with only a minor quality trade-off, while retaining Whisper's broad zero-shot behavior from
training on more than five million hours of labeled audio. This OpenASR repo repackages the
original openai/whisper-large-v3-turbo weights as .oasr packs that run natively in the
OpenASR runtime with no Python at inference time. For most users the q8_0 build is the
recommended default; q4_k is for tighter memory budgets and fp16 is for verification or
maximum fidelity.
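The build guidance above can be written down as a small selection rule. The following is a sketch under stated assumptions: `pick_build` is a hypothetical helper, the peak-RAM figures are copied from the builds table (measured on the 11 s JFK clip, so longer audio may need more headroom), and the pull tags `q8` and `q4` are assumed to correspond to the `q8_0` and `q4_k` rows.

```python
# Peak RAM in GB per pull tag, copied from the "Available builds" table.
# Assumption: the tags q8 and q4 map to the q8_0 and q4_k rows.
PEAK_RAM_GB = {"fp16": 3.62, "q8": 2.28, "q4": 1.54}


def pick_build(ram_budget_gb: float, max_fidelity: bool = False):
    """Return the preferred pull tag that fits the RAM budget, or None if none fits.

    q8 is tried first because it is the recommended default; fp16 is only
    considered when the caller explicitly asks for maximum fidelity.
    """
    order = ["fp16", "q8", "q4"] if max_fidelity else ["q8", "q4"]
    for tag in order:
        if PEAK_RAM_GB[tag] <= ram_budget_gb:
            return tag
    return None


if __name__ == "__main__":
    tag = pick_build(2.0)
    if tag is not None:
        print(f"openasr pull whisper-large-v3-turbo:{tag}")
```

The rule deliberately treats the table's peak RAM as a lower bound; leaving a margin on top of it is the safer choice on a shared machine.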
How these packs were made
Converted from openai/whisper-large-v3-turbo with the OpenASR importer:
openasr model-pack import-whisper-local <src> <out>.oasr \
--package-id whisper-large-v3-turbo --quantization {fp16,q8-0,q4-k}
The .oasr container is GGUF-backed; packs use zero-copy mmap weight binding and graph
buffer reuse to keep peak memory low.
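To show what zero-copy mmap binding means in general, here is a toy Python sketch using only the standard library. It does not read the real .oasr or GGUF layout: the file format (an 8-byte count followed by raw float32 values) and the helper names `write_toy_pack` and `sum_weights_mmap` are invented for the example. The point is only that the weights are accessed through a view of the mapped file instead of being read into a second buffer.

```python
import mmap
import os
import struct
import tempfile
from array import array

HEADER = struct.Struct("<Q")          # toy header: number of float values that follow
FLOAT_SIZE = array("f").itemsize      # size of a native float32 (4 on common platforms)


def write_toy_pack(path, weights):
    """Write a toy weight file: an 8-byte count, then raw native float32 values."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(len(weights)))
        f.write(array("f", weights).tobytes())


def sum_weights_mmap(path):
    """Map the file read-only and sum the weights through a view of the mapping."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        (count,) = HEADER.unpack_from(mm, 0)
        start = HEADER.size
        end = start + FLOAT_SIZE * count
        # Each memoryview aliases the mapped pages; the weight bytes are never
        # copied into a separate buffer, and the OS pages them in on demand.
        with memoryview(mm) as raw, raw[start:end] as window, window.cast("f") as weights:
            return count, sum(weights)


def demo():
    """Round-trip three toy weights through a temporary file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "toy.bin")
        write_toy_pack(path, [0.5, 1.5, 2.0])
        return sum_weights_mmap(path)


if __name__ == "__main__":
    print(demo())
```

A native runtime would bind tensor pointers straight into the mapping in the same spirit, which is why peak memory can stay close to the size of the pack plus working buffers.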
License
These packs inherit the upstream model's license: MIT. OpenASR packaging retains the upstream copyright and NOTICE; the only modifications are format conversion and quantization.
Acknowledgements
This pack is a redistribution of Whisper Large v3 Turbo, released by OpenAI
(openai/whisper-large-v3-turbo).
All credit for the original model, training recipe, and weights belongs to OpenAI. The packs
inherit the upstream MIT license; OpenASR only converts the weights into .oasr packages and
adds quantized builds for local runtime use.
Links
- OpenASR: https://github.com/QuintinShaw/openasr
- Website: https://openasr.org
- Upstream model: openai/whisper-large-v3-turbo