# Whisper-large-v3 — Ternary Quantized
Ternary-quantized version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3), produced with ternary-quant.

This model demonstrates that ternary-quant can quantize audio/speech models, a use case that GGUF and GPTQ were not designed for. The decoder is ternary-quantized, while the audio encoder is kept in FP16 to preserve transcription quality.
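Ternary quantization stores each weight as one of three values, {-1, 0, +1}, plus a floating-point scale. The sketch below shows the plain single-plane variant only; it assumes nothing about ternary-quant's actual tritplane3 (3-plane progressive) encoding, which is not documented here:

```python
import numpy as np

def ternarize(w, threshold_ratio=0.7):
    """Minimal sketch of plain ternary quantization.
    Illustrative only -- NOT the tritplane3 scheme used by ternary-quant."""
    delta = threshold_ratio * np.abs(w).mean()  # weights below this become 0
    t = np.zeros_like(w)
    t[w > delta] = 1.0
    t[w < -delta] = -1.0
    nonzero = t != 0
    # one floating-point scale per tensor, fit to the surviving weights
    scale = np.abs(w[nonzero]).mean() if nonzero.any() else 0.0
    return t, scale

np.random.seed(0)
w = np.random.randn(4, 4).astype(np.float32)
t, scale = ternarize(w)
approx = t * scale  # dequantized approximation of w
print(np.unique(t))  # entries drawn from {-1, 0, 1}
```

Storing two bits per weight (three states) plus a per-tensor scale is what makes the large size reduction possible while keeping a dense-matmul-friendly layout.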
## Quantization details
| Metric | Value |
|---|---|
| Scheme | tritplane3 (3-plane progressive ternary) |
| Components quantized | decoder (320 linear layers) |
| Audio encoder | Kept in FP16 (preserving audio understanding quality) |
| Stored size | 943.7 MB |
| FP16 size | 1677.7 MB |
| Compression ratio | 1.8x |
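The reported compression ratio follows directly from the two sizes in the table (the overall ratio is modest because the untouched FP16 encoder accounts for most of the stored size):

```python
# Sizes from the table above, in MB
fp16_mb = 1677.7
stored_mb = 943.7
ratio = fp16_mb / stored_mb
print(f"{ratio:.2f}x")  # 1.78x, reported as 1.8x
```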
## Usage
```python
import torch
import librosa
from ternary_quant.inference import load_ternary_model

model, processor = load_ternary_model(
    "AsadIsmail/whisper-large-v3-ternary",
    runtime_mode="cached",
    device="cpu",
)

# Important: cast to float32 to match the encoder's conv1d dtype
model = model.float()

# Transcribe audio
audio, sr = librosa.load("audio.mp3", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
inputs = {k: v.to("cpu").float() for k, v in inputs.items()}

with torch.no_grad():
    predicted_ids = model.generate(**inputs)

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```
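The `model.float()` cast guards against a dtype mismatch between FP16 weights and float32 audio features. A standalone PyTorch illustration of the error it avoids (independent of ternary-quant):

```python
import torch

conv = torch.nn.Conv1d(1, 1, kernel_size=3).half()  # fp16 layer, like an fp16-loaded encoder
x = torch.randn(1, 1, 8)                            # float32 input features
try:
    conv(x)  # mixed dtypes: PyTorch refuses to run the forward pass
except RuntimeError as e:
    print("mismatch:", type(e).__name__)
y = conv.float()(x)  # after casting to float32, the forward pass succeeds
print(y.dtype, y.shape)
```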
## Reproduce

```bash
pip install ternary-quant

ternary-quant quantize-broad openai/whisper-large-v3 \
  --output ./whisper-large-v3-ternary \
  --components decoder \
  --scheme tritplane3 --dtype float16 --eval
```
Part of the ternary-models collection