# Whisper-large-v3 — Ternary Quantized
Ternary-quantized version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3), produced with ternary-quant.

This model demonstrates that ternary-quant can quantize audio/speech models, a use case that GGUF and GPTQ were not designed for. The decoder is ternary-quantized, while the audio encoder is kept in FP16 to preserve transcription quality.
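Ternary quantization stores each weight as one of three values, {-1, 0, +1}, plus a floating-point scale. The sketch below shows the plain single-plane variant only; it assumes nothing about ternary-quant's actual tritplane3 (3-plane progressive) encoding, which is not documented here:

```python
import numpy as np

def ternarize(w, threshold_ratio=0.7):
    """Minimal sketch of plain ternary quantization.
    Illustrative only -- NOT the tritplane3 scheme used by ternary-quant."""
    delta = threshold_ratio * np.abs(w).mean()  # weights below this become 0
    t = np.zeros_like(w)
    t[w > delta] = 1.0
    t[w < -delta] = -1.0
    nonzero = t != 0
    # one floating-point scale per tensor, fit to the surviving weights
    scale = np.abs(w[nonzero]).mean() if nonzero.any() else 0.0
    return t, scale

np.random.seed(0)
w = np.random.randn(4, 4).astype(np.float32)
t, scale = ternarize(w)
approx = t * scale  # dequantized approximation of w
print(np.unique(t))  # entries drawn from {-1, 0, 1}
```

Storing two bits per weight (three states) plus a per-tensor scale is what makes the large size reduction possible while keeping a dense-matmul-friendly layout.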
## Quantization details
| Metric | Value |
|---|---|
| Scheme | tritplane3 (3-plane progressive ternary) |
| Components quantized | decoder (320 linear layers) |
| Audio encoder | Kept in FP16 (preserving audio understanding quality) |
| Stored size | 943.7 MB |
| FP16 size | 1677.7 MB |
| Compression ratio | 1.8x |
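The reported compression ratio follows directly from the two sizes in the table (the overall ratio is modest because the untouched FP16 encoder accounts for most of the stored size):

```python
# Sizes from the table above, in MB
fp16_mb = 1677.7
stored_mb = 943.7
ratio = fp16_mb / stored_mb
print(f"{ratio:.2f}x")  # 1.78x, reported as 1.8x
```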
## Usage
```python
import torch
import librosa
from ternary_quant.inference import load_ternary_model

model, processor = load_ternary_model(
    "AsadIsmail/whisper-large-v3-ternary",
    runtime_mode="cached",
    device="cpu",
)

# Important: cast to float32 to match the encoder's conv1d dtype
model = model.float()

# Transcribe audio
audio, sr = librosa.load("audio.mp3", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
inputs = {k: v.to("cpu").float() for k, v in inputs.items()}

with torch.no_grad():
    predicted_ids = model.generate(**inputs)

transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```
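The `model.float()` cast guards against a dtype mismatch between FP16 weights and float32 audio features. A standalone PyTorch illustration of the error it avoids (independent of ternary-quant):

```python
import torch

conv = torch.nn.Conv1d(1, 1, kernel_size=3).half()  # fp16 layer, like an fp16-loaded encoder
x = torch.randn(1, 1, 8)                            # float32 input features
try:
    conv(x)  # mixed dtypes: PyTorch refuses to run the forward pass
except RuntimeError as e:
    print("mismatch:", type(e).__name__)
y = conv.float()(x)  # after casting to float32, the forward pass succeeds
print(y.dtype, y.shape)
```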
## Reproduce

```bash
pip install ternary-quant

ternary-quant quantize-broad openai/whisper-large-v3 \
  --output ./whisper-large-v3-ternary \
  --components decoder \
  --scheme tritplane3 --dtype float16 --eval
```
Part of the ternary-models collection