# Whisper-large-v3 — Ternary Quantized

Ternary-quantized version of [openai/whisper-large-v3](https://huggingface.co/openai/whisper-large-v3), produced with `ternary-quant`.

This demonstrates `ternary-quant`'s ability to quantize audio/speech models, a use case that GGUF and GPTQ were not designed for. The decoder is ternary-quantized, while the audio encoder is kept in FP16 to preserve transcription quality.
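For intuition, ternary quantization stores each weight as one of {-1, 0, +1} plus a shared floating-point scale. Below is a minimal single-plane sketch; the helper names and the mean-based threshold are illustrative assumptions, not `ternary-quant`'s actual `tritplane3` scheme, which stacks three progressive ternary planes.

```python
import numpy as np

def ternary_quantize(w, threshold_ratio=0.7):
    """Map a weight tensor to ternary codes {-1, 0, +1} and one scale.

    threshold_ratio is a hypothetical knob: weights with magnitude below
    the threshold are zeroed, the rest keep only their sign.
    """
    delta = threshold_ratio * np.abs(w).mean()
    codes = np.where(w > delta, 1, np.where(w < -delta, -1, 0)).astype(np.int8)
    nonzero = codes != 0
    scale = float(np.abs(w[nonzero]).mean()) if nonzero.any() else 0.0
    return codes, scale

def dequantize(codes, scale):
    # Reconstruction: every surviving weight becomes +scale or -scale
    return codes.astype(np.float32) * scale

w = np.array([0.9, -0.8, 0.05, -0.02, 0.4], dtype=np.float32)
codes, scale = ternary_quantize(w)
print(codes.tolist(), scale)  # codes: [1, -1, 0, 0, 1]
```

A multi-plane ("progressive") scheme presumably repeats this on the residual error of earlier planes, so each added plane refines the reconstruction.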

## Quantization details

| Metric | Value |
|---|---|
| Scheme | `tritplane3` (3-plane progressive ternary) |
| Components quantized | Decoder (320 linear layers) |
| Audio encoder | Kept in FP16 (preserving audio understanding quality) |
| Stored size | 943.7 MB |
| FP16 size | 1677.7 MB |
| Compression ratio | 1.8x |
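The compression ratio follows directly from the two sizes above; it is modest because only the decoder is quantized while the FP16 encoder is left untouched:

```python
fp16_mb = 1677.7    # size of the original FP16 checkpoint
stored_mb = 943.7   # size of the ternary-quantized checkpoint
print(f"{fp16_mb / stored_mb:.1f}x")  # → 1.8x
```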

## Usage

```python
import torch
import librosa
from ternary_quant.inference import load_ternary_model

# Load the ternary-quantized model and its processor
model, processor = load_ternary_model(
    "AsadIsmail/whisper-large-v3-ternary",
    runtime_mode="cached",
    device="cpu",
)
# Important: cast to float32 to match the encoder's conv1d dtype
model = model.float()

# Transcribe audio (Whisper expects 16 kHz input)
audio, sr = librosa.load("audio.mp3", sr=16000)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
inputs = {k: v.to("cpu").float() for k, v in inputs.items()}

with torch.no_grad():
    predicted_ids = model.generate(**inputs)
transcription = processor.batch_decode(predicted_ids, skip_special_tokens=True)
print(transcription[0])
```

## Reproduce

```bash
pip install ternary-quant
ternary-quant quantize-broad openai/whisper-large-v3 \
    --output ./whisper-large-v3-ternary \
    --components decoder \
    --scheme tritplane3 --dtype float16 --eval
```

Part of the [ternary-models](https://github.com/Asad-Ismail/ternary-models) collection.
