ternary-models: VLMs, Multimodal & Audio
Collection of ternary-quantized models for architectures GGUF can't handle, using the tritplane3 scheme.
Ternary-quantized version of openai/whisper-medium.
| Property | Value |
|---|---|
| Base Model | openai/whisper-medium |
| Parameters | 769M |
| Quantization | tritplane3 (240 decoder linear layers) |
| Audio encoder | FP16 (preserved) |
| Stored size | 453 MB |
| FP16 size | ~3.1 GB |
| Compression | ~6.8× vs. FP16 |
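The tritplane3 format itself is not documented here, so as a rough sketch of how ternary quantization works in general: each weight is mapped to {-1, 0, +1} via a zero-band threshold, with a per-tensor FP scale recovered at dequantization time. The function names and the threshold rule below are illustrative assumptions, not the actual tritplane3 implementation.

```python
import numpy as np

def ternarize(w: np.ndarray, band: float = 0.7):
    """Map weights to trits in {-1, 0, +1} plus one per-tensor scale.

    `band` controls the zero-band half-width (illustrative heuristic,
    not the tritplane3 rule).
    """
    delta = band * np.abs(w).mean()            # weights inside +/-delta become 0
    trits = (np.sign(w) * (np.abs(w) > delta)).astype(np.int8)
    nz = trits != 0
    scale = float(np.abs(w[nz]).mean()) if nz.any() else 0.0
    return trits, scale

def dequantize(trits: np.ndarray, scale: float) -> np.ndarray:
    # Reconstruct an FP approximation: {-1, 0, +1} * per-tensor scale.
    return trits.astype(np.float32) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)
trits, scale = ternarize(w)
w_hat = dequantize(trits, scale)
```

Since each trit carries log2(3) ≈ 1.585 bits of information, five trits pack into one byte (3^5 = 243 ≤ 256), which is how a ternary decoder can approach ~1.6 bits per weight plus scales while the FP16 encoder stays untouched.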
```python
from ternary_quant.inference import load_ternary_model
import numpy as np
import soundfile as sf
import torch

model, proc = load_ternary_model(
    "AsadIsmail/whisper-medium-ternary", runtime_mode="cached", device="cpu"
)
model = model.float()  # required for encoder compatibility

# Transcribe audio
audio, sr = sf.read("audio.flac")
inputs = proc(audio.astype(np.float32), sampling_rate=sr, return_tensors="pt")
inputs = {k: v.float() for k, v in inputs.items()}
with torch.no_grad():
    ids = model.generate(**inputs, max_new_tokens=100)
print(proc.batch_decode(ids, skip_special_tokens=True)[0])
```
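One practical caveat when feeding files read with soundfile: Whisper's feature extractor expects 16 kHz mono audio, while `sf.read` returns whatever rate and channel count the file has. A minimal pre-processing helper, assuming linear-interpolation resampling is acceptable (a proper resampler such as `librosa.resample` or torchaudio is preferable for quality):

```python
import numpy as np

TARGET_SR = 16_000  # sampling rate Whisper's feature extractor expects

def to_mono_16k(audio: np.ndarray, sr: int) -> np.ndarray:
    """Downmix to mono and resample to 16 kHz via linear interpolation."""
    if audio.ndim == 2:                      # (samples, channels) -> mono
        audio = audio.mean(axis=1)
    if sr != TARGET_SR:
        n_out = int(round(len(audio) * TARGET_SR / sr))
        x_old = np.linspace(0.0, 1.0, num=len(audio), endpoint=False)
        x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
        audio = np.interp(x_new, x_old, audio)
    return audio.astype(np.float32)
```

With this helper, the transcription snippet above becomes `inputs = proc(to_mono_16k(audio, sr), sampling_rate=16_000, return_tensors="pt")`.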
Part of the ternary-models collection. Base model: openai/whisper-medium.