# CLIP-ViT-L/14-HXQ

3.6x smaller than FP32. CIFAR-100 Top-1 72.8%. First vision model compressed with HXQ.

CLIP ViT-Large/14 (text + vision dual encoder) compressed from 1.6 GB to 447 MB. Zero-shot classification accuracy matches the dense baseline. No calibration data. The same codec compresses Transformers, SSMs, Hybrids, and MoEs.
## Install and Run

```bash
pip install "helix-substrate[hf]"
```

```python
import helix_substrate  # registers the HXQ quantizer with HuggingFace

from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("EchoLabs33/clip-vit-large-patch14-helix")
processor = CLIPProcessor.from_pretrained("EchoLabs33/clip-vit-large-patch14-helix")

image = Image.open("photo.jpg")
inputs = processor(
    text=["a photo of a cat", "a photo of a dog", "a photo of a car"],
    images=image,
    return_tensors="pt",
    padding=True,
)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)
print(probs)  # probabilities for [cat, dog, car]
```
## Downstream Benchmarks

Zero-shot CIFAR-100 classification (10,000 test images, 100 classes, prompt: "a photo of a {class}"):
| Metric | Dense | HXQ (3.6x) | Delta |
|---|---|---|---|
| Top-1 Accuracy | 72.48% | 72.75% | +0.27% |
| Top-5 Accuracy | 91.41% | 91.64% | +0.23% |
All deltas are within noise: task performance is preserved after 3.6x compression.
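The zero-shot protocol behind these numbers can be sketched with mock features (illustrative only: the real evaluation encodes the 10,000 CIFAR-100 test images and the 100 class prompts with the model above, then classifies by scaled cosine similarity):

```python
import numpy as np

rng = np.random.default_rng(0)

# Mock unit-normalized features standing in for CLIP's two encoders:
# 100 text embeddings (one per "a photo of a {class}" prompt) and one image.
num_classes, dim = 100, 768
text_feats = rng.normal(size=(num_classes, dim))
text_feats /= np.linalg.norm(text_feats, axis=1, keepdims=True)

# Pretend the image embedding lands near the prompt for class 42.
image_feat = text_feats[42] + 0.01 * rng.normal(size=dim)
image_feat /= np.linalg.norm(image_feat)

# Zero-shot classification: cosine similarity scaled by CLIP's learned
# temperature (logit_scale), then softmax over the class prompts.
logit_scale = 100.0
logits = logit_scale * image_feat @ text_feats.T
probs = np.exp(logits - logits.max())
probs /= probs.sum()

print(int(probs.argmax()))  # 42
```

Top-1 accuracy is the fraction of test images whose `argmax` over the 100 prompt similarities matches the true label; Top-5 checks whether the true label appears among the five highest.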
## Compression Benchmark

| Metric | Dense (FP32) | HXQ |
|---|---|---|
| Size | 1.6 GB | 447 MB |
| Compression ratio | -- | 3.6x |
| VRAM (eval) | 3,412 MB | 2,266 MB |
| Compressed modules | -- | 218 HelixLinear layers |
| Architecture | CLIP (ViT-L/14 + Text Transformer) | unchanged |
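The 3.6x ratio follows from replacing 4-byte float32 weights with 1-byte codebook indices. A rough back-of-the-envelope check (illustrative shapes, assuming one uint8 index per weight and a scalar codebook; the actual HelixCode layout may differ, e.g. with vector codebook entries and sidecar corrections):

```python
# One compressed weight matrix, e.g. a ViT-L attention projection.
rows, cols = 1024, 1024
dense_bytes = rows * cols * 4          # float32 weights

codebook_bytes = 256 * 4               # 256-entry float32 codebook
index_bytes = rows * cols * 1          # one uint8 index per weight
compressed_bytes = codebook_bytes + index_bytes

per_matrix_ratio = dense_bytes / compressed_bytes
print(f"per-matrix ratio: {per_matrix_ratio:.2f}x")   # just under 4x

# Whole-model ratio from the table above: the 374 exact modules
# (embeddings, norms, biases) pull it below the per-matrix ceiling.
print(f"model ratio: {1600 / 447:.1f}x")              # ~3.6x
```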
## Verification Status
- Compression receipt: PASS -- 218 compressed, 374 exact, mean cosine 0.9997
- Conversion receipt: PASS (Gate 1 + Gate 2)
- Downstream eval: PASS -- paired dense/HXQ on CIFAR-100 zero-shot
## Good to Know

- GPU and CPU supported -- runs on any CUDA GPU or CPU.
- Not fine-tunable -- compressed weights are read-only (`is_trainable = False`).
- Requires `helix-substrate` -- you need `pip install "helix-substrate[hf]"`.
- Embeddings stored exact -- token, position, and patch embeddings are at full precision. Only the 218 attention + MLP linear layers are compressed.
## What is HelixCode?

HelixCode is a universal weight compression codec based on vector quantization:

- Each weight matrix is replaced by a 256-entry codebook (float32) + uint8 index matrix + optional sidecar corrections for outlier values
- The compressed form is the executable -- no decompression step
- Works on any `nn.Linear` regardless of architecture
- No calibration data required -- codebooks are fit from the weights alone
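The decode path described by these bullets can be sketched as follows (a minimal toy with a scalar codebook, made-up shapes, and hand-placed outliers; not the actual helix-substrate data structures):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy HelixCode-style compressed linear layer.
codebook = rng.normal(size=(256,)).astype(np.float32)          # 256 float32 entries
indices = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # uint8 index matrix

# Optional sidecar: exact float32 values for a handful of outlier weights.
outlier_pos = [(0, 0), (10, 20)]
outlier_val = np.array([3.5, -2.8], dtype=np.float32)

# Forward pass: gather codebook entries, patch outliers, then matmul.
# The codebook + indices *are* the stored weights -- no separate
# decompressed artifact is written to disk.
w = codebook[indices]
for (r, c), v in zip(outlier_pos, outlier_val):
    w[r, c] = v

x = rng.normal(size=(1, 64)).astype(np.float32)
y = x @ w.T
print(y.shape)  # (1, 64)
```

Because the codebook is fit from the weights alone (e.g. by clustering), no calibration batches or activation statistics are needed.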
## Architecture Details
CLIP ViT-Large/14 is a dual-encoder multimodal model:
- Vision encoder: 24-layer ViT-Large, hidden_size=1024, 16 attention heads, patch_size=14
- Text encoder: 12-layer Transformer, hidden_size=768, 12 attention heads
- Cross-modal projections: visual_projection (1024->768) + text_projection (768->768)
All 218 linear layers across both encoders are compressed. Embedding layers (token, position, patch), layer norms, and biases are stored at full precision.
## Why This Matters
CLIP is the first vision model compressed with HXQ. The same codec now covers:
| Family | Models | Eval |
|---|---|---|
| Transformer | TinyLlama, Qwen 1.5B-14B | PPL within noise |
| Pure SSM | Mamba 130m, Mamba2 1.3B | PPL receipted |
| Hybrid | Zamba2 1.2B, 2.7B | PPL receipted |
| MoE | OLMoE 1B/7B | HellaSwag -0.16% |
| Vision+Text | CLIP ViT-L/14 | Top-1 +0.27% |
Five architecture families. One codec. One pip install.
## Companion Models
| Model | Architecture | Ratio | Eval Delta |
|---|---|---|---|
| clip-vit-large-patch14-helix | Vision+Text (CLIP) | 3.6x | +0.27% Top-1 |
| olmoe-1b-7b-instruct-helix | MoE (64 experts) | 1.9x | -0.16% HellaSwag |
| zamba2-2.7b-instruct-helix | Hybrid (Mamba2+Transformer) | 1.8x | +6.59% PPL |
| zamba2-1.2b-helix | Hybrid (Mamba2+Transformer) | 1.7x | +2.90% PPL |
| qwen2.5-14b-instruct-helix | Transformer | 3.4x | pending |
| qwen2.5-3b-instruct-helix | Transformer | 1.6x | +0.69% PPL |
| tinyllama-1.1b-helix | Transformer | 4.0x | +0.78% PPL |
| mamba2-1.3b-helix | Pure SSM (Mamba2) | 2.1x | +8.0% PPL |
| mamba-130m-helix | Pure SSM | 3.8x | +18.4% PPL |
## Citation

```bibtex
@software{helix_substrate_2026,
  title={Helix Substrate: Universal Weight Compression via HelixCode},
  author={EchoLabs},
  year={2026},
  url={https://github.com/echo313unfolding/helix-substrate}
}
```
## License
Apache 2.0 (inherited from openai/clip-vit-large-patch14).