CLIP-ViT-L/14-HXQ

3.6x smaller than FP32. CIFAR-100 zero-shot Top-1 72.8%. The first vision model compressed with HXQ.

CLIP ViT-Large/14 (text + vision dual encoder) compressed from 1.6 GB to 447 MB. Zero-shot classification accuracy matches the dense baseline. No calibration data. Same codec that compresses Transformers, SSMs, Hybrids, and MoEs.

Install and Run

pip install "helix-substrate[hf]"

import helix_substrate  # registers the HXQ quantizer with HuggingFace
from transformers import CLIPModel, CLIPProcessor
from PIL import Image

model = CLIPModel.from_pretrained("EchoLabs33/clip-vit-large-patch14-helix")
processor = CLIPProcessor.from_pretrained("EchoLabs33/clip-vit-large-patch14-helix")

image = Image.open("photo.jpg")
inputs = processor(
    text=["a photo of a cat", "a photo of a dog", "a photo of a car"],
    images=image,
    return_tensors="pt",
    padding=True,
)
outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)
print(probs)  # [cat_prob, dog_prob, car_prob]

Downstream Benchmarks

Zero-shot CIFAR-100 classification (10,000 test images, 100 classes, prompt: "a photo of a {class}"):

| Metric | Dense | HXQ (3.6x) | Delta |
|---|---|---|---|
| Top-1 Accuracy | 72.48% | 72.75% | +0.27% |
| Top-5 Accuracy | 91.41% | 91.64% | +0.23% |

All deltas within noise. Task performance preserved after 3.6x compression.
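Zero-shot classification here means scoring each class prompt's text embedding against the image embedding and taking the argmax. A minimal sketch of that scoring step, using tiny synthetic embeddings rather than real model outputs:

```python
import numpy as np

def zero_shot_predict(image_emb, text_embs):
    """Return the index of the class prompt most cosine-similar to the image."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return int(np.argmax(text_embs @ image_emb))  # cosine similarities

# Toy 4-d embeddings: three orthogonal "class prompts", an image near class 1
text_embs = np.eye(3, 4)
image_emb = np.array([0.1, 0.9, 0.0, 0.05])
print(zero_shot_predict(image_emb, text_embs))  # -> 1
```

The paired dense/HXQ evaluation runs exactly this ranking over the same 10,000 test images, so the Top-1 delta measures only the effect of compression.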

Compression Benchmark

| Metric | Dense (FP32) | HXQ |
|---|---|---|
| Size | 1.6 GB | 447 MB |
| Compression ratio | -- | 3.6x |
| VRAM (eval) | 3,412 MB | 2,266 MB |
| Compressed modules | -- | 218 HelixLinear layers |
| Architecture | CLIP (ViT-L/14 + Text Transformer) | unchanged |

Verification Status

  • Compression receipt: PASS -- 218 compressed, 374 exact, mean cosine 0.9997
  • Conversion receipt: PASS (Gate 1 + Gate 2)
  • Downstream eval: PASS -- paired dense/HXQ on CIFAR-100 zero-shot

Good to Know

  • GPU and CPU supported -- runs on any CUDA GPU or CPU.
  • Not fine-tunable -- compressed weights are read-only (is_trainable = False).
  • Requires helix-substrate -- you need pip install "helix-substrate[hf]".
  • Embeddings stored exact -- token, position, and patch embeddings are at full precision. Only the 218 attention + MLP linear layers are compressed.

What is HelixCode?

HelixCode is a universal weight compression codec based on vector quantization:

  • Each weight matrix is replaced by a 256-entry codebook (float32) + uint8 index matrix + optional sidecar corrections for outlier values
  • The compressed form is the executable -- no decompression step
  • Works on any nn.Linear regardless of architecture
  • No calibration data required -- codebooks are fit from the weights alone
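A toy sketch of the scheme the bullets describe: fit a 256-entry codebook to the weight values alone (no calibration data), store uint8 indices, and keep the worst-approximated values in an exact sidecar. The k-means fit, `outlier_frac`, and sidecar layout below are illustrative, not the published HXQ format:

```python
import numpy as np

def fit_codebook(w, n_codes=256, n_iter=10):
    """Toy 1-D k-means: 256 float32 centroids fit from the weights alone."""
    flat = w.ravel().astype(np.float32)
    codes = np.quantile(flat, np.linspace(0, 1, n_codes)).astype(np.float32)
    for _ in range(n_iter):
        idx = np.abs(flat[:, None] - codes[None, :]).argmin(axis=1)
        for k in range(n_codes):
            members = flat[idx == k]
            if members.size:
                codes[k] = members.mean()
    return codes

def compress(w, outlier_frac=0.004):
    codes = fit_codebook(w)
    idx = np.abs(w.ravel()[:, None] - codes[None, :]).argmin(axis=1).astype(np.uint8)
    err = np.abs(w.ravel() - codes[idx])
    n_out = max(1, int(outlier_frac * w.size))
    out_pos = np.argsort(err)[-n_out:]          # sidecar: worst outliers, exact
    sidecar = (out_pos, w.ravel()[out_pos].astype(np.float32))
    return codes, idx.reshape(w.shape), sidecar

def decompress(codes, idx, sidecar):
    w = codes[idx.ravel()].copy()
    pos, vals = sidecar
    w[pos] = vals
    return w.reshape(idx.shape)
```

In the real codec the compressed form is executed directly; decompression here is shown only to check reconstruction quality.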

Architecture Details

CLIP ViT-Large/14 is a dual-encoder multimodal model:

  • Vision encoder: 24-layer ViT-Large, hidden_size=1024, 16 attention heads, patch_size=14
  • Text encoder: 12-layer Transformer, hidden_size=768, 12 attention heads
  • Cross-modal projections: visual_projection (1024->768) + text_projection (768->768)

All 218 linear layers across both encoders are compressed. Embedding layers (token, position, patch), layer norms, and biases are stored at full precision.
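The headline ratio follows from storage arithmetic: an FP32 weight costs 4 bytes, while its uint8 index costs 1 byte plus a shared 256 x 4 B codebook, so each compressed linear layer approaches 4x; the exact-precision embeddings, norms, and biases pull the whole-model ratio down to about 3.6x. A back-of-envelope sketch (the 1024x1024 shape is illustrative, without the sidecar overhead):

```python
def layer_bytes_fp32(rows, cols):
    return 4 * rows * cols  # 4 bytes per FP32 weight

def layer_bytes_hxq(rows, cols, n_codes=256):
    return rows * cols + 4 * n_codes  # uint8 indices + float32 codebook

dense = layer_bytes_fp32(1024, 1024)
hxq = layer_bytes_hxq(1024, 1024)
print(f"{dense / hxq:.2f}x")  # -> 4.00x for this layer alone
```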

Why This Matters

CLIP is the first vision model compressed with HXQ. The same codec now covers:

| Family | Models | Eval |
|---|---|---|
| Transformer | TinyLlama, Qwen 1.5B-14B | PPL within noise |
| Pure SSM | Mamba 130m, Mamba2 1.3B | PPL receipted |
| Hybrid | Zamba2 1.2B, 2.7B | PPL receipted |
| MoE | OLMoE 1B/7B | HellaSwag -0.16% |
| Vision+Text | CLIP ViT-L/14 | Top-1 +0.27% |

Five architecture families. One codec. One pip install.

Companion Models

| Model | Architecture | Ratio | Eval Delta |
|---|---|---|---|
| clip-vit-large-patch14-helix | Vision+Text (CLIP) | 3.6x | +0.27% Top-1 |
| olmoe-1b-7b-instruct-helix | MoE (64 experts) | 1.9x | -0.16% HellaSwag |
| zamba2-2.7b-instruct-helix | Hybrid (Mamba2+Transformer) | 1.8x | +6.59% PPL |
| zamba2-1.2b-helix | Hybrid (Mamba2+Transformer) | 1.7x | +2.90% PPL |
| qwen2.5-14b-instruct-helix | Transformer | 3.4x | pending |
| qwen2.5-3b-instruct-helix | Transformer | 1.6x | +0.69% PPL |
| tinyllama-1.1b-helix | Transformer | 4.0x | +0.78% PPL |
| mamba2-1.3b-helix | Pure SSM (Mamba2) | 2.1x | +8.0% PPL |
| mamba-130m-helix | Pure SSM | 3.8x | +18.4% PPL |

Citation

@software{helix_substrate_2026,
  title={Helix Substrate: Universal Weight Compression via HelixCode},
  author={EchoLabs},
  year={2026},
  url={https://github.com/echo313unfolding/helix-substrate}
}

License

Apache 2.0 (inherited from openai/clip-vit-large-patch14).
