# BERT-base-uncased-HXQ

4.0x smaller than FP32. MLM accuracy 61.0%. First encoder-only model compressed with HXQ.

BERT-base-uncased compressed from 421 MB to 128 MB. Masked language modeling accuracy matches the dense baseline. The same codec compresses Transformers, SSMs, hybrids, MoEs, and vision models.
## Install and Run

```bash
pip install "helix-substrate[hf]"
```

```python
import helix_substrate  # must be imported before loading the HXQ checkpoint
from transformers import BertForMaskedLM, BertTokenizer

model = BertForMaskedLM.from_pretrained("EchoLabs33/bert-base-uncased-hxq")
tokenizer = BertTokenizer.from_pretrained("EchoLabs33/bert-base-uncased-hxq")

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
outputs = model(**inputs)

# Locate the [MASK] position and decode the top prediction.
mask_idx = (inputs.input_ids == tokenizer.mask_token_id).nonzero()[0, 1]
pred = outputs.logits[0, mask_idx].argmax()
print(tokenizer.decode(pred))  # paris
```
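For quick checks, the same checkpoint can also be driven through the standard fill-mask pipeline. This is a minimal sketch, assuming the `helix_substrate` import is what makes the HXQ weights loadable:

```python
import helix_substrate  # assumed: must be imported before loading the HXQ checkpoint
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="EchoLabs33/bert-base-uncased-hxq")

# Each candidate carries the decoded token string and its softmax score.
for candidate in fill_mask("The capital of France is [MASK]."):
    print(f"{candidate['token_str']}: {candidate['score']:.3f}")
```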
## Downstream Benchmarks
Masked language modeling on WikiText-2 (500 randomly masked tokens):
| Metric | Dense | HXQ (4.0x) | Delta |
|---|---|---|---|
| MLM Top-1 | 61.40% | 61.00% | -0.40% |
| MLM Top-5 | 77.60% | 77.00% | -0.60% |
Both deltas are within sampling noise; task performance is preserved after 4.0x compression.
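For reference, a paired evaluation along these lines can be sketched as below. This is an illustrative reimplementation, not the exact harness behind the table: the reported numbers use 500 randomly masked tokens, while the sketch masks a fixed fraction per sequence, and sequence length and text selection are assumptions.

```python
import torch

def mlm_accuracy(model, tokenizer, texts, mask_prob=0.15, seed=0):
    """Mask random non-special tokens and score top-1 / top-5 recovery."""
    torch.manual_seed(seed)  # same seed => same masked positions for dense and HXQ
    top1 = top5 = total = 0
    model.eval()
    for text in texts:
        enc = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
        ids = enc.input_ids.clone()
        # Exclude [CLS], [SEP], and padding from the maskable positions.
        special = torch.tensor(tokenizer.get_special_tokens_mask(
            ids[0].tolist(), already_has_special_tokens=True)).bool()
        candidates = (~special).nonzero().squeeze(-1)
        if len(candidates) == 0:
            continue
        n_mask = max(1, int(mask_prob * len(candidates)))
        picked = candidates[torch.randperm(len(candidates))[:n_mask]]
        labels = ids[0, picked].clone()
        ids[0, picked] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=ids, attention_mask=enc.attention_mask).logits[0, picked]
        top1 += (logits.argmax(-1) == labels).sum().item()
        top5 += (logits.topk(5, dim=-1).indices == labels[:, None]).any(-1).sum().item()
        total += n_mask
    return top1 / total, top5 / total
```

Running this with the same seed on the dense baseline and on the HXQ checkpoint pairs the masked positions, so the delta isolates the effect of compression rather than sampling.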
## Compression Benchmark

| Metric | Dense (FP32) | HXQ |
|---|---|---|
| Size | 421 MB | 128 MB |
| Compression ratio | -- | 4.0x |
| VRAM (eval) | 824 MB | 584 MB |
| Compressed modules | -- | 75 HelixLinear layers |
| Architecture | BERT (encoder-only Transformer) | unchanged |
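The size and VRAM rows can be approximated as follows. This is a sketch rather than the exact methodology behind the table, and absolute numbers will vary with hardware and library versions.

```python
import os
import torch
import helix_substrate  # assumed: must be imported before loading the HXQ checkpoint
from huggingface_hub import snapshot_download
from transformers import BertForMaskedLM

repo = "EchoLabs33/bert-base-uncased-hxq"
local = snapshot_download(repo)

# On-disk size: sum of every file in the downloaded snapshot.
size_mb = sum(
    os.path.getsize(os.path.join(root, f))
    for root, _, files in os.walk(local) for f in files
) / 1e6
print(f"on-disk size: {size_mb:.0f} MB")

# Peak GPU memory for a single forward pass (requires a CUDA device).
model = BertForMaskedLM.from_pretrained(repo).cuda().eval()
torch.cuda.reset_peak_memory_stats()
with torch.no_grad():
    model(input_ids=torch.randint(0, model.config.vocab_size, (1, 128), device="cuda"))
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 1e6:.0f} MB")
```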
## Verification Status
- Compression receipt: PASS -- 75 modules compressed, cosine similarity 0.999+
- Conversion receipt: PASS (Gate 1 + Gate 2)
- Downstream eval: PASS -- paired dense/HXQ on WikiText-2 MLM
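As a rough cross-check of the cosine figure, the dense and HXQ weight matrices can be compared layer by layer. A minimal sketch, assuming the decompressed HXQ checkpoint exposes its parameters under the same names as the base model:

```python
import torch.nn.functional as F
import helix_substrate  # assumed: must be imported before loading the HXQ checkpoint
from transformers import BertForMaskedLM

dense = BertForMaskedLM.from_pretrained("google-bert/bert-base-uncased")
hxq = BertForMaskedLM.from_pretrained("EchoLabs33/bert-base-uncased-hxq")

hxq_params = dict(hxq.named_parameters())
for name, w in dense.named_parameters():
    if w.dim() == 2 and name in hxq_params:  # 2-D weight matrices only
        cos = F.cosine_similarity(w.flatten(), hxq_params[name].flatten(), dim=0)
        print(f"{name}: cos={cos.item():.4f}")
```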
## Architecture Details
BERT-base-uncased is an encoder-only Transformer:
- 12 layers, hidden_size=768, 12 attention heads
- 110M parameters
- Trained on masked language modeling + next sentence prediction
All 75 linear layers (attention Q/K/V/O, MLP intermediate/output, pooler, classification head) are compressed. Embedding layers (word, position, token_type), layer norms, and biases are stored at full precision.
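Which modules are candidates for compression can be inspected by enumerating the linear layers of the stock architecture. This sketch uses the dense base model; the exact count depends on which model class is instantiated (for example, whether the pooler is included), so it need not match 75 exactly.

```python
import torch.nn as nn
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained("google-bert/bert-base-uncased")

# These nn.Linear modules are what HXQ replaces with HelixLinear layers.
linear_names = [n for n, m in model.named_modules() if isinstance(m, nn.Linear)]
print(len(linear_names))
print(linear_names[:6])  # layer 0: Q, K, V, attention output, MLP intermediate, MLP output
```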
## Why This Matters
BERT is the first encoder-only model compressed with HXQ. The same codec now covers:
| Family | Models |
|---|---|
| Decoder-only Transformer | TinyLlama, Qwen 1.5B-14B |
| Pure SSM | Mamba 130m, Mamba2 1.3B |
| Hybrid (SSM+Transformer) | Zamba2 1.2B, 2.7B |
| MoE | OLMoE 1B/7B |
| Vision+Text | CLIP ViT-L/14 |
| Encoder-only | BERT-base |
Six architecture families. One codec. One pip install.
## Citation

```bibtex
@software{helix_substrate_2026,
  title={Helix Substrate: Universal Weight Compression via HelixCode},
  author={EchoLabs},
  year={2026},
  url={https://github.com/echo313unfolding/helix-substrate}
}
```
## License
Apache 2.0 (inherited from google-bert/bert-base-uncased).