BERT-base-uncased-HXQ

4.0x smaller from FP32. MLM accuracy 61.0%. First encoder-only model compressed with HXQ.

BERT-base-uncased compressed from 421 MB to 128 MB. Masked language modeling accuracy matches the dense baseline. Same codec that compresses Transformers, SSMs, Hybrids, MoEs, and vision models.

Install and Run

pip install "helix-substrate[hf]"

import helix_substrate  # import before loading so the compressed layers resolve
from transformers import BertForMaskedLM, BertTokenizer

model = BertForMaskedLM.from_pretrained("EchoLabs33/bert-base-uncased-hxq")
tokenizer = BertTokenizer.from_pretrained("EchoLabs33/bert-base-uncased-hxq")

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
outputs = model(**inputs)
# locate the [MASK] position instead of hardcoding an index
mask_idx = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero().item()
pred = outputs.logits[0, mask_idx].argmax()
print(tokenizer.decode(pred))  # paris

Downstream Benchmarks

Masked language modeling on WikiText-2 (500 randomly masked tokens):

Metric      Dense     HXQ (4.0x)   Delta
MLM Top-1   61.40%    61.00%       -0.40%
MLM Top-5   77.60%    77.00%       -0.60%

Deltas within sampling noise. Task performance preserved after 4.0x compression.
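The card does not publish the exact masking protocol, but the paired top-1/top-5 comparison reduces to a standard top-k accuracy over masked positions. A minimal sketch (function name and toy data are illustrative, not from the release):

```python
import numpy as np

def topk_accuracy(logits, labels, k):
    """Fraction of masked positions whose true token is in the top-k logits.

    logits: (n_masked, vocab_size) array of scores at masked positions
    labels: (n_masked,) array of the original token ids
    """
    # indices of the k highest-scoring vocabulary entries per position
    topk = np.argsort(logits, axis=-1)[:, -k:]
    hits = (topk == labels[:, None]).any(axis=-1)
    return hits.mean()

# toy check: 3 masked positions, vocabulary of 5
logits = np.array([[0.1, 0.9, 0.0, 0.0, 0.0],
                   [0.8, 0.1, 0.0, 0.0, 0.1],
                   [0.0, 0.0, 0.2, 0.7, 0.1]])
labels = np.array([1, 2, 3])
print(topk_accuracy(logits, labels, 1))  # 2 of 3 positions correct at top-1
```

Running the same function over dense and HXQ logits on the same 500 masked tokens yields directly comparable numbers.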

Compression Benchmark

                     Dense (FP32)                      HXQ
Size                 421 MB                            128 MB
Compression ratio    --                                4.0x
VRAM (eval)          824 MB                            584 MB
Compressed modules   --                                75 HelixLinear layers
Architecture         BERT (encoder-only Transformer)   unchanged

Verification Status

  • Compression receipt: PASS -- 75 modules compressed, cosine similarity 0.999+
  • Conversion receipt: PASS (Gate 1 + Gate 2)
  • Downstream eval: PASS -- paired dense/HXQ on WikiText-2 MLM
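The card does not specify how the receipt's cosine figure is computed; a plausible per-module check, assuming it compares each dense weight matrix against its decompressed reconstruction, looks like this (function name and noise model are illustrative):

```python
import numpy as np

def weight_cosine(dense_w, decoded_w):
    """Cosine similarity between a dense weight matrix and its
    reconstruction after a compress/decompress round trip."""
    a = dense_w.ravel().astype(np.float64)
    b = decoded_w.ravel().astype(np.float64)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# toy round trip: small quantization-like noise barely moves the direction
rng = np.random.default_rng(0)
w = rng.standard_normal((768, 768)).astype(np.float32)
w_decoded = w + rng.standard_normal(w.shape).astype(np.float32) * 1e-3
cos = weight_cosine(w, w_decoded)
assert cos > 0.999  # a passing receipt would require this for every module
```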

Architecture Details

BERT-base-uncased is an encoder-only Transformer:

  • 12 layers, hidden_size=768, 12 attention heads
  • 110M parameters
  • Trained on masked language modeling + next sentence prediction

All 75 linear layers (attention Q/K/V/O, MLP intermediate/output, pooler, classification head) are compressed. Embedding layers (word, position, token_type), layer norms, and biases are stored at full precision.
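The counts above are easy to sanity-check. Reading the card's list as six linears per encoder block plus the pooler and two MLM-head linears (my decomposition, not stated outright), and pricing 110M FP32 parameters at 4 bytes each:

```python
# 6 linear layers per encoder block: Q, K, V, attention output,
# MLP intermediate, MLP output
per_block = 6
encoder_linears = 12 * per_block        # 72 across the 12 layers
head_linears = 1 + 2                    # pooler + MLM transform/decoder
total = encoder_linears + head_linears
print(total)  # 75 HelixLinear layers

# FP32 footprint of ~110M parameters
size_mb = 110e6 * 4 / 2**20
print(round(size_mb))  # ~420 MB, consistent with the 421 MB checkpoint
```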

Why This Matters

BERT is the first encoder-only model compressed with HXQ. The same codec now covers:

Family                     Models
Decoder-only Transformer   TinyLlama, Qwen 1.5B-14B
Pure SSM                   Mamba 130m, Mamba2 1.3B
Hybrid (SSM+Transformer)   Zamba2 1.2B, 2.7B
MoE                        OLMoE 1B/7B
Vision+Text                CLIP ViT-L/14
Encoder-only               BERT-base

Six architecture families. One codec. One pip install.

Citation

@software{helix_substrate_2026,
  title={Helix Substrate: Universal Weight Compression via HelixCode},
  author={EchoLabs},
  year={2026},
  url={https://github.com/echo313unfolding/helix-substrate}
}

License

Apache 2.0 (inherited from google-bert/bert-base-uncased).
