Pollux-1152 10k — Native H24 Leech-Lattice Language Model

Pollux-1152 is a 404M-parameter decoder-only causal transformer trained from scratch at native 0.76-bit quantization resolution (V = 50,688, n_embd = 1152). By mapping the parameter manifold natively onto the H24 Leech lattice, the 287M-parameter backbone compresses to just 27 MB of active SRAM. For the complete architectural and mathematical breakdown, read the official paper: 0.76 Bits Is All You Need: Vector Ternary Logic via Native H24 Leech-Lattice Quantization in LLMs.

This checkpoint represents the thermodynamic crystallisation peak at 10,000 steps (~2.6B tokens). All benchmark scores below are measured directly on the fully serialized 142 MB .plx deployment artifact, confirming that the stated Iso-Memory footprints reflect true Edge AI deployment realities without statistical degradation.

At this peak, Pollux-1152 achieves 69.9% on BLiMP (fluid intelligence), marginally above the continuous Pythia-160M baseline (69.7% BLiMP) at the 4.2B-token Iso-Data boundary. It reaches this syntactic parity with an 83% reduction in active backbone SRAM (27 MB vs. 162 MB).

This Hugging Face repository is a weight-hosting layer only. Pollux is not compatible with the Hugging Face transformers library. All inference, evaluation, packing, and tokenization logic lives in the official Pollux GitHub codebase.


The "Stateless CPU" Property — Zero-Interference RAG

Unlike conventional models that conflate fluid reasoning (syntax) with crystallised memory (factual trivia), Pollux acts as a purely structural engine. The 0.76-bit global Voronoi bottleneck acts as a mathematically pure high-pass filter:

  • Fluid intelligence (structural): Gradient signals encoding invariant syntactic rules crystallise into stable kissing-point assignments.
  • Crystallised intelligence (factual): High-entropy factual associations are mechanically attenuated and routed into the zero-potential null attractor.

Performance on factual benchmarks is correspondingly weak: 50.3% on SciQ, for example, against a random-chance floor of about 25% and 58.7% for the Pythia-160M baseline at 4.2B tokens. The margin above chance is attributed to high-frequency leakage of ubiquitous facts. This is not a defect but the defining feature for zero-interference Retrieval-Augmented Generation (RAG). By thermodynamically formatting parametric memory, Pollux acts as a stateless cognitive CPU: it parses and manipulates external factual databases without internal parametric bias or hallucination.
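As a rough illustration of the RAG pattern described above, the sketch below retrieves passages by simple word overlap and assembles them into a prompt, so that every fact the model needs arrives in the context rather than from its weights. The function names, the prompt layout, and the toy retriever are all hypothetical; none of this is taken from the Pollux codebase, and a real deployment would use a proper retriever.

```python
# Toy retrieval-augmented prompt builder (hypothetical helper names).
# The facts live in an external passage list; the model only has to read them.

def tokens(text):
    """Lower-case word set with trailing punctuation stripped."""
    return {w.strip(".,?!") for w in text.lower().split()}


def retrieve(query, passages, k=2):
    """Return the k passages sharing the most words with the query."""
    q = tokens(query)
    ranked = sorted(passages, key=lambda p: len(q & tokens(p)), reverse=True)
    return ranked[:k]


def build_prompt(query, passages, k=2):
    """Assemble a context-then-question prompt (layout is an assumption)."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    facts = [
        "The boiling point of water at sea level is 100 degrees Celsius.",
        "Paris is the capital of France.",
        "The Leech lattice lives in 24 dimensions.",
    ]
    print(build_prompt("What is the capital of France?", facts, k=1))
```

The resulting string would then be passed to whatever generation entry point the runtime exposes.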


Hardware & Inference Limitations

The 0.76 bits/param backbone footprint counts packed 18-bit indices plus one FP16 σ_rms per row. The reference PyTorch runtime materialises these into dense FP16 weight matrices at forward time for cuBLAS compatibility (~574 MB FP16 for the Pollux-1152 backbone alone, vs. ~27 MB packed). This is intentional for research reproducibility; native LUT gather–accumulate kernels are required to achieve SRAM-bound latency on edge devices.
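The footprint figures above can be checked with a few lines of arithmetic. The sketch below assumes that each 18-bit index covers one block of 24 weights (the dimension of the Leech lattice) and that a row spans n_embd = 1152 weights. Neither grouping is spelled out in this card, so both are assumptions, as is the use of the rounded 287M backbone parameter count.

```python
# Back-of-envelope check of the backbone footprint figures.
# Assumptions (not stated explicitly in this card): one 18-bit index per
# 24-weight block, and one FP16 sigma_rms per row of 1152 weights.

INDEX_BITS = 18
BLOCK_DIM = 24            # assumed: number of weights covered by one index
SCALE_BITS = 16           # one FP16 sigma_rms
ROW_LEN = 1152            # assumed: row length equals n_embd
BACKBONE_PARAMS = 287e6   # rounded figure quoted in this card


def bits_per_param():
    """Index bits amortised over a block plus scale bits amortised over a row."""
    return INDEX_BITS / BLOCK_DIM + SCALE_BITS / ROW_LEN


def packed_megabytes(params=BACKBONE_PARAMS):
    """Packed backbone size in decimal megabytes."""
    return params * bits_per_param() / 8 / 1e6


def fp16_megabytes(params=BACKBONE_PARAMS):
    """Dense FP16 size (2 bytes per weight) in decimal megabytes."""
    return params * 2 / 1e6


if __name__ == "__main__":
    print(f"bits per parameter: {bits_per_param():.4f}")
    print(f"packed backbone   : {packed_megabytes():.1f} MB")
    print(f"dense FP16        : {fp16_megabytes():.1f} MB")
```

Under these assumptions the result is about 0.764 bits per parameter, roughly 27 MB packed and 574 MB once materialised in FP16, which is in line with the figures quoted above.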


Files Included

| File | Description |
| --- | --- |
| pollux_1152_10k.plx | Recommended for inference. Pollux-1152 packed artifact: 27.3 MB backbone SRAM, 142 MB total on disk including INT8 embeddings and LM head. Empirically verified lossless. Load with generate.py or evaluate.py. |
| pollux_1152_10k.pt | Training checkpoint with continuous pre-weights in optimiser state; observable weights are dynamic Castor H24 projections. Use for inspecting pre-weights or reproducing the packing step. |

Note on File Size: The 142 MB figure matches the paper and the GitHub documentation, which report sizes in binary units (MiB, 2^20 bytes), the convention used by most operating systems. The Hugging Face UI displays the same file in decimal SI units (~149 MB).
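The unit conversion behind this note is a one-liner; the helper name below is illustrative only.

```python
# Convert a size in binary mebibytes (MiB) to decimal megabytes (MB).
MIB = 2**20   # bytes per MiB
MB = 10**6    # bytes per MB


def mib_to_mb(size_mib):
    return size_mib * MIB / MB


if __name__ == "__main__":
    print(round(mib_to_mb(142), 1))  # 148.9
```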

(Note: Neither file can be loaded by llama.cpp or standard GGUF loaders; both require the custom Pollux runtime.)


Evaluation Results

Evaluated with lm-evaluation-harness. Pythia baseline: EleutherAI/pythia-160m-deduped.

(Note: The Iso-Memory criterion isolates memory-bandwidth footprint under the targeted native LUT runtime. Under the current FP16 reference materialisation, FLOPs per token scale with backbone parameter count and are not matched between Pollux-1152 and Pythia baselines.)

| Task | Pollux-1152 @ 2.6B | Pythia-160M @ 4.2B (step 2k) | Pythia-160M @ 300B (step 143k) |
| --- | --- | --- | --- |
| BLiMP mean (67 tasks) | 69.9% | 69.7% | 73.1% |
| SciQ | 50.3% | 58.7% | 72.3% |
| HellaSwag | 26.4% | 26.9% | 29.1% |
| PIQA | 57.7% | 58.4% | 61.9% |
| Backbone SRAM | 27 MB | 162 MB | 162 MB |
| Total on-disk footprint | 142 MB | 247 MB | 247 MB |
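For readers who want to re-run the Pythia baseline columns, the sketch below assembles an lm-evaluation-harness command line for a given Pythia checkpoint revision. The task names, batch size, and the choice of revisions `step2000` and `step143000` are assumptions inferred from the column headers; the exact harness version and settings used for the reported numbers are not stated here. This applies to the Pythia baselines only, since Pollux itself is evaluated through evaluate.py in its own codebase.

```python
# Build an lm-evaluation-harness command for a Pythia checkpoint revision.
# Task list and batch size are assumptions, not the authors' recorded settings.
import shlex


def pythia_eval_command(revision,
                        tasks=("blimp", "sciq", "hellaswag", "piqa"),
                        batch_size=8):
    model_args = f"pretrained=EleutherAI/pythia-160m-deduped,revision={revision}"
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", model_args,
        "--tasks", ",".join(tasks),
        "--batch_size", str(batch_size),
    ]


if __name__ == "__main__":
    for rev in ("step2000", "step143000"):
        # Print the command; run it with subprocess.run(...) if desired.
        print(shlex.join(pythia_eval_command(rev)))
```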

Model Architecture Details

  • Architecture: 18 layers · n_embd = 1152 · 48 heads · d_head = 24
  • Training corpus: FineWeb-Edu 10B subset
  • Token budget: 10,000 optimizer steps (~2.6 billion tokens)
  • Optimizer: Thermodynamic estimator (pollux_step) with no architectural hyperparameters; γ = G24 ≈ 0.065771. Requires one corpus-specific environmental input: H_floor (the measured noise floor of the training corpus, analogous to ambient temperature in Carnot theory).
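The headline parameter counts can be roughly reproduced from the architecture figures above. The sketch assumes a standard GPT-style block (four n_embd × n_embd attention projections plus an MLP with 4× expansion, no biases) and an untied input embedding and LM head; these layout details are assumptions, and normalisation and scale parameters are ignored.

```python
# Rough parameter count from the listed architecture.
# Assumed layout: standard GPT block, 4x MLP expansion, no biases,
# untied input embedding and LM head; norm parameters ignored.

N_LAYER, N_EMBD, N_HEAD, D_HEAD, VOCAB = 18, 1152, 48, 24, 50_688


def backbone_params(n_layer=N_LAYER, n_embd=N_EMBD, mlp_ratio=4):
    attn = 4 * n_embd * n_embd              # Q, K, V and output projections
    mlp = 2 * mlp_ratio * n_embd * n_embd   # up and down projections
    return n_layer * (attn + mlp)


def embedding_params(vocab=VOCAB, n_embd=N_EMBD, tied=False):
    return vocab * n_embd * (1 if tied else 2)


if __name__ == "__main__":
    assert N_HEAD * D_HEAD == N_EMBD        # 48 heads x 24 dims = 1152
    backbone = backbone_params()
    total = backbone + embedding_params()
    print(f"backbone: {backbone / 1e6:.1f}M")
    print(f"total   : {total / 1e6:.1f}M")
    print(f"tokens per step: {2.6e9 / 10_000:,.0f}")
```

Under these assumptions the backbone comes to about 286.7M parameters and the total to about 403.4M, close to the quoted 287M and 404M; the token budget works out to roughly 260,000 tokens per optimizer step.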

Licensing & Citation

Released under the PolyForm Noncommercial License 1.0.0 for academic research. Commercial utilization requires a license (pending WIPO Application No. PCT/AT2026/060108 and Austrian Patent Application No. A65086/2026).

@misc{lavicka2026pollux,
  title   = {0.76 Bits Is All You Need: Vector Ternary Logic via Native H24 Leech-Lattice Quantization in LLMs},
  author  = {Lavicka, Alexander},
  year    = {2026},
  note    = {Preprint. WIPO Patent Application No. PCT/AT2026/060108 and Austrian Patent Application No. A65086/2026},
  url     = {https://papers.ssrn.com/abstract=6973978}
}
