Pollux-1920 10k — Native H24 Leech-Lattice Language Model

Pollux-1920 is a 991M-parameter decoder-only causal transformer trained from scratch at native 0.76-bit quantization resolution (V = 50,688, n_embd = 1920). By mapping the parameter manifold natively onto the H24 Leech lattice, the 796M-parameter backbone compresses to just 76 MB of active SRAM. For the complete architectural and mathematical breakdown, read the official paper: 0.76 Bits Is All You Need: Vector Ternary Logic via Native H24 Leech-Lattice Quantization in LLMs.

This checkpoint represents the thermodynamic crystallisation peak at 10,000 steps (~2.6B tokens). All benchmark scores below are measured directly on the fully serialized 265 MB .plx deployment artifact, so the stated Iso-Memory footprints and the reported scores refer to the same deployable artifact, with no measurable degradation from packing.

At this peak, Pollux-1920 achieves 73.0% BLiMP (fluid intelligence), matching the continuous Pythia-410M baseline (73.1% BLiMP) at the 4.2B-token Iso-Data boundary. It reaches this near-identical syntactic score with 87% less active backbone SRAM (76 MB vs. 577 MB).

This Hugging Face repository is a weight-hosting layer only. Pollux is not compatible with the Hugging Face transformers library. All inference, evaluation, packing, and tokenization logic lives in the official Pollux GitHub codebase.


The "Stateless CPU" Property — Zero-Interference RAG

Unlike conventional models that conflate fluid reasoning (syntax) with crystallised memory (factual trivia), Pollux acts as a purely structural engine. The 0.76-bit global Voronoi bottleneck acts as a mathematically pure high-pass filter:

  • Fluid intelligence (structural): Gradient signals encoding invariant syntactic rules crystallise into stable kissing-point assignments.
  • Crystallised intelligence (factual): High-entropy factual associations are mechanically attenuated and routed into the zero-potential null attractor.

While the wider 1920-dimensional residual stream allows ubiquitous, high-frequency facts to leak through initially (reaching 60.7% SciQ, which is above the 25% chance level of this four-option task but only modestly above the 57-59% that the Pythia baselines reach at 4.2B tokens, and is bounded by the high-frequency leakage mechanism), the lattice enters thermodynamic stasis at this checkpoint (the "Deep Freeze"): BLiMP shifts by ≤ 0.5% and factual benchmarks shift by ≤ 1.0% over the subsequent 1.3B tokens. The model stabilises structurally and does not keep absorbing facts, unlike Pythia-410M, which climbs to 82.4% SciQ by 300B tokens.

This mechanical factual restriction is not a defect, but the defining feature for zero-interference Retrieval-Augmented Generation (RAG). By thermodynamically formatting parametric memory, Pollux acts as a stateless cognitive CPU: it parses and manipulates external factual databases without internal parametric bias or hallucination.


Hardware & Inference Limitations

The 0.76 bits/param backbone footprint counts packed 18-bit indices plus one FP16 σ_rms per row. The reference PyTorch runtime materialises these into dense FP16 weight matrices at forward time for cuBLAS compatibility (~1.59 GB FP16 for the Pollux-1920 backbone alone, vs. ~76 MB packed). This is intentional for research reproducibility; native LUT gather–accumulate kernels are required to achieve SRAM-bound latency on edge devices.
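The footprint figures above can be reproduced with a short back-of-envelope calculation. This is a sketch under two assumptions that are not stated in this card: that each 18-bit index encodes one 24-dimensional lattice block, and that every quantized row has n_embd = 1920 entries. As a plausibility note, the Leech lattice has 196,560 minimal vectors, so those plus a zero state would fit in 18 bits; whether the packing actually uses that codebook is also an assumption here.

```python
# Back-of-envelope check of the packed vs. dense backbone footprint.
INDEX_BITS = 18          # from the card: packed 18-bit indices
BLOCK_DIM = 24           # ASSUMPTION: one index per 24-dimensional lattice block
SCALE_BITS = 16          # from the card: one FP16 sigma_rms per row
ROW_LEN = 1920           # ASSUMPTION: every quantized row has n_embd entries
BACKBONE_PARAMS = 796e6  # from the card: 796M quantized backbone parameters


def bits_per_param(index_bits, block_dim, scale_bits, row_len):
    """Index bits amortised over a block plus scale bits amortised over a row."""
    return index_bits / block_dim + scale_bits / row_len


bpp = bits_per_param(INDEX_BITS, BLOCK_DIM, SCALE_BITS, ROW_LEN)
packed_bytes = BACKBONE_PARAMS * bpp / 8   # packed indices + row scales
dense_fp16_bytes = BACKBONE_PARAMS * 2     # 2 bytes per FP16 weight

print(f"bits/param : {bpp:.4f}")
print(f"packed     : {packed_bytes / 1e6:.1f} MB")
print(f"dense FP16 : {dense_fp16_bytes / 1e9:.2f} GB")
print(f"expansion  : {dense_fp16_bytes / packed_bytes:.1f}x")
```

Under these assumptions the result is about 0.758 bits per parameter and roughly 75.5 million bytes, in line with the stated ~76 MB, while the dense FP16 materialisation is about 21 times larger. The exact packed size depends on the true row lengths and on any parameters left unquantized.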


Files Included

| File | Description |
|---|---|
| pollux_1920_10k.plx | Recommended for inference. Pollux-1920 packed artifact: 76 MB backbone SRAM, 265 MB total on disk including INT8 embeddings and LM head. Empirically verified lossless. Load with generate.py or evaluate.py. |
| pollux_1920_10k.pt | Training checkpoint with continuous pre-weights in optimiser state; observable weights are dynamic Castor H24 projections. Use for inspecting pre-weights or reproducing the packing step. |

Note on File Size: The 265 MB footprint matches the formal paper and GitHub documentation, which report sizes in binary mebibytes (MiB), the unit most operating systems display. The Hugging Face UI shows the same file in decimal SI units (~278 MB).
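The two figures are the same byte count expressed in different units, which a one-line conversion confirms. The helper name below is illustrative only.

```python
MIB = 2 ** 20   # bytes per mebibyte (binary)
MB = 10 ** 6    # bytes per megabyte (decimal SI)


def mib_to_mb(size_mib):
    """Convert a size in MiB to decimal megabytes."""
    return size_mib * MIB / MB


size_mb = mib_to_mb(265)
print(f"265 MiB = {size_mb:.2f} MB")  # 265 MiB = 277.87 MB
```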

(Note: Neither file can be loaded by llama.cpp or standard GGUF loaders; both require the custom Pollux runtime.)


Evaluation Results

Evaluated with lm-evaluation-harness. Pythia baseline: EleutherAI/pythia-410m-deduped.

(Note: The Iso-Memory criterion isolates memory-bandwidth footprint under the targeted native LUT runtime. Under the current FP16 reference materialisation, FLOPs per token scale with backbone parameter count and are not matched between Pollux-1920 and Pythia baselines.)

| Task | Pollux-1920 @ 2.6B | Pythia-160M @ 4.2B (step 2k) | Pythia-410M @ 4.2B (step 2k) | Pythia-160M @ 300B (step 143k) | Pythia-410M @ 300B (step 143k) |
|---|---|---|---|---|---|
| BLiMP mean (67 tasks) | 73.0% | 69.7% | 73.1% | 73.1% | 81.9% |
| SciQ | 60.7% | 58.7% | 57.2% | 72.3% | 82.4% |
| HellaSwag | 27.2% | 26.9% | 27.3% | 29.1% | 34.5% |
| PIQA | 59.8% | 58.4% | 58.2% | 61.9% | 67.2% |
| Backbone SRAM | 76 MB | 162 MB | 577 MB | 162 MB | 577 MB |
| Total on-disk footprint | 265 MB | 247 MB | 707 MB | 247 MB | 707 MB |
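The headline comparisons can be recomputed directly from the table. The snippet below is a small sketch that uses only the numbers reported above; the dictionary layout and function name are illustrative.

```python
# Values copied from the table above (Pythia-410M at the 4.2B-token checkpoint).
pollux = {"BLiMP": 73.0, "SciQ": 60.7, "HellaSwag": 27.2, "PIQA": 59.8}
pythia_410m_4b = {"BLiMP": 73.1, "SciQ": 57.2, "HellaSwag": 27.3, "PIQA": 58.2}

sram_mb = {"Pollux-1920": 76, "Pythia-160M": 162, "Pythia-410M": 577}
disk_mb = {"Pollux-1920": 265, "Pythia-160M": 247, "Pythia-410M": 707}


def reduction_pct(small, large):
    """Percentage by which `small` is below `large`."""
    return 100 * (1 - small / large)


# Score differences in percentage points (Pollux minus Pythia-410M).
deltas = {task: round(pollux[task] - pythia_410m_4b[task], 1) for task in pollux}

sram_cut = reduction_pct(sram_mb["Pollux-1920"], sram_mb["Pythia-410M"])
disk_cut = reduction_pct(disk_mb["Pollux-1920"], disk_mb["Pythia-410M"])

print(deltas)
print(f"backbone SRAM reduction vs Pythia-410M: {sram_cut:.1f}%")
print(f"on-disk reduction vs Pythia-410M:       {disk_cut:.1f}%")
```

The backbone SRAM reduction comes out at 86.8%, which rounds to the 87% quoted earlier, and the on-disk reduction is 62.5%. Note that the total on-disk footprint is slightly larger than Pythia-160M's (265 MB vs. 247 MB), so the memory advantage over that baseline applies to the backbone only.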

Model Architecture Details

  • Architecture: 18 layers · n_embd = 1920 · 80 heads · d_head = 24
  • Total parameters: 991M (796M quantized backbone)
  • Training corpus: FineWeb-Edu 10B subset
  • Token budget: 10,000 optimizer steps (~2.6 billion tokens). Because of hardware interruptions, training was executed as three sequential resumed runs with fully preserved optimizer state; loss trajectories are stitched by training step.
  • Optimizer: Thermodynamic estimator (pollux_step) with no architectural hyperparameters; γ = G24 ≈ 0.065771. Requires one corpus-specific environmental input: H_floor (the measured noise floor of the training corpus, analogous to ambient temperature in Carnot theory).
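The architecture figures listed above are mutually consistent under standard transformer accounting. The check below is a sketch that assumes 4·d² attention weights plus 8·d² MLP weights per layer (a 4x MLP expansion, no biases or norm parameters counted) and an untied input embedding and LM head; neither assumption is confirmed by this card. The vocabulary size V = 50,688 is taken from the introduction.

```python
N_LAYER, N_EMBD, N_HEAD, D_HEAD = 18, 1920, 80, 24   # from the card
VOCAB = 50_688                                       # from the card (V)

# Head geometry: the heads must tile the residual width exactly.
assert N_HEAD * D_HEAD == N_EMBD

# ASSUMPTION: 4*d^2 (Q, K, V, output) + 8*d^2 (MLP with 4x expansion) per layer.
backbone_params = N_LAYER * 12 * N_EMBD ** 2

# ASSUMPTION: separate (untied) input embedding and LM head matrices.
embedding_params = 2 * VOCAB * N_EMBD

total_params = backbone_params + embedding_params

# Average tokens per optimizer step implied by ~2.6B tokens over 10,000 steps.
tokens_per_step = 2.6e9 / 10_000

print(f"backbone : {backbone_params / 1e6:.1f}M")
print(f"embed+head: {embedding_params / 1e6:.1f}M")
print(f"total    : {total_params / 1e6:.1f}M")
print(f"tokens per step: {tokens_per_step:,.0f}")
```

Under these assumptions the backbone comes to about 796.3M parameters and the total to about 990.9M, matching the stated 796M and 991M, with roughly 260,000 tokens per optimizer step.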

Licensing & Citation

Released under the PolyForm Noncommercial License 1.0.0 for academic research. Commercial use requires a license (patents pending: WIPO Application No. PCT/AT2026/060108 and Austrian Patent Application No. A65086/2026).

@misc{lavicka2026pollux,
  title   = {0.76 Bits Is All You Need: Vector Ternary Logic via Native H24 Leech-Lattice Quantization in LLMs},
  author  = {Lavicka, Alexander},
  year    = {2026},
  note    = {Preprint. WIPO Patent Application No. PCT/AT2026/060108 and Austrian Patent Application No. A65086/2026},
  url     = {https://papers.ssrn.com/abstract=6973978}
}
