metadata
license: other
license_name: polyform-noncommercial-license-1.0.0
license_link: https://polyformproject.org/licenses/noncommercial/1.0.0/
pipeline_tag: text-generation
tags:
  - pytorch
  - custom_code
  - quantization
  - leech-lattice
  - leech-lattice-quantization
  - sub-1-bit
  - 0.76-bit

Pollux-1920 10k — Native H24 Leech-Lattice Language Model

Pollux-1920 is a 991M-parameter decoder-only causal transformer trained from scratch at native 0.76-bit quantization resolution (V = 50,688, n_embd = 1920). By mapping the parameter manifold natively onto the H24 Leech lattice, the 796M-parameter backbone compresses to just 75.5 MB of active SRAM.

This checkpoint represents the structural convergence plateau at 10,000 steps (~2.6B tokens). All benchmark scores below are measured directly on the fully serialized 265 MB .plx deployment artifact, so the stated Iso-Memory footprints and scores describe the model as it would be deployed on edge hardware, with no measured degradation from packing.

At this checkpoint, Pollux-1920 achieves 73.0% BLiMP (fluid intelligence), within 0.1 percentage points of the continuous Pythia-410M baseline (73.1% BLiMP) at the 4.2B-token Iso-Data boundary. It reaches this score with 87% less active backbone SRAM (75.5 MB vs. 577 MB).
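
For reference, the 87% figure follows directly from the two backbone footprints quoted above, taking both numbers at face value:

$$
1 - \frac{75.5\ \text{MB}}{577\ \text{MB}} \approx 1 - 0.131 = 0.869 \approx 87\%
$$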

This Hugging Face repository is a weight-hosting layer only. Pollux is not compatible with the Hugging Face transformers library. All inference, evaluation, packing, and tokenization logic lives in the official Pollux GitHub codebase.


A Stateless Reasoning Engine for Zero-Interference RAG

Unlike conventional models that conflate fluid reasoning (syntax) with crystallised memory (factual trivia), Pollux acts as a purely structural engine. The $C=\sqrt{2}$ Voronoi deep-hole barrier acts as a geometric gradient coherence filter:

  • Fluid intelligence (structural): Coherent, recurring gradient signals encoding invariant syntactic rules accumulate directed update momentum, cross the Voronoi barrier, and stabilize into $H_{24}$ kissing-point assignments.
  • Crystallised intelligence (factual): High-entropy factual gradient signal lacks cross-batch directionality to cross the threshold and is absorbed by the zero-potential null attractor.
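
The two cases above can be illustrated with a toy accumulator. This is only a sketch of the general idea that a consistent update direction builds momentum while a sign-flipping one averages out; it is not the `pollux_step` optimiser. The momentum coefficient, gradient amplitude, step count, and the `peak_momentum` helper are all hypothetical, and only the barrier value $C=\sqrt{2}$ comes from the text.

```python
import math
import random

BARRIER = math.sqrt(2)  # C = sqrt(2), the barrier height quoted above
BETA = 0.99             # hypothetical momentum coefficient (not from the source)
AMPLITUDE = 2.0         # hypothetical per-step gradient magnitude, arbitrary units
STEPS = 1000            # hypothetical number of update steps


def peak_momentum(gradients, beta=BETA):
    """Return the largest |momentum| reached by an exponential moving average."""
    momentum, peak = 0.0, 0.0
    for g in gradients:
        momentum = beta * momentum + (1.0 - beta) * g
        peak = max(peak, abs(momentum))
    return peak


rng = random.Random(0)
# Coherent signal: the same update direction in every batch.
coherent = [AMPLITUDE] * STEPS
# Incoherent signal: same magnitude, but a random sign in every batch.
incoherent = [rng.choice((-AMPLITUDE, AMPLITUDE)) for _ in range(STEPS)]

coherent_peak = peak_momentum(coherent)
incoherent_peak = peak_momentum(incoherent)

print(f"coherent crosses barrier:   {coherent_peak > BARRIER}")
print(f"incoherent crosses barrier: {incoherent_peak > BARRIER}")
```

In this toy setting the coherent stream saturates near its amplitude and clears the barrier, while the random-sign stream stays roughly an order of magnitude below it. The real dynamics on the 24-dimensional lattice are more involved than this scalar picture.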

While the wider 1920-dimensional residual stream allows ubiquitous, high-frequency facts to initially leak through (reaching 60.7% SciQ, well above the 25% chance level of a four-option task but close to the 57.2% to 58.7% scored by the 4.2B-token Pythia baselines, and bounded by the high-frequency leakage mechanism), the lattice enters Representational Stasis at this checkpoint: BLiMP shifts by ≤ 0.5% and factual benchmarks shift by ≤ 1.0% over the subsequent 1.3B tokens. The model structurally stabilises and ceases to accumulate new factual associations, unlike Pythia-410M, which grows to 82.4% SciQ over extended training.

This empirically observed factual suppression is not a defect, but the defining feature for zero-interference Retrieval-Augmented Generation (RAG). By geometrically constraining parametric encoding, Pollux behaves as a stateless reasoning engine: it grounds its output in externally provided context, structurally reducing interference from internally stored parametric associations.


Limitations & Hardware Constraints

The 0.76 bits/param backbone footprint counts packed 18-bit indices plus one FP16 σ_rms per row. The reference PyTorch runtime materialises these into dense FP16 weight matrices at forward time for cuBLAS compatibility (~1.59 GB FP16 for the Pollux-1920 backbone alone, vs. ~75.5 MB packed). This is intentional for research reproducibility; native LUT gather–accumulate kernels are required to achieve SRAM-bound latency on edge devices.
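
The footprint figures above can be reproduced with a short back-of-the-envelope calculation. The sketch assumes that each 18-bit index covers one 24-weight block (the lattice dimension) and that every row holds n_embd = 1920 weights; neither assumption is stated explicitly in this card, so treat the result as a consistency check rather than a specification of the packing format.

```python
INDEX_BITS = 18          # packed index width, from the text above
BLOCK_DIM = 24           # assumption: one index per 24-weight H24 block
SCALE_BITS = 16          # one FP16 sigma_rms per row, from the text above
ROW_LEN = 1920           # assumption: every row holds n_embd weights
BACKBONE_PARAMS = 796e6  # quantized backbone size, from this model card

# Index cost amortised over the block, plus scale cost amortised over the row.
bits_per_param = INDEX_BITS / BLOCK_DIM + SCALE_BITS / ROW_LEN

packed_mb = BACKBONE_PARAMS * bits_per_param / 8 / 1e6  # packed backbone, MB
dense_fp16_gb = BACKBONE_PARAMS * 2 / 1e9               # materialised FP16, GB

print(f"{bits_per_param:.4f} bits/param")    # 0.7583 bits/param
print(f"{packed_mb:.1f} MB packed")          # 75.5 MB packed
print(f"{dense_fp16_gb:.2f} GB dense FP16")  # 1.59 GB dense FP16
```

Under these assumptions the packed and dense sizes match the ~75.5 MB and ~1.59 GB quoted above, a ratio of roughly 21x between what is stored and what the reference runtime materialises.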


Files Included

| File | Description |
| --- | --- |
| pollux_1920_10k.plx | Recommended for inference. Pollux-1920 packed artifact: 75.5 MB backbone SRAM, 265 MB total on disk including INT8 embeddings and LM head. Empirically verified lossless. Load with generate.py or evaluate.py. |
| pollux_1920_10k.pt | Training checkpoint with continuous pre-weights in optimiser state; observable weights are dynamic Castor H24 projections. Use for inspecting pre-weights or reproducing the packing step. |

(Note: Neither file can be loaded by llama.cpp or standard GGUF loaders; both require the custom Pollux runtime.)


Evaluation Results

Evaluated with lm-evaluation-harness. Pythia baseline: EleutherAI/pythia-410m-deduped.

(Note: The Iso-Memory criterion isolates memory-bandwidth footprint under the targeted native LUT runtime. Under the current FP16 reference materialisation, FLOPs per token scale with backbone parameter count and are not matched between Pollux-1920 and Pythia baselines.)
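
To give a rough sense of the compute mismatch described in the note, the sketch below applies the common rule of thumb of about two FLOPs per backbone weight per token for a dense forward pass, ignoring attention-score cost. The Pollux count is the 796M figure from this card; the Pythia non-embedding counts are taken from the published Pythia model table and are not stated in this card. The `forward_gflops_per_token` helper is hypothetical.

```python
def forward_gflops_per_token(n_params: float) -> float:
    """Rule of thumb: one multiply and one add per weight, per token."""
    return 2 * n_params / 1e9


# Non-embedding (backbone) parameter counts.
backbone_params = {
    "Pollux-1920": 796e6,        # from this model card
    "Pythia-160M": 85_056_000,   # assumption: published Pythia figure
    "Pythia-410M": 302_311_424,  # assumption: published Pythia figure
}

gflops = {name: forward_gflops_per_token(n) for name, n in backbone_params.items()}
for name, value in gflops.items():
    print(f"{name}: ~{value:.2f} GFLOPs/token")

ratio_vs_410m = gflops["Pollux-1920"] / gflops["Pythia-410M"]
ratio_vs_160m = gflops["Pollux-1920"] / gflops["Pythia-160M"]
print(f"Pollux-1920 vs Pythia-410M: ~{ratio_vs_410m:.1f}x")
print(f"Pollux-1920 vs Pythia-160M: ~{ratio_vs_160m:.1f}x")
```

By this estimate the dense reference runtime spends roughly 2.6x the per-token compute of Pythia-410M and roughly 9.4x that of Pythia-160M, which is why the comparison is framed in terms of memory footprint and not FLOPs.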

| Task | Pollux-1920 @ 2.6B | Pythia-160M @ 4.2B (step 2k) | Pythia-410M @ 4.2B (step 2k) | Pythia-160M @ 300B (step 143k) | Pythia-410M @ 300B (step 143k) |
| --- | --- | --- | --- | --- | --- |
| BLiMP mean (67 tasks) | 73.0% | 69.7% | 73.1% | 73.1% | 81.9% |
| SciQ | 60.7% | 58.7% | 57.2% | 72.3% | 82.4% |
| HellaSwag | 27.2% | 26.9% | 27.3% | 29.1% | 34.5% |
| PIQA | 59.8% | 58.4% | 58.2% | 61.9% | 67.2% |
| Backbone SRAM | 76 MB | 162 MB | 577 MB | 162 MB | 577 MB |
| Total on-disk footprint | 265 MB | 247 MB | 707 MB | 247 MB | 707 MB |

Model Architecture Details

  • Architecture: 18 layers · n_embd = 1920 · 80 heads · d_head = 24
  • Total parameters: 991M (796M quantized backbone)
  • Training corpus: FineWeb-Edu 10B subset
  • Token budget: 10,000 optimizer steps (~2.6 billion tokens). Hardware interruptions split training into three sequential resumed runs, each restarted with fully preserved optimizer state; loss trajectories are stitched by training step.
  • Optimizer: Endogenous kinetic optimiser (pollux_step) with no architectural hyperparameters; γ = G24 ≈ 0.065771. Requires one corpus-specific environmental input: H_floor — the irreducible cross-entropy convergence floor of the training corpus, measured from a continuous FP16 baseline.
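
The headline parameter counts can be cross-checked against the architecture numbers listed above. The sketch assumes a conventional decoder layout (4·d² attention weights plus 8·d² MLP weights per layer, with biases and norms ignored) and separate, untied embedding and LM-head matrices. Both are assumptions that happen to reproduce the stated totals; they are not confirmed details of the Pollux codebase.

```python
N_LAYERS, N_EMBD, N_HEADS, VOCAB = 18, 1920, 80, 50_688  # values from this model card

# Assumption: each layer holds 4*d^2 attention weights plus 8*d^2 MLP weights
# (4x expansion); biases and normalisation parameters are ignored.
backbone = 12 * N_LAYERS * N_EMBD ** 2

# Assumption: token embedding and LM head are separate (untied) V x d matrices.
embeddings = 2 * VOCAB * N_EMBD

total = backbone + embeddings
d_head = N_EMBD // N_HEADS

print(f"backbone   = {backbone / 1e6:.0f}M")    # backbone   = 796M
print(f"embeddings = {embeddings / 1e6:.0f}M")  # embeddings = 195M
print(f"total      = {total / 1e6:.0f}M")       # total      = 991M
print(f"d_head     = {d_head}")                 # d_head     = 24
```

Note that d_head equals 24, the dimension of the Leech lattice, so under this layout each attention head spans exactly one lattice block.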

Licensing & Citation

Released under the PolyForm Noncommercial License 1.0.0 for academic research. Commercial use requires a separate license; patent applications are pending (WIPO Application No. PCT/AT2026/060108 and Austrian Patent Application No. A65086/2026).

@misc{lavicka2026pollux,
  title   = {0.76 Bits Is All You Need: Vector Ternary Logic via Native H24 Leech-Lattice Quantization in LLMs},
  author  = {Lavicka, Alexander},
  year    = {2026},
  note    = {Preprint. WIPO Patent Application No. PCT/AT2026/060108 and Austrian Patent Application No. A65086/2026},
  url     = {https://papers.ssrn.com/abstract=6973978}
}

---