license: other
license_name: polyform-noncommercial-license-1.0.0
license_link: https://polyformproject.org/licenses/noncommercial/1.0.0/
pipeline_tag: text-generation
tags:
  - pytorch
  - custom_code
  - quantization
  - leech-lattice
  - leech-lattice-quantization
  - sub-1-bit
  - 0.76-bit

Pollux-1920 10k — Native H24 Leech-Lattice Language Model

Pollux-1920 is a 991M-parameter decoder-only causal transformer (V = 50,688, n_embd = 1920) trained from scratch at a native quantization resolution of 0.76 bits per parameter. Because the weights are mapped natively onto the H24 Leech lattice, the 796M-parameter backbone packs into just 75.5 MB of active SRAM.

This checkpoint represents the structural convergence plateau at 10,000 steps (~2.6B tokens). All benchmark scores below are measured directly on the fully serialized 265 MB .plx deployment artifact, so the stated Iso-Memory footprints describe the model as it would actually be deployed on edge hardware, with no measured degradation from packing.

At this plateau, Pollux-1920 achieves 73.0% BLiMP (fluid intelligence), matching the continuous Pythia-410M baseline evaluated at the 4.2B-token Iso-Data boundary (73.1% BLiMP). It reaches this near-identical syntactic score with 87% less active backbone SRAM (75.5 MB vs. 577 MB).

This Hugging Face repository is a weight-hosting layer only. Pollux is not compatible with the Hugging Face transformers library. All inference, evaluation, packing, and tokenization logic lives in the official Pollux GitHub codebase.

A Stateless Reasoning Engine for Zero-Interference RAG

Unlike conventional models that conflate fluid reasoning (syntax) with crystallised memory (factual trivia), Pollux acts as a purely structural engine. The $C=\sqrt{2}$ Voronoi deep-hole barrier acts as a geometric gradient coherence filter:

Fluid intelligence (structural): Coherent, recurring gradient signals encoding invariant syntactic rules accumulate directed update momentum, cross the Voronoi barrier, and stabilize into $H_{24}$ kissing-point assignments.
Crystallised intelligence (factual): High-entropy factual gradient signal lacks the cross-batch directionality needed to cross the threshold and is absorbed by the zero-potential null attractor.
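
The filtering behaviour described above can be illustrated with a deliberately simplified toy. The sketch below is not the Pollux optimiser and does not use the $H_{24}$ lattice. It assumes a one-dimensional integer lattice (nearest point found by rounding, so the Voronoi boundary sits at 0.5 rather than at $C=\sqrt{2}$), plain additive updates, and a hypothetical helper named `settle`. It only shows why a coherent update stream can cross a cell boundary while a sign-alternating stream with the same step size cannot.

```python
def settle(updates, start=0.0):
    """Accumulate updates into a continuous pre-weight and return the
    nearest integer lattice point (toy stand-in for a lattice assignment)."""
    pre_weight = start
    for u in updates:
        pre_weight += u
    # In this 1-D toy the Voronoi cell of lattice point 0 is (-0.5, 0.5);
    # round() returns the nearest integer lattice point.
    return round(pre_weight)


steps, step_size = 100, 0.01

# Coherent signal: every batch pushes in the same direction.
coherent = [step_size] * steps
# Incoherent signal: same magnitude, but the sign flips every batch.
incoherent = [step_size if i % 2 == 0 else -step_size for i in range(steps)]

print(settle(coherent))    # pre-weight drifts to about 1.0 and leaves cell 0
print(settle(incoherent))  # pre-weight stays at 0.0 and remains in cell 0
```

In this toy the only difference between the two streams is directional consistency across steps, which is the property the text attributes to structural versus factual gradient signal.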

While the wider 1920-dimensional residual stream allows ubiquitous, high-frequency facts to leak through initially (reaching 60.7% SciQ, comparable to the early-training Pythia baselines at 57.2% to 58.7% and bounded by the high-frequency leakage mechanism), the lattice enters Representational Stasis at this checkpoint: BLiMP shifts by ≤ 0.5% and factual benchmarks shift by ≤ 1.0% over the subsequent 1.3B tokens. The model stabilises structurally and stops accumulating new factual associations, unlike Pythia-410M, which grows to 82.4% SciQ over extended training.

This empirically observed factual suppression is not a defect, but the defining feature for zero-interference Retrieval-Augmented Generation (RAG). By geometrically constraining parametric encoding, Pollux behaves as a stateless reasoning engine: it grounds its output in externally provided context, structurally reducing interference from internally stored parametric associations.

Limitations & Hardware Constraints

The 0.76 bits/param backbone footprint counts packed 18-bit indices plus one FP16 σ_rms per row. The reference PyTorch runtime materialises these into dense FP16 weight matrices at forward time for cuBLAS compatibility (~1.59 GB FP16 for the Pollux-1920 backbone alone, vs. ~75.5 MB packed). This is intentional for research reproducibility; native LUT gather–accumulate kernels are required to achieve SRAM-bound latency on edge devices.
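
The footprint figures above can be reproduced with a few lines of arithmetic. The sketch below assumes that each 18-bit index encodes one block of 24 weights (the dimension of the Leech lattice) and that every row holds 1920 weights. Both are assumptions, since actual row lengths depend on the layer shapes, and the function names are hypothetical.

```python
INDEX_BITS = 18          # packed lattice index (from the text)
BLOCK_DIM = 24           # ASSUMED weights per index (Leech lattice dimension)
SCALE_BITS = 16          # one FP16 sigma_rms per row (from the text)
BACKBONE_PARAMS = 796e6  # quantized backbone size (from the text)


def bits_per_param(row_len: int) -> float:
    """Index bits amortised over a block plus scale bits amortised over a row."""
    return INDEX_BITS / BLOCK_DIM + SCALE_BITS / row_len


def packed_megabytes(n_params: float, row_len: int) -> float:
    """Packed footprint in MB (10^6 bytes)."""
    return n_params * bits_per_param(row_len) / 8 / 1e6


def dense_fp16_gigabytes(n_params: float) -> float:
    """Materialised FP16 footprint in GB (2 bytes per weight)."""
    return n_params * 2 / 1e9


print(f"{bits_per_param(1920):.4f} bits/param")                     # 0.7583
print(f"{packed_megabytes(BACKBONE_PARAMS, 1920):.1f} MB packed")   # 75.5
print(f"{dense_fp16_gigabytes(BACKBONE_PARAMS):.2f} GB dense FP16")  # 1.59
```

Under these assumptions the index term contributes 0.75 bits and the per-row scale about 0.008 bits, which rounds to the quoted 0.76 bits/param.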

Files Included

File	Description
`pollux_1920_10k.plx`	Recommended for inference. Pollux-1920 packed artifact — 75.5 MB backbone SRAM, 265 MB total on disk including INT8 embeddings and LM head. Empirically verified lossless. Load with `generate.py` or `evaluate.py`.
`pollux_1920_10k.pt`	Training checkpoint with continuous pre-weights in optimiser state; observable weights are dynamic Castor H24 projections. Use for inspecting pre-weights or reproducing the packing step.

(Note: Neither file can be consumed by llama.cpp or standard GGUF loaders; both require the custom Pollux runtime.)

Evaluation Results

Evaluated with lm-evaluation-harness. Pythia baseline: EleutherAI/pythia-410m-deduped.

(Note: The Iso-Memory criterion isolates memory-bandwidth footprint under the targeted native LUT runtime. Under the current FP16 reference materialisation, FLOPs per token scale with backbone parameter count and are not matched between Pollux-1920 and Pythia baselines.)

Task	Pollux-1920 @ 2.6B	Pythia-160M @ 4.2B (step 2k)	Pythia-410M @ 4.2B (step 2k)	Pythia-160M @ 300B (step 143k)	Pythia-410M @ 300B (step 143k)
BLiMP mean (67 tasks)	73.0%	69.7%	73.1%	73.1%	81.9%
SciQ	60.7%	58.7%	57.2%	72.3%	82.4%
HellaSwag	27.2%	26.9%	27.3%	29.1%	34.5%
PIQA	59.8%	58.4%	58.2%	61.9%	67.2%
Backbone SRAM	76 MB	162 MB	577 MB	162 MB	577 MB
Total on-disk footprint	265 MB	247 MB	707 MB	247 MB	707 MB
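
As a quick check of the Iso-Memory comparison, the relative backbone SRAM savings can be computed from the table rows. The snippet below is a sketch that uses the 75.5 MB figure quoted earlier for Pollux-1920 (the table rounds it to 76 MB); the helper name `reduction` is hypothetical.

```python
def reduction(baseline_mb: float, model_mb: float) -> float:
    """Fraction of memory saved relative to a baseline footprint."""
    return (baseline_mb - model_mb) / baseline_mb


POLLUX_SRAM_MB = 75.5
baselines = {"Pythia-160M": 162.0, "Pythia-410M": 577.0}  # Backbone SRAM row

for name, sram_mb in baselines.items():
    saved = reduction(sram_mb, POLLUX_SRAM_MB)
    print(f"{name}: {saved:.0%} less backbone SRAM")
```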

Model Architecture Details

Architecture: 18 layers · n_embd = 1920 · 80 heads · d_head = 24
Total parameters: 991M (796M quantized backbone)
Training corpus: FineWeb-Edu 10B subset
Token budget: 10,000 optimizer steps (~2.6 billion tokens). Because of hardware interruptions, training was executed as three sequential resumed runs with fully preserved optimizer state; loss trajectories are stitched by training step.
Optimizer: Endogenous kinetic optimiser (pollux_step) with no architectural hyperparameters; γ = G24 ≈ 0.065771. Requires one corpus-specific environmental input: H_floor — the irreducible cross-entropy convergence floor of the training corpus, measured from a continuous FP16 baseline.
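
The parameter totals above are consistent with a standard GPT-style layout. The sketch below assumes bias-free blocks with four n_embd × n_embd attention projections and a 4× MLP (12 · n_embd² weights per layer), an untied LM head, and no other parameters counted. None of this is confirmed by the source, and the function name is hypothetical.

```python
def count_params(n_layer: int, n_embd: int, vocab: int) -> tuple[int, int]:
    """Return (backbone, total) weight counts under the ASSUMED layout."""
    attn = 4 * n_embd * n_embd        # Q, K, V and output projections
    mlp = 2 * n_embd * (4 * n_embd)   # up- and down-projection of a 4x MLP
    backbone = n_layer * (attn + mlp)
    embeddings = 2 * vocab * n_embd   # input embedding + untied LM head
    return backbone, backbone + embeddings


backbone, total = count_params(n_layer=18, n_embd=1920, vocab=50_688)
print(f"backbone: {backbone / 1e6:.0f}M, total: {total / 1e6:.0f}M")
# backbone: 796M, total: 991M

n_heads, d_head = 80, 24
print(n_heads * d_head == 1920)  # True: the heads tile n_embd exactly
```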

Licensing & Citation

Released under the PolyForm Noncommercial License 1.0.0 for academic research. Commercial use requires a separate license (patent pending: WIPO Application No. PCT/AT2026/060108 and Austrian Patent Application No. A65086/2026).

@misc{lavicka2026pollux,
  title   = {0.76 Bits Is All You Need: Vector Ternary Logic via Native H24 Leech-Lattice Quantization in LLMs},
  author  = {Lavicka, Alexander},
  year    = {2026},
  note    = {Preprint. WIPO Patent Application No. PCT/AT2026/060108 and Austrian Patent Application No. A65086/2026},
  url     = {https://papers.ssrn.com/abstract=6973978}
}

---