	---
	license: other
	license_name: polyform-noncommercial-license-1.0.0
	license_link: https://polyformproject.org/licenses/noncommercial/1.0.0/

	pipeline_tag: text-generation
	tags:
	- pytorch
	- custom_code
	- quantization
	- leech-lattice
	- leech-lattice-quantization
	- sub-1-bit
	- 0.76-bit
	---

	# Pollux-1920 10k — Native H24 Leech-Lattice Language Model

	Pollux-1920 is a 991M-parameter decoder-only causal transformer (V = 50,688, n_embd = 1920) trained from scratch at native 0.76-bit quantization resolution. Mapping the parameter manifold natively onto the H24 Leech lattice compresses the 796M-parameter backbone to 75.5 MB of active SRAM.

	This checkpoint captures the structural convergence plateau at 10,000 steps (~2.6B tokens). All benchmark scores below are measured directly on the fully serialized 265 MB `.plx` deployment artifact, so the stated Iso-Memory footprints describe the model as it would actually be deployed on edge hardware, with no measured degradation from packing.

	At this checkpoint, Pollux-1920 achieves 73.0% BLiMP (fluid intelligence), matching the continuous Pythia-410M baseline (73.1% BLiMP) at the 4.2B-token Iso-Data boundary. It reaches this near-identical syntactic score with 87% less active backbone SRAM (75.5 MB vs. 577 MB).

	This Hugging Face repository is a weight-hosting layer only. Pollux is not compatible with the Hugging Face `transformers` library. All inference, evaluation, packing, and tokenization logic lives in the official [Pollux GitHub codebase](https://github.com/alavicka/pollux).

	---

	## A Stateless Reasoning Engine for Zero-Interference RAG

	Unlike conventional models that conflate fluid reasoning (syntax) with crystallised memory (factual trivia), Pollux acts as a purely structural engine. The $C=\sqrt{2}$ Voronoi deep-hole barrier acts as a geometric gradient coherence filter:
	* Fluid intelligence (structural): Coherent, recurring gradient signals encoding invariant syntactic rules accumulate directed update momentum, cross the Voronoi barrier, and stabilize into $H_{24}$ kissing-point assignments.
	* Crystallised intelligence (factual): High-entropy factual gradient signal lacks cross-batch directionality to cross the threshold and is absorbed by the zero-potential null attractor.
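The filtering idea in the two bullets above can be illustrated with a deliberately simplified toy. This is not the Pollux optimizer: the sketch assumes the integer lattice Z in place of H24 (so the Voronoi boundary sits at ±0.5 rather than at the $C=\sqrt{2}$ deep-hole barrier), plain gradient accumulation in place of `pollux_step`, and a hypothetical learning rate. All function names are illustrative.

```python
import math

# Toy illustration only: Z stands in for the H24 lattice, and plain
# accumulation stands in for the real optimizer.

def nearest_lattice_point(pre_weight: float) -> int:
    """Nearest point of Z; the Voronoi cell boundary sits at +/- 0.5."""
    return math.floor(pre_weight + 0.5)

def run(gradients, lr=0.125):
    """Accumulate updates into a continuous pre-weight and record the
    lattice assignment observed after each step."""
    pre_weight = 0.0
    assignments = []
    for g in gradients:
        pre_weight -= lr * g  # plain gradient-descent update (assumption)
        assignments.append(nearest_lattice_point(pre_weight))
    return assignments

steps = 16
coherent = [-1.0] * steps                          # same direction every batch
incoherent = [(-1.0) ** t for t in range(steps)]   # sign flips every batch

coherent_path = run(coherent)
incoherent_path = run(incoherent)

print("coherent final assignment:  ", coherent_path[-1])
print("incoherent assignments seen:", sorted(set(incoherent_path)))
```

In this toy, the coherent signal crosses the cell boundary and settles on a new lattice point (2), while the sign-flipping signal never leaves the origin. The real mechanism operates in 24 dimensions and may differ substantially.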

	While the wider 1920-dimensional residual stream allows ubiquitous, high-frequency facts to leak through initially (reaching 60.7% SciQ, only modestly above the early-training Pythia baselines of 57.2% to 58.7% and bounded by the high-frequency leakage mechanism), the lattice enters Representational Stasis at this checkpoint: BLiMP shifts by ≤ 0.5% and factual benchmarks shift by ≤ 1.0% over the subsequent 1.3B tokens. The model structurally stabilises and ceases to accumulate new factual associations, unlike Pythia-410M, which grows to 82.4% SciQ over extended training.

	This empirically observed factual suppression is not a defect, but the defining feature for zero-interference Retrieval-Augmented Generation (RAG). By geometrically constraining parametric encoding, Pollux behaves as a stateless reasoning engine: it grounds its output in externally provided context, structurally reducing interference from internally stored parametric associations.

	---

	## Limitations & Hardware Constraints

	The 0.76 bits/param backbone footprint counts packed 18-bit indices plus one FP16 σ_rms per row. The reference PyTorch runtime materialises these into dense FP16 weight matrices at forward time for `cuBLAS` compatibility (~1.59 GB FP16 for the Pollux-1920 backbone alone, vs. ~75.5 MB packed). This is intentional for research reproducibility; native LUT gather–accumulate kernels are required to achieve SRAM-bound latency on edge devices.
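The footprint figures in this section can be sanity-checked with a few lines of arithmetic. The sketch below assumes one 18-bit index per 24-dimensional lattice block and a row length of n_embd = 1920 for every FP16 scale; neither detail is spelled out here, so treat both as assumptions.

```python
# Back-of-the-envelope check of the stated footprint.
# Assumptions: one index per 24-weight block; every scaled row has 1920 weights.

BACKBONE_PARAMS = 796e6   # quantized backbone parameters (from this card)
BLOCK_DIM = 24            # weights per lattice block (assumed)
INDEX_BITS = 18           # packed index bits per block (from this card)
SCALE_BITS = 16           # one FP16 sigma_rms per row (from this card)
ROW_LEN = 1920            # assumed row length

bits_per_param = INDEX_BITS / BLOCK_DIM + SCALE_BITS / ROW_LEN
packed_mb = BACKBONE_PARAMS * bits_per_param / 8 / 1e6
dense_fp16_gb = BACKBONE_PARAMS * 2 / 1e9  # 2 bytes per FP16 weight

print(f"{bits_per_param:.3f} bits/param")   # 0.758
print(f"{packed_mb:.1f} MB packed")         # 75.5
print(f"{dense_fp16_gb:.2f} GB dense FP16") # 1.59
```

Under these assumptions the index term contributes 0.75 bits/param and the per-row scale adds roughly 0.008, which rounds to the quoted 0.76.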

	---

	## Files Included

	| File | Description |
	|---|---|
	| `pollux_1920_10k.plx` | Recommended for inference. Pollux-1920 packed artifact: 75.5 MB backbone SRAM, 265 MB total on disk including INT8 embeddings and LM head. Empirically verified lossless. Load with `generate.py` or `evaluate.py`. |
	| `pollux_1920_10k.pt` | Training checkpoint with continuous pre-weights in optimiser state; observable weights are dynamic Castor H24 projections. Use for inspecting pre-weights or reproducing the packing step. |

	(Note: Neither file can be loaded by `llama.cpp` or standard GGUF loaders; both require the custom Pollux runtime.)

	---

	## Evaluation Results

	Evaluated with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). Pythia baseline: `EleutherAI/pythia-410m-deduped`.

	> (Note: The Iso-Memory criterion isolates memory-bandwidth footprint under the targeted native LUT runtime. Under the current FP16 reference materialisation, FLOPs per token scale with backbone parameter count and are not matched between Pollux-1920 and Pythia baselines.)

	| Task | Pollux-1920 @ 2.6B | Pythia-160M @ 4.2B (step 2k) | Pythia-410M @ 4.2B (step 2k) | Pythia-160M @ 300B (step 143k) | Pythia-410M @ 300B (step 143k) |
	|---|---|---|---|---|---|
	| BLiMP mean (67 tasks) | 73.0% | 69.7% | 73.1% | 73.1% | 81.9% |
	| SciQ | 60.7% | 58.7% | 57.2% | 72.3% | 82.4% |
	| HellaSwag | 27.2% | 26.9% | 27.3% | 29.1% | 34.5% |
	| PIQA | 59.8% | 58.4% | 58.2% | 61.9% | 67.2% |
	| Backbone SRAM | 76 MB | 162 MB | 577 MB | 162 MB | 577 MB |
	| Total on-disk footprint | 265 MB | 247 MB | 707 MB | 247 MB | 707 MB |
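
The headline comparison against Pythia-410M at 4.2B tokens follows directly from the table. The short sketch below recomputes the relative figures; the dictionary names are illustrative and the numbers are copied from the rows above.

```python
# Figures copied from the table above (Pollux-1920 vs. Pythia-410M at 4.2B tokens).
pollux = {"blimp": 73.0, "sram_mb": 76, "disk_mb": 265}
pythia_410m = {"blimp": 73.1, "sram_mb": 577, "disk_mb": 707}

sram_reduction = 1 - pollux["sram_mb"] / pythia_410m["sram_mb"]
disk_reduction = 1 - pollux["disk_mb"] / pythia_410m["disk_mb"]
blimp_gap = pythia_410m["blimp"] - pollux["blimp"]

print(f"backbone SRAM reduction: {sram_reduction:.0%}")  # 87%
print(f"on-disk reduction:       {disk_reduction:.0%}")  # 63%
print(f"BLiMP gap:               {blimp_gap:.1f} points") # 0.1
```

The on-disk saving is smaller than the SRAM saving because the 265 MB artifact also carries the INT8 embeddings and LM head.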

	---

	## Model Architecture Details
	* Architecture: 18 layers · n_embd = 1920 · 80 heads · d_head = 24
	* Total parameters: 991M (796M quantized backbone)
	* Training corpus: FineWeb-Edu 10B subset
	* Token budget: 10,000 optimizer steps (~2.6 billion tokens). Because of hardware interruptions, training was executed as three sequential resumed runs with fully preserved optimizer state; loss trajectories are stitched by training step.
	* Optimizer: Endogenous kinetic optimiser (`pollux_step`) with no architectural hyperparameters; γ = G24 ≈ 0.065771. Requires one corpus-specific environmental input: `H_floor` — the irreducible cross-entropy convergence floor of the training corpus, measured from a continuous FP16 baseline.
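
The parameter totals listed above are consistent with a conventional decoder layout. The sketch below is a rough cross-check only: it assumes 4·d² attention weights plus 8·d² MLP weights per layer (a 4x MLP expansion), ignores biases and norms, and assumes an untied input embedding and LM head. None of these layout details is confirmed by the list above.

```python
# Rough cross-check of the listed parameter counts.
# Assumptions: 12 * d^2 weights per layer, no biases/norms, untied embeddings.

N_LAYER, N_EMBD, N_HEAD, D_HEAD, VOCAB = 18, 1920, 80, 24, 50_688

backbone = 12 * N_LAYER * N_EMBD**2   # attention + MLP weights
embeddings = 2 * VOCAB * N_EMBD       # input embedding + LM head
total = backbone + embeddings
tokens_per_step = 2.6e9 / 10_000      # implied by the stated token budget

print("heads x d_head == n_embd:", N_HEAD * D_HEAD == N_EMBD)
print(f"backbone ~ {backbone / 1e6:.0f}M, total ~ {total / 1e6:.0f}M")
print(f"~{tokens_per_step:,.0f} tokens per optimizer step")
```

Under these assumptions the estimate lands on roughly 796M backbone and 991M total parameters, matching the stated figures.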

	## Licensing & Citation
	Released under the PolyForm Noncommercial License 1.0.0 for academic research.
	Commercial use requires a separate license (patents pending: WIPO Application No. PCT/AT2026/060108 and Austrian Patent Application No. A65086/2026).

	```bibtex
	@misc{lavicka2026pollux,
	title = {0.76 Bits Is All You Need: Vector Ternary Logic via Native H24 Leech-Lattice Quantization in LLMs},
	author = {Lavicka, Alexander},
	year = {2026},
	note = {Preprint. WIPO Patent Application No. PCT/AT2026/060108 and Austrian Patent Application No. A65086/2026},
	url = {https://papers.ssrn.com/abstract=6973978}
	}
	```

	---