---
license: other
license_name: polyform-noncommercial-license-1.0.0
license_link: https://polyformproject.org/licenses/noncommercial/1.0.0/
pipeline_tag: text-generation
tags:
- pytorch
- custom_code
- quantization
- leech-lattice
- leech-lattice-quantization
- sub-1-bit
- 0.76-bit
---
# Pollux-1920 10k — Native H24 Leech-Lattice Language Model
**Pollux-1920** is a **991M-parameter** decoder-only causal transformer trained from scratch at **native 0.76-bit quantization resolution** (V = 50,688, n_embd = 1920). By mapping the parameter manifold natively onto the H24 Leech lattice, the **796M-parameter backbone** compresses to just **75.5 MB of active SRAM**.
This checkpoint represents the **structural convergence plateau** at **10,000 steps** (~2.6B tokens). All benchmark scores below are measured directly on the fully serialized **265 MB `.plx` deployment artifact**, so the stated Iso-Memory footprints describe the model as it would be deployed on edge hardware, with no measured degradation from packing.
At this plateau, Pollux-1920 achieves **73.0% BLiMP** (fluid intelligence), matching the continuous Pythia-410M baseline (73.1% BLiMP) at the 4.2B-token Iso-Data boundary. It reaches this near-identical syntactic score with an **87% reduction in active backbone SRAM** (75.5 MB vs. 577 MB).
This Hugging Face repository is a **weight-hosting layer only**. Pollux is **not** compatible with the Hugging Face `transformers` library. All inference, evaluation, packing, and tokenization logic lives in the official [Pollux GitHub codebase](https://github.com/alavicka/pollux).
---
## A Stateless Reasoning Engine for Zero-Interference RAG
Unlike conventional models that conflate fluid reasoning (syntax) with crystallised memory (factual trivia), Pollux acts as a purely structural engine. The **$C=\sqrt{2}$ Voronoi deep-hole barrier** acts as a geometric gradient coherence filter:
* **Fluid intelligence (structural):** Coherent, recurring gradient signals encoding invariant syntactic rules accumulate directed update momentum, cross the Voronoi barrier, and stabilize into $H_{24}$ kissing-point assignments.
* **Crystallised intelligence (factual):** High-entropy factual gradient signal lacks cross-batch directionality to cross the threshold and is absorbed by the zero-potential null attractor.
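The filtering idea in the two bullets above can be illustrated with a deliberately simplified toy. This is not the Pollux optimiser: the step size, the batch count, and the alternating-sign stand-in for high-entropy gradients are assumptions chosen only to show why a consistently signed signal accumulates past a fixed barrier of $\sqrt{2}$ while a sign-flipping one does not.

```python
import math

BARRIER = math.sqrt(2)  # C = sqrt(2), the barrier height quoted above


def crosses_barrier(gradients, barrier=BARRIER):
    """Return True if the running sum of updates ever exceeds the barrier."""
    accumulated = 0.0
    for g in gradients:
        accumulated += g
        if abs(accumulated) > barrier:
            return True
    return False


STEP = 0.1  # hypothetical per-batch update magnitude
N = 20      # hypothetical number of batches

# Coherent signal: the same direction in every batch.
coherent = [STEP] * N
# Incoherent signal: the direction flips every batch (toy stand-in for noise).
incoherent = [STEP if i % 2 == 0 else -STEP for i in range(N)]

print(crosses_barrier(coherent))    # True
print(crosses_barrier(incoherent))  # False
```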
While the wider 1920-dimensional residual stream allows ubiquitous, high-frequency facts to initially leak through (reaching 60.7% SciQ, only modestly above the 57.2% to 58.7% of the 4.2B-token Pythia checkpoints and bounded by the high-frequency leakage mechanism), the lattice enters **Representational Stasis** at this checkpoint: BLiMP shifts by ≤ 0.5% and factual benchmarks shift by ≤ 1.0% over the subsequent 1.3B tokens. The model structurally stabilises and ceases to accumulate new factual associations, unlike Pythia-410M, which grows to 82.4% SciQ over extended training.
This empirically observed factual suppression is **not a defect**, but the defining feature for zero-interference Retrieval-Augmented Generation (RAG). By geometrically constraining parametric encoding, Pollux behaves as a **stateless reasoning engine**: it grounds its output in externally provided context, structurally reducing interference from internally stored parametric associations.
---
## Limitations & Hardware Constraints
The **0.76 bits/param** backbone footprint counts packed 18-bit indices plus one FP16 σ_rms per row. The **reference PyTorch runtime materialises these into dense FP16 weight matrices** at forward time for `cuBLAS` compatibility (~1.59 GB FP16 for the Pollux-1920 backbone alone, vs. ~75.5 MB packed). This is intentional for research reproducibility; **native LUT gather–accumulate kernels** are required to achieve SRAM-bound latency on edge devices.
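As a rough sanity check, the footprint figures above can be reproduced with a few lines of arithmetic. The sketch assumes that each 18-bit index covers one 24-dimensional block and that every row has length n_embd = 1920; neither detail is stated explicitly here, and rows in wider matrices would carry slightly less scale overhead.

```python
INDEX_BITS = 18          # packed index size, from the text
BLOCK_DIM = 24           # assumption: one index per 24-dimensional block
SCALE_BITS = 16          # one FP16 sigma_rms per row, from the text
ROW_LEN = 1920           # assumption: rows of length n_embd
BACKBONE_PARAMS = 796e6  # quantized backbone size, from the text

# Index cost per weight plus the per-row scale amortised over the row.
bits_per_param = INDEX_BITS / BLOCK_DIM + SCALE_BITS / ROW_LEN
packed_mb = BACKBONE_PARAMS * bits_per_param / 8 / 1e6
dense_fp16_gb = BACKBONE_PARAMS * 2 / 1e9  # 2 bytes per FP16 weight

print(f"{bits_per_param:.2f} bits/param")    # 0.76
print(f"{packed_mb:.1f} MB packed")          # 75.5
print(f"{dense_fp16_gb:.2f} GB dense FP16")  # 1.59
```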
---
## Files Included
| File | Description |
|---|---|
| **`pollux_1920_10k.plx`** | **Recommended for inference.** Pollux-1920 packed artifact — **75.5 MB backbone SRAM**, **265 MB total** on disk including INT8 embeddings and LM head. Empirically verified lossless. Load with `generate.py` or `evaluate.py`. |
| **`pollux_1920_10k.pt`** | **Training checkpoint** with continuous pre-weights in optimiser state; observable weights are dynamic Castor H24 projections. Use for inspecting pre-weights or reproducing the packing step. |
*(Note: Neither file can be loaded by `llama.cpp` or standard GGUF loaders; both require the custom Pollux runtime.)*
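One possible way to fetch either file is the `huggingface_hub` client, sketched below. The helper name `fetch_weights` and the `weights` target directory are hypothetical, and the repository id is left as a required argument because it is not spelled out in this card; use the id shown on the model page. The returned path can then be handed to `generate.py` or `evaluate.py` from the GitHub codebase, whose command-line options are documented there rather than here.

```python
from pathlib import Path

PACKED_ARTIFACT = "pollux_1920_10k.plx"     # recommended for inference
TRAINING_CHECKPOINT = "pollux_1920_10k.pt"  # continuous pre-weights


def fetch_weights(repo_id: str,
                  filename: str = PACKED_ARTIFACT,
                  local_dir: str = "weights") -> Path:
    """Download one file from the given Hub repository and return its local path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(repo_id=repo_id,
                                filename=filename,
                                local_dir=local_dir))
```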
---
## Evaluation Results
Evaluated with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). Pythia baseline: `EleutherAI/pythia-410m-deduped`.
> *(Note: The Iso-Memory criterion isolates memory-bandwidth footprint under the targeted native LUT runtime. Under the current FP16 reference materialisation, FLOPs per token scale with backbone parameter count and are not matched between Pollux-1920 and Pythia baselines.)*
| Task | Pollux-1920 @ 2.6B | Pythia-160M @ 4.2B (step 2k) | Pythia-410M @ 4.2B (step 2k) | Pythia-160M @ 300B (step 143k) | Pythia-410M @ 300B (step 143k) |
|---|---|---|---|---|---|
| **BLiMP mean (67 tasks)** | **73.0%** | 69.7% | 73.1% | 73.1% | 81.9% |
| **SciQ** | 60.7% | 58.7% | 57.2% | 72.3% | 82.4% |
| **HellaSwag** | 27.2% | 26.9% | 27.3% | 29.1% | 34.5% |
| **PIQA** | 59.8% | 58.4% | 58.2% | 61.9% | 67.2% |
| **Backbone SRAM** | **75.5 MB** | 162 MB | 577 MB | 162 MB | 577 MB |
| **Total on-disk footprint** | **265 MB** | 247 MB | 707 MB | 247 MB | 707 MB |
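The memory rows of the table can be turned into relative savings with the short sketch below. It uses the 75.5 MB backbone figure quoted in the text; the helper name `reduction` is illustrative only. A negative result means Pollux-1920 is larger than the baseline, which is the case for total on-disk size against Pythia-160M.

```python
backbone_mb = {"Pollux-1920": 75.5, "Pythia-160M": 162, "Pythia-410M": 577}
total_mb = {"Pollux-1920": 265, "Pythia-160M": 247, "Pythia-410M": 707}


def reduction(ours, theirs):
    """Fractional saving of `ours` relative to `theirs` (negative means larger)."""
    return 1 - ours / theirs


for name in ("Pythia-160M", "Pythia-410M"):
    b = reduction(backbone_mb["Pollux-1920"], backbone_mb[name])
    t = reduction(total_mb["Pollux-1920"], total_mb[name])
    print(f"vs {name}: backbone {b:+.0%}, total {t:+.0%}")
```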
---
## Model Architecture Details
* **Architecture:** 18 layers · n_embd = 1920 · 80 heads · d_head = 24
* **Total parameters:** 991M (796M quantized backbone)
* **Training corpus:** FineWeb-Edu 10B subset
* **Token budget:** 10,000 optimizer steps (~2.6 billion tokens), executed as three sequential runs that were resumed after hardware interruptions with fully preserved optimizer state; loss trajectories are stitched by training step.
* **Optimizer:** Endogenous kinetic optimiser (`pollux_step`) with no architectural hyperparameters; γ = G24 ≈ 0.065771. Requires one corpus-specific environmental input: `H_floor` — the irreducible cross-entropy convergence floor of the training corpus, measured from a continuous FP16 baseline.
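The parameter counts listed above are consistent with a conventional decoder layout, as the following back-of-the-envelope sketch suggests. It assumes a GPT-style block with 4·d² attention weights and 8·d² MLP weights (4x hidden expansion), untied input embedding and LM head, and it ignores biases and normalisation parameters; none of these details is confirmed by this card.

```python
N_LAYER = 18
N_EMBD = 1920
N_HEAD = 80
D_HEAD = 24
VOCAB = 50_688

# 80 heads of width 24 tile the 1920-wide residual stream exactly.
assert N_HEAD * D_HEAD == N_EMBD

# Assumption: 4*d^2 attention weights + 8*d^2 MLP weights per layer.
per_layer = 12 * N_EMBD ** 2
backbone = N_LAYER * per_layer

# Assumption: untied input embedding and LM head, each VOCAB x N_EMBD.
embeddings = 2 * VOCAB * N_EMBD

print(f"backbone ≈ {backbone / 1e6:.0f}M")                 # 796M
print(f"total    ≈ {(backbone + embeddings) / 1e6:.0f}M")  # 991M
```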
---
## Licensing & Citation
Released under the **PolyForm Noncommercial License 1.0.0** for academic research.
Commercial use requires a separate license (patents pending: WIPO Application No. PCT/AT2026/060108 and Austrian Patent Application No. A65086/2026).
```bibtex
@misc{lavicka2026pollux,
title = {0.76 Bits Is All You Need: Vector Ternary Logic via Native H24 Leech-Lattice Quantization in LLMs},
author = {Lavicka, Alexander},
year = {2026},
note = {Preprint. WIPO Patent Application No. PCT/AT2026/060108 and Austrian Patent Application No. A65086/2026},
url = {https://papers.ssrn.com/abstract=6973978}
}
```