---
license: other
license_name: polyform-noncommercial-license-1.0.0
license_link: https://polyformproject.org/licenses/noncommercial/1.0.0/
pipeline_tag: text-generation
tags:
- pytorch
- custom_code
- quantization
- leech-lattice
- leech-lattice-quantization
- sub-1-bit
- 0.76-bit
---

# Pollux-1920 10k: Native H24 Leech-Lattice Language Model

**Pollux-1920** is a **991M-parameter** decoder-only causal transformer trained from scratch at **native 0.76-bit quantization resolution** (vocabulary V = 50,688, n_embd = 1920). By mapping the parameter manifold natively onto the H24 Leech lattice, Pollux compresses its **796M-parameter backbone** to just **75.5 MB of active SRAM**.

This checkpoint represents the **structural convergence plateau** reached at **10,000 steps** (~2.6B tokens). All benchmark scores below are measured directly on the fully serialized **265 MB `.plx` deployment artifact**, so the stated Iso-Memory footprints are those of the actual edge deployment artifact, and packing introduces no measured degradation.

At this plateau, Pollux-1920 reaches **73.0% BLiMP** (fluid intelligence), matching the continuous Pythia-410M baseline (73.1% BLiMP) at the 4.2B-token Iso-Data boundary. It reaches this syntactic ceiling with fewer training tokens (2.6B vs. 4.2B) and an **87% reduction in active backbone SRAM** (75.5 MB vs. 577 MB).

This Hugging Face repository **hosts weights only**. Pollux is **not** compatible with the Hugging Face `transformers` library; all inference, evaluation, packing, and tokenization logic lives in the official [Pollux GitHub codebase](https://github.com/alavicka/pollux).

---

## A Stateless Reasoning Engine for Zero-Interference RAG

Unlike conventional models that conflate fluid reasoning (syntax) with crystallised memory (factual trivia), Pollux operates as a purely structural engine. The **$C=\sqrt{2}$ Voronoi deep-hole barrier** acts as a geometric gradient-coherence filter:
* **Fluid intelligence (structural):** Coherent, recurring gradient signals that encode invariant syntactic rules accumulate directed update momentum, cross the Voronoi barrier, and stabilise into $H_{24}$ kissing-point assignments.
* **Crystallised intelligence (factual):** High-entropy factual gradient signals lack the cross-batch directional consistency needed to cross the threshold and are absorbed by the zero-potential null attractor.
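
The $H_{24}$ kissing points mentioned above are the 196,560 minimal vectors of the Leech lattice. As a self-contained illustration (not code from the Pollux runtime), the sketch below enumerates them with the textbook construction from the extended binary Golay code, with coordinates scaled so that every minimal vector has squared norm 32, and checks how many bits an index over them needs. Adding one extra codeword for the zero-potential null attractor, and reading that as the reason for the 18-bit packed indices, is an assumption made here for illustration.

```python
import math
from itertools import combinations, product

# Generator polynomial of the binary [23,12,7] Golay code:
# g(x) = x^11 + x^10 + x^6 + x^5 + x^4 + x^2 + 1
G_POLY = 0b110001110101


def golay24():
    """Return all 4096 codewords of the extended binary Golay code as 24-bit ints."""
    rows = [G_POLY << i for i in range(12)]  # x^i * g(x): 23-bit cyclic generators
    code = []
    for mask in range(1 << 12):
        word = 0
        for i in range(12):
            if (mask >> i) & 1:
                word ^= rows[i]
        parity = bin(word).count("1") & 1
        code.append(word | (parity << 23))  # parity bit makes every weight even
    return code


def leech_minimal_vectors(code):
    """Yield Leech-lattice minimal vectors (scaled by sqrt(8), squared norm 32)."""
    # Shape (+-2^8, 0^16): support is an octad, even number of minus signs.
    for w in code:
        if bin(w).count("1") != 8:
            continue
        support = [j for j in range(24) if (w >> j) & 1]
        for signs in product((1, -1), repeat=8):
            if signs.count(-1) % 2 == 0:
                v = [0] * 24
                for j, s in zip(support, signs):
                    v[j] = 2 * s
                yield v
    # Shape (-+3, +-1^23): start from (-3, 1^23) with the 3 at position i,
    # then flip signs on the support of a Golay codeword.
    for w in code:
        flip = [-1 if (w >> j) & 1 else 1 for j in range(24)]
        for i in range(24):
            yield [flip[j] * (-3 if j == i else 1) for j in range(24)]
    # Shape (+-4^2, 0^22): any two coordinates, any signs.
    for i, j in combinations(range(24), 2):
        for si, sj in product((4, -4), repeat=2):
            v = [0] * 24
            v[i], v[j] = si, sj
            yield v


code = golay24()
seen = set()
for v in leech_minimal_vectors(code):
    assert sum(x * x for x in v) == 32  # every minimal vector has squared norm 32
    seen.add(bytes(x + 4 for x in v))   # shift -4..4 into 0..8 for hashing
kissing = len(seen)
codebook = kissing + 1  # assumption: one extra codeword for the null (zero) state
print(kissing, codebook <= 2 ** 18, math.ceil(math.log2(codebook)))
```

The script prints `196560 True 18`. Under the stated assumption, the 196,561 states need log2(196,561) ≈ 17.58 bits, so 18 bits is the smallest whole-bit index; at one index per 24 weights that is 0.75 bits/param before any per-row scales.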

While the wider 1920-dimensional residual stream initially lets ubiquitous, high-frequency facts leak through (SciQ reaches 60.7%, only modestly above the 57.2% and 58.7% of the early Pythia checkpoints and bounded by the high-frequency leakage mechanism), the lattice enters **Representational Stasis** at this checkpoint: over the subsequent 1.3B tokens, BLiMP shifts by ≤ 0.5% and factual benchmarks by ≤ 1.0%. The model stabilises structurally and stops accumulating new factual associations, unlike Pythia-410M, which climbs to 82.4% SciQ over extended training (300B tokens).

This empirically observed factual suppression is **not a defect** but the defining feature for zero-interference Retrieval-Augmented Generation (RAG). Because parametric encoding is geometrically constrained, Pollux behaves as a **stateless reasoning engine**: it grounds its output in externally provided context, structurally reducing interference from internally stored parametric associations.

---

## Limitations & Hardware Constraints

The **0.76 bits/param** backbone footprint counts the packed 18-bit lattice indices plus one FP16 σ_rms per row. The **reference PyTorch runtime materialises these into dense FP16 weight matrices** at forward time for `cuBLAS` compatibility (~1.59 GB FP16 for the Pollux-1920 backbone alone, vs. ~75.5 MB packed). This is intentional for research reproducibility; **native LUT gather–accumulate kernels** are required to reach SRAM-bound latency on edge devices.
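
The figures in this section can be cross-checked with a few lines of arithmetic. The sketch below is illustrative and rests on layout assumptions that this card does not state: a GPT-style layer with fused QKV (3d × d), an output projection (d × d), and a 4× MLP (4d × d and d × 4d); one 18-bit index per 24-weight block; and one FP16 scale per output row. Under these assumptions it reproduces ~0.76 bits/param and the ~1.59 GB FP16 materialisation, and gives a packed size of about 75.3 MB; the small gap to the stated 75.5 MB may come from layout or metadata details not described here.

```python
# Illustrative footprint arithmetic for the Pollux-1920 backbone.
# Assumed (hypothetical) layout: GPT-style layer with fused QKV, output
# projection, and a 4x MLP; one 18-bit index per 24-weight block and one
# FP16 scale (sigma_rms) per output row of every weight matrix.
N_LAYER = 18
D_MODEL = 1920
BLOCK_DIM = 24    # H24 block size
INDEX_BITS = 18   # packed lattice index per block
SCALE_BITS = 16   # FP16 sigma_rms per row


def layer_shapes(d):
    """(out_features, in_features) of the four backbone matrices in one layer."""
    return [(3 * d, d), (d, d), (4 * d, d), (d, 4 * d)]


def backbone_params(n_layer, d):
    return n_layer * sum(out_f * in_f for out_f, in_f in layer_shapes(d))


def packed_backbone_bits(n_layer, d):
    bits = 0
    for out_f, in_f in layer_shapes(d):
        assert in_f % BLOCK_DIM == 0, "rows must split into whole 24-weight blocks"
        bits += out_f * (in_f // BLOCK_DIM) * INDEX_BITS  # lattice indices
        bits += out_f * SCALE_BITS                        # per-row FP16 scales
    return n_layer * bits


params = backbone_params(N_LAYER, D_MODEL)
packed_bits = packed_backbone_bits(N_LAYER, D_MODEL)
bits_per_param = packed_bits / params
packed_mb = packed_bits / 8 / 1e6
fp16_gb = params * 2 / 1e9  # dense FP16 materialisation in the reference runtime

print(f"backbone params: {params:,}")
print(f"bits/param: {bits_per_param:.3f}")
print(f"packed: {packed_mb:.1f} MB, FP16 materialised: {fp16_gb:.2f} GB")
```

Here the 18-bit indices contribute exactly 0.75 bits/param and the per-row scales about 0.006, which is why the total rounds to 0.76 rather than 0.75.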

---

## Files Included

| File | Description |
|---|---|
| **`pollux_1920_10k.plx`** | **Recommended for inference.** Pollux-1920 packed artifact: **75.5 MB backbone SRAM**, **265 MB total** on disk including INT8 embeddings and LM head. Empirically verified lossless. Load with `generate.py` or `evaluate.py`. |
| **`pollux_1920_10k.pt`** | **Training checkpoint** with continuous pre-weights stored in optimiser state; the observable weights are dynamic Castor H24 projections. Use it to inspect pre-weights or to reproduce the packing step. |

*(Note: Neither file can be loaded by `llama.cpp` or standard GGUF loaders; both require the custom Pollux runtime.)*

---

## Evaluation Results

Evaluated with [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness). Pythia baseline: `EleutherAI/pythia-410m-deduped`.

> *(Note: The Iso-Memory criterion isolates the memory-bandwidth footprint under the targeted native LUT runtime. Under the current FP16 reference materialisation, FLOPs per token scale with backbone parameter count and are therefore not matched between Pollux-1920 and the Pythia baselines.)*

| Metric | Pollux-1920 @ 2.6B tokens (step 10k) | Pythia-160M @ 4.2B tokens (step 2k) | Pythia-410M @ 4.2B tokens (step 2k) | Pythia-160M @ 300B tokens (step 143k) | Pythia-410M @ 300B tokens (step 143k) |
|---|---|---|---|---|---|
| **BLiMP mean (67 tasks)** | **73.0%** | 69.7% | 73.1% | 73.1% | 81.9% |
| **SciQ** | 60.7% | 58.7% | 57.2% | 72.3% | 82.4% |
| **HellaSwag** | 27.2% | 26.9% | 27.3% | 29.1% | 34.5% |
| **PIQA** | 59.8% | 58.4% | 58.2% | 61.9% | 67.2% |
| **Backbone SRAM** | **75.5 MB** | 162 MB | 577 MB | 162 MB | 577 MB |
| **Total on-disk footprint** | **265 MB** | 247 MB | 707 MB | 247 MB | 707 MB |
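
The token counts in the column headers follow from step counts. The sketch below uses Pythia's published batch of 1,024 sequences × 2,048 tokens per step; the ~260k tokens per step used for Pollux-1920 is only derived from this card's ~2.6B tokens over 10,000 steps and is not a documented batch configuration. It also recomputes the backbone SRAM reductions directly from the table's rows.

```python
# Only the step counts, SRAM sizes, and Pollux's ~2.6B tokens / 10k steps
# come from this card; Pythia's batch shape is from its public training setup.
PYTHIA_TOKENS_PER_STEP = 1024 * 2048      # 1,024 sequences x 2,048 tokens
POLLUX_TOKENS_PER_STEP = 2.6e9 / 10_000   # derived here, not a documented batch size

# Step counts -> training tokens (billions), as used in the column headers.
tokens_b = {
    "Pollux-1920 step 10k": 10_000 * POLLUX_TOKENS_PER_STEP / 1e9,
    "Pythia step 2k": 2_000 * PYTHIA_TOKENS_PER_STEP / 1e9,
    "Pythia step 143k": 143_000 * PYTHIA_TOKENS_PER_STEP / 1e9,
}

# Backbone SRAM reduction of Pollux-1920 (75.5 MB) vs. each Pythia baseline.
sram_mb = {"Pythia-160M": 162, "Pythia-410M": 577}
reduction = {name: 1 - 75.5 / mb for name, mb in sram_mb.items()}

for name, b in tokens_b.items():
    print(f"{name}: {b:.1f}B tokens")
for name, r in reduction.items():
    print(f"backbone SRAM vs {name}: {r:.0%} smaller")
```

Pythia's step 143k works out to about 299.9B tokens, which the table rounds to 300B; the Pythia-410M comparison gives the 87% figure quoted in the introduction.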

---

## Model Architecture Details
* **Architecture:** 18 layers · n_embd = 1920 · 80 heads · d_head = 24
* **Total parameters:** 991M (796M quantized backbone)
* **Training corpus:** FineWeb-Edu 10B subset
* **Token budget:** 10,000 optimiser steps (~2.6 billion tokens), run as three sequential resumed segments after hardware interruptions, with optimiser state fully preserved between them; loss trajectories are stitched by training step.
* **Optimiser:** Endogenous kinetic optimiser (`pollux_step`) with no architectural hyperparameters; γ = G24 ≈ 0.065771. It requires one corpus-specific environmental input, `H_floor`: the irreducible cross-entropy convergence floor of the training corpus, measured from a continuous FP16 baseline.
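
As a consistency check on the headline counts, the sketch below assumes a standard GPT-style layer with 12·d² weights (4·d² for attention, 8·d² for a 4× MLP) and an untied input embedding and LM head of V × d each, ignoring norm and bias parameters. These layout details are assumptions, not stated in this card, but they reproduce the listed 796M backbone and 991M total.

```python
N_LAYER, D_MODEL, VOCAB, N_HEAD = 18, 1920, 50_688, 80

d_head = D_MODEL // N_HEAD                 # 24, matching the H24 block size
backbone = N_LAYER * 12 * D_MODEL ** 2     # attention (4d^2) + 4x MLP (8d^2) per layer
embeddings = 2 * VOCAB * D_MODEL           # input embedding + untied LM head (assumed)
total = backbone + embeddings              # norms and biases ignored

print(f"d_head={d_head}, backbone={backbone / 1e6:.0f}M, "
      f"embeddings={embeddings / 1e6:.1f}M, total={total / 1e6:.0f}M")
```

It prints `d_head=24, backbone=796M, embeddings=194.6M, total=991M`: the ~195M non-backbone parameters are the embedding and LM-head matrices that the `.plx` artifact stores in INT8 rather than as lattice indices.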

## Licensing & Citation
Released under the **PolyForm Noncommercial License 1.0.0** for academic research.
Commercial use requires a separate license (pending WIPO Application No. PCT/AT2026/060108 and Austrian Patent Application No. A65086/2026).

```bibtex
@misc{lavicka2026pollux,
  title  = {0.76 Bits Is All You Need: Vector Ternary Logic via Native H24 Leech-Lattice Quantization in LLMs},
  author = {Lavicka, Alexander},
  year   = {2026},
  note   = {Preprint. WIPO Patent Application No. PCT/AT2026/060108 and Austrian Patent Application No. A65086/2026},
  url    = {https://papers.ssrn.com/abstract=6973978}
}
```

---