---
license: mit
tags:
- pytorch
- safetensors
- formal-verification
- coq
- mod3
- modular-arithmetic
- threshold-network
- neuromorphic
---

# tiny-mod3-prover

Formally verified neural network that computes the MOD-3 function (Hamming weight mod 3) on 8-bit inputs. This repository contains the model artifacts; for proof development and Coq source code, see [mod3-verified](https://github.com/CharlesCNorton/mod3-verified).

## Overview

This is a threshold network that computes `mod3(x) = HW(x) mod 3` for 8-bit binary inputs, where HW denotes Hamming weight (the number of set bits). The network outputs 0, 1, or 2, corresponding to the three residue classes.

**Key properties:**

- 100% accuracy on all 256 possible inputs
- Correctness proven in Coq via constructive algebraic proof
- Weights constrained to integers (many ternary)
- Heaviside step activation (x ≥ 0 → 1, else 0)
- First formally verified threshold circuit for MOD-m with m > 2

## Architecture

| Layer | Neurons | Function |
|-------|---------|----------|
| Input | 8 | Binary input bits |
| Hidden 1 | 9 | Thermometer encoding (HW ≥ k) |
| Hidden 2 | 2 | MOD-3 detection |
| Output | 3 | Classification (one-hot) |

**Total: 14 neurons, 110 parameters**

## Quick Start

```python
import torch
from safetensors.torch import load_file

# Load weights
weights = load_file('model.safetensors')

# Manual forward pass (Heaviside activation)
def forward(x, weights):
    x = x.float()
    x = (x @ weights['layer1.weight'].T + weights['layer1.bias'] >= 0).float()
    x = (x @ weights['layer2.weight'].T + weights['layer2.bias'] >= 0).float()
    out = x @ weights['output.weight'].T + weights['output.bias']
    return out.argmax(dim=-1)

# Test
inputs = torch.tensor([[1, 0, 1, 1, 0, 0, 1, 0]], dtype=torch.float32)
output = forward(inputs, weights)
print(f"MOD-3 of [1,0,1,1,0,0,1,0]: {output.item()}")  # 1 (4 bits set, 4 mod 3 = 1)
```

## Weight Structure

| Tensor | Shape | Values | Description |
|--------|-------|--------|-------------|
| `layer1.weight` | [9, 8] | All 1s | Thermometer encoding |
| `layer1.bias` | [9] | [0, -1, ..., -8] | Threshold at HW ≥ k |
| `layer2.weight` | [2, 9] | [0, 1, 1, -2, 1, 1, -2, 1, 1] (per row) | MOD-3 detection |
| `layer2.bias` | [2] | [-1, -2] | Class thresholds |
| `output.weight` | [3, 2] | Various | Classification |
| `output.bias` | [3] | [0, -1, -1] | Output thresholds |

## Algebraic Insight

For parity (MOD-2), the key insight was that ±1 dot products preserve Hamming-weight parity because (-1) ≡ 1 (mod 2). For MOD-3, the insight is different. Using the weights `(1, 1, -2)` repeated along the thermometer encoding produces partial sums that cycle through `(0, 1, 2, 0, 1, 2, ...)`:

```
HW=0: cumsum = 0 → 0 mod 3
HW=1: cumsum = 1 → 1 mod 3
HW=2: cumsum = 2 → 2 mod 3
HW=3: cumsum = 0 → 0 mod 3   (reset: 1+1-2=0)
HW=4: cumsum = 1 → 1 mod 3
HW=5: cumsum = 2 → 2 mod 3
HW=6: cumsum = 0 → 0 mod 3   (reset)
HW=7: cumsum = 1 → 1 mod 3
HW=8: cumsum = 2 → 2 mod 3
```

Because `1 + 1 + (-2) = 0`, the cumulative sum resets every 3 steps, tracking HW mod 3. This generalizes to MOD-m: use weights `(1, 1, ..., 1, 1-m)` with `m-1` ones before the `1-m` term.

## Formal Verification

The network is proven correct in the Coq proof assistant with three independent proofs.

**1. Exhaustive verification:**

```coq
Theorem network_correct_exhaustive : verify_all = true.
Proof. vm_compute. reflexivity. Qed.
```

**2. Constructive verification (case analysis):**

```coq
Theorem network_correct_constructive :
  forall x0 x1 x2 x3 x4 x5 x6 x7,
    predict [x0; x1; x2; x3; x4; x5; x6; x7] =
    mod3    [x0; x1; x2; x3; x4; x5; x6; x7].
```

**3. Algebraic verification:**

```coq
Theorem cumsum_eq_mod3 : forall k, (k <= 8)%nat ->
  cumsum k = Z.of_nat (Nat.modulo k 3).

Theorem network_algebraic_correct : forall h, (h <= 8)%nat ->
  classify (Z.geb (cumsum h - 1) 0) (Z.geb (cumsum h - 2) 0) = Nat.modulo h 3.
```

All proofs are axiom-free ("Closed under the global context").
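The construction can also be checked end-to-end outside Coq. The sketch below rebuilds the thermometer layer and the (1, 1, -2) cycle with NumPy, following the shapes and values in the Weight Structure table, and verifies all 256 inputs against HW mod 3. It skips the one-hot output layer and reads the residue directly from the two hidden bits ((0,0) → 0, (1,0) → 1, (1,1) → 2); that decoding is an assumption for illustration, since `output.weight` is only listed as "Various".

```python
import itertools
import numpy as np

def heaviside(z):
    # Heaviside step: 1 if z >= 0, else 0
    return (z >= 0).astype(int)

# Layer 1: thermometer encoding t_k = [HW(x) >= k] for k = 0..8
W1 = np.ones((9, 8), dtype=int)
b1 = -np.arange(9)                      # biases [0, -1, ..., -8]

# Layer 2: the (1, 1, -2) cycle over the thermometer code,
# thresholded at cumsum >= 1 and cumsum >= 2
w_cycle = np.array([0, 1, 1, -2, 1, 1, -2, 1, 1])
W2 = np.stack([w_cycle, w_cycle])
b2 = np.array([-1, -2])

def predict(bits):
    t = heaviside(W1 @ bits + b1)       # thermometer code of HW(bits)
    h = heaviside(W2 @ t + b2)          # two bits encoding the residue
    return int(h.sum())                 # (0,0)->0, (1,0)->1, (1,1)->2

# Exhaustive check over all 256 inputs
assert all(predict(np.array(x)) == sum(x) % 3
           for x in itertools.product([0, 1], repeat=8))
```

The exhaustive loop mirrors `network_correct_exhaustive`: with only 256 inputs, brute force doubles as a quick sanity check on any reimplementation of the weights.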
## MOD-3 Distribution

For 8-bit inputs (256 total):

| Class | Count | Hamming Weights |
|-------|-------|-----------------|
| 0 | 85 | 0, 3, 6 |
| 1 | 86 | 1, 4, 7 |
| 2 | 85 | 2, 5, 8 |

## Training

The parametric construction was derived algebraically, not discovered through training. Evolutionary search was attempted (as with parity) but consistently plateaued at 247/256 (96.5%) accuracy across multiple seeds. We were unable to discover the (1, 1, -2) weight pattern through training, so we used algebraic construction instead.

## Comparison to Parity

| Property | Parity (MOD-2) | MOD-3 |
|----------|----------------|-------|
| Output classes | 2 | 3 |
| Key weights | ±1 (any) | (1, 1, -2) specific |
| Training | Evolutionary (10K gen) | Algebraic construction |
| Neurons | 14 (pruned) | 14 |
| Parameters | 139 (pruned) | 110 |
| Algebraic insight | (-1) ≡ 1 (mod 2) | 1+1-2 = 0 (reset) |

## Limitations

- **Fixed input size**: 8 bits only (the algebraic construction extends to any n)
- **Binary inputs**: expects {0, 1}, not continuous values
- **No noise margin**: Heaviside threshold at exactly 0
- **Not differentiable**: cannot be fine-tuned with gradient descent
- **Training gap**: evolutionary search achieved only 96.5%; the shipped weights come from the algebraic construction

## Files

```
mod3-verified/
├── model.safetensors   # Network weights (110 params)
├── model.py            # Inference code
├── config.json         # Model metadata
└── README.md           # This file
```

## Citation

```bibtex
@software{tiny_mod3_prover_2025,
  title={tiny-mod3-prover: Formally Verified Threshold Network for MOD-3},
  author={Norton, Charles},
  url={https://huggingface.co/phanerozoic/tiny-mod3-prover},
  year={2025},
  note={First verified threshold circuit for modular counting beyond parity}
}
```

## Related

- **Proof repository**: [mod3-verified](https://github.com/CharlesCNorton/mod3-verified) — Coq proofs, training attempts, full documentation
- **Parity network**: [tiny-parity-prover](https://huggingface.co/phanerozoic/tiny-parity-prover) — verified MOD-2 (parity) network
- **Parity proofs**: [threshold-logic-verified](https://github.com/CharlesCNorton/threshold-logic-verified) — original parity verification project

## License

MIT