---
license: mit
tags:
- pytorch
- safetensors
- formal-verification
- coq
- parity
- threshold-network
- ternary-weights
- binary-classification
- neuromorphic
---

# tiny-parity-prover

Trained weights for a formally verified neural network that computes 8-bit parity. This repository contains the model artifacts; for the proof development and Coq source code, see [threshold-logic-verified](https://github.com/CharlesCNorton/threshold-logic-verified).

## Overview

This is a ternary threshold network that computes `parity(x) = x₀ ⊕ x₁ ⊕ ... ⊕ x₇` for 8-bit binary inputs. The network outputs 1 if the number of set bits is odd, 0 if even.

**Key properties:**

- 100% accuracy on all 256 possible inputs
- Correctness proven in Coq via a constructive algebraic proof
- Weights constrained to {-1, 0, +1} (ternary quantization)
- Heaviside step activation (x ≥ 0 → 1, else 0)
- Suitable for neuromorphic hardware deployment

## Model Variants

| Variant | Architecture | Parameters | Weights | Verification |
|---------|--------------|------------|---------|--------------|
| Full | 8 → 32 → 16 → 1 | 833 | `model.safetensors` | Coq-proven |
| Pruned | 8 → 11 → 3 → 1 | 139 | `pruned/model.safetensors` | Coq-proven |

The pruned variant retains only the 14 neurons (11 in layer 1, 3 in layer 2) identified as critical through systematic ablation analysis. Both variants compute identical functions and are independently verified.
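The parity specification above can be pinned down with a tiny reference function in plain Python (no dependencies); this is a sketch of the mathematical spec, not code shipped in this repository:

```python
def parity(bits):
    """Reference specification: 1 if an odd number of bits are set, else 0."""
    return sum(bits) % 2

# Exhaustively enumerate all 256 possible 8-bit inputs (LSB first)
cases = [[(n >> i) & 1 for i in range(8)] for n in range(256)]

assert parity([1, 0, 1, 1, 0, 0, 1, 0]) == 0   # 4 bits set -> even
assert sum(parity(c) for c in cases) == 128    # exactly half of all inputs are odd
```

Any candidate network can be checked against this function over all 256 cases, which is exactly what the exhaustive tests below do.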
## Quick Start

```python
import torch
from safetensors.torch import load_file

# Load weights
weights = load_file('model.safetensors')  # or 'pruned/model.safetensors'

# Manual forward pass (Heaviside activation)
def forward(x, weights):
    x = x.float()
    x = (x @ weights['layer1.weight'].T + weights['layer1.bias'] >= 0).float()
    x = (x @ weights['layer2.weight'].T + weights['layer2.bias'] >= 0).float()
    x = (x @ weights['output.weight'].T + weights['output.bias'] >= 0).float()
    return x.squeeze(-1)

# Test
inputs = torch.tensor([[1, 0, 1, 1, 0, 0, 1, 0]], dtype=torch.float32)
output = forward(inputs, weights)
print(f"Parity of [1,0,1,1,0,0,1,0]: {int(output.item())}")  # 0 (4 bits set, even)
```

Or use the provided model classes:

```python
from model import ThresholdNetwork
model = ThresholdNetwork.from_safetensors('model.safetensors')

from pruned.model import PrunedThresholdNetwork
pruned = PrunedThresholdNetwork.from_safetensors('pruned/model.safetensors')
```

## Weight Structure

### Full Network (833 parameters)

| Tensor | Shape | Values | Description |
|--------|-------|--------|-------------|
| `layer1.weight` | [32, 8] | {-1, 0, +1} | 256 ternary weights |
| `layer1.bias` | [32] | [-8, +8] | Bounded integers |
| `layer2.weight` | [16, 32] | {-1, 0, +1} | 512 ternary weights |
| `layer2.bias` | [16] | [-32, +32] | Bounded integers |
| `output.weight` | [1, 16] | {-1, 0, +1} | 16 ternary weights |
| `output.bias` | [1] | [-16, +16] | Bounded integer |

### Pruned Network (139 parameters)

| Tensor | Shape | Values |
|--------|-------|--------|
| `layer1.weight` | [11, 8] | {-1, 0, +1} |
| `layer1.bias` | [11] | Bounded integers |
| `layer2.weight` | [3, 11] | {-1, 0, +1} |
| `layer2.bias` | [3] | Bounded integers |
| `output.weight` | [1, 3] | {-1, 0, +1} |
| `output.bias` | [1] | Bounded integer |

## Training

The network was trained using evolutionary search (a genetic algorithm), not gradient descent.
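The flavor of this approach can be illustrated with a toy mutation-plus-elitism loop that searches ternary weights for a 2-bit XOR threshold network by optimizing discrete accuracy directly. Everything here (population size, mutation rate, bias ranges) is illustrative, not the actual training configuration:

```python
import random

def step(z):
    # Heaviside activation: z >= 0 -> 1, else 0
    return 1 if z >= 0 else 0

def accuracy(g):
    # g = (w1, b1, w2, b2) for a 2 -> 2 -> 1 ternary threshold net
    w1, b1, w2, b2 = g
    correct = 0
    for x0 in (0, 1):
        for x1 in (0, 1):
            h = [step(w1[j][0] * x0 + w1[j][1] * x1 + b1[j]) for j in range(2)]
            y = step(w2[0] * h[0] + w2[1] * h[1] + b2)
            correct += int(y == (x0 ^ x1))
    return correct  # out of 4 input patterns

def random_genome():
    t = lambda: random.choice([-1, 0, 1])
    return ([[t(), t()] for _ in range(2)],
            [random.randint(-2, 2) for _ in range(2)],
            [t(), t()], random.randint(-2, 2))

def mutate(g, rate=0.3):
    w1, b1, w2, b2 = g
    mw = lambda v: random.choice([-1, 0, 1]) if random.random() < rate else v
    mb = lambda v: random.randint(-2, 2) if random.random() < rate else v
    return ([[mw(v) for v in row] for row in w1], [mb(v) for v in b1],
            [mw(v) for v in w2], mb(b2))

random.seed(0)
pop = [random_genome() for _ in range(50)]
for _ in range(2000):
    pop.sort(key=accuracy, reverse=True)
    if accuracy(pop[0]) == 4:
        break
    elite = pop[:10]  # keep the top 20%
    pop = elite + [mutate(random.choice(elite)) for _ in range(40)]

print(f"best accuracy: {accuracy(pop[0])}/4")
```

The fitness function evaluates exact classification accuracy over the whole (tiny) input space, just as the real search scored accuracy over all 256 8-bit inputs.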
**Why not backpropagation?** Parity is a canonical hard function for neural networks: the loss landscape is essentially flat, because parity depends on a global property (bit count mod 2) rather than on local features. Small weight perturbations produce negligible gradient signal until the network is already near a solution.

**Evolutionary search parameters:**

- Population size: 500
- Generations to solution: 10,542
- Selection: top 20% elite
- Mutation rates: 0.05, 0.15, 0.30 (mixed)
- Fitness: classification accuracy on all 256 inputs

The search finds a working configuration by directly optimizing discrete accuracy rather than relying on gradient information.

## Formal Verification

Both network variants are proven correct in the Coq proof assistant. The verification evolved through 12 proof versions, culminating in a fully constructive algebraic proof.

**Core theorem (proven for all 256 inputs):**

```coq
Theorem network_correct : forall x0 x1 x2 x3 x4 x5 x6 x7,
  network x0 x1 x2 x3 x4 x5 x6 x7 = parity [x0;x1;x2;x3;x4;x5;x6;x7].
```

**Proof strategy (V9_FullyConstructive.v):**

The proof decomposes the critical layer-2 neuron's pre-activation into three components:

- **A**: contribution from h-function neurons (depends on a specific linear form h(x))
- **B**: contribution from Hamming-weight-dependent neurons
- **C**: contribution from g-function neurons (where g(x) ≡ HW(x) mod 2)

Key algebraic insight: for any ±1 weight vector, the dot product with a binary input has the same parity as the input's Hamming weight. This is proven constructively:

```coq
Theorem g_parity : forall x0 x1 x2 x3 x4 x5 x6 x7,
  Z.even (g_func x0 x1 x2 x3 x4 x5 x6 x7) =
  Nat.even (hamming_weight [x0;x1;x2;x3;x4;x5;x6;x7]).
```

The network implements parity by:

1. Two "always-on" neurons (L2-N0, L2-N2) with large biases
2. One parity discriminator (L2-N1) that fires iff the Hamming weight is even
3. Output wiring that negates the discriminator

**Parametric generalization (V10-V11):**

The algebraic foundation extends to arbitrary input size n:

```coq
Theorem dot_parity_equals_hw_parity : forall ws xs,
  length ws = length xs ->
  Z.even (signed_dot ws xs) = Nat.even (hamming_weight xs).
```

A verified n-bit parity network construction is provided, using n + 3 neurons in total.

Full proof development: [threshold-logic-verified/coq](https://github.com/CharlesCNorton/threshold-logic-verified/tree/main/coq)

## Test Results

### Functional Validation

| Test | Result |
|------|--------|
| Exhaustive (256 inputs) | 256/256 |
| Batch inference consistency | Pass |
| Determinism (1000 runs) | Pass |
| Throughput | 18,161 inf/sec (CPU) |

### Analytical Properties

| Property | Finding |
|----------|---------|
| Weight sparsity | 32.9% zeros (258/784) |
| Critical L1 neurons | 11 of 32 |
| Critical L2 neurons | 3 of 16 |
| Noise sensitivity (σ = 0.05) | Accuracy drops to 51% |
| Minimum depth | 3 layers required |

**Neuron ablation results:**

| Neuron Removed | Accuracy Impact |
|----------------|-----------------|
| L2-N8 | 256 → 128 (chance) |
| L1-N11 | 256 → 162 |
| Any redundant neuron | 256 → 256 (no change) |

**Random pruning sensitivity:**

| Weights Pruned | Accuracy |
|----------------|----------|
| 0% | 256/256 |
| 10% | 138-191/256 |
| 20%+ | ~128/256 (chance) |

### Noise Robustness

The network is extremely fragile to input noise because of the hard Heaviside threshold:

| Gaussian noise σ | Accuracy |
|------------------|----------|
| 0.00 | 100.0% |
| 0.05 | 51.2% |
| 0.10 | 51.2% |
| 0.20 | 48.8% |

This fragility is inherent to threshold networks, not a defect of this model.
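The fragility is easy to see in isolation: a unit whose pre-activation sits exactly at the Heaviside boundary (z = 0) flips under arbitrarily small negative perturbations. A minimal sketch, independent of the actual checkpoint (the weight row here is a hand-picked example, not taken from the model):

```python
import torch

def heaviside(z):
    # Step activation: z >= 0 -> 1, else 0
    return (z >= 0).float()

# A unit whose pre-activation lands exactly on the threshold: step(1 - 1) = step(0) = 1
w = torch.tensor([1.0, -1.0])  # hypothetical ternary weight row
x = torch.tensor([1.0, 1.0])
clean = heaviside(w @ x)       # pre-activation is exactly 0, so the unit fires

torch.manual_seed(0)
trials, flips = 1000, 0
for _ in range(trials):
    noisy = x + 0.05 * torch.randn(2)  # Gaussian input noise, sigma = 0.05
    flips += int(heaviside(w @ noisy) != clean)
print(f"{flips}/{trials} outputs flipped at sigma=0.05")  # roughly half flip
```

Because the noisy pre-activation `(1 + n0) - (1 + n1) = n0 - n1` is symmetric around zero, about half of all noise draws flip the unit, which is consistent with the ~51% (chance-level) accuracy in the table above.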
### Bias Sensitivity

| Layer | Sensitive to ±1 Perturbation |
|-------|------------------------------|
| Layer 1 | 9 of 32 biases |
| Layer 2 | 1 of 16 biases |
| Output | 1 of 1 bias |

## Practical Applications

The network has been validated on standard parity-based protocols:

| Application | Test Cases | Result |
|-------------|------------|--------|
| Memory ECC parity | 100 bytes | 100% |
| RAID-5 reconstruction | 50 stripes | 100% |
| UART frame validation | 100 frames | 100% |
| XOR checksum | 100 bytes | Pass |
| LFSR keystream | 32 bits | Deterministic |
| Barcode validation | 50 codes | 100% |
| Packet checksum | 20 packets | 100% |
| Hamming (7,4) syndrome | 20 codewords | 100% |
| PS/2 keyboard codes | 20 scancodes | 100% |
| DNA codon parity | 8 codons | 100% |

## Neuromorphic Deployment

The primary use case is neuromorphic hardware, where threshold networks are the native compute primitive and XOR gates cannot be directly instantiated:

| Platform | Status |
|----------|--------|
| Intel Loihi | Compatible (untested) |
| IBM TrueNorth | Compatible (untested) |
| BrainChip Akida | Compatible (untested) |

**Resource comparison:**

| Implementation | Gates/Neurons |
|----------------|---------------|
| Threshold network (pruned) | 14 neurons |
| Threshold network (full) | 49 neurons |
| XOR tree (conventional) | 7 gates |

The 2-7x unit-count overhead versus a conventional XOR tree is acceptable only when XOR gates are unavailable.
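As a concrete instance of the protocol tests above, UART-style even-parity framing reduces to exactly the 8-bit parity function this network computes. A plain-Python sketch (the byte value and helper names are illustrative, not from the test suite):

```python
def parity(bits):
    # 1 if an odd number of bits are set, else 0
    return sum(bits) % 2

def uart_frame(byte):
    # 8 data bits, LSB first, plus an even-parity bit
    # (the parity bit makes the total number of set bits even)
    bits = [(byte >> i) & 1 for i in range(8)]
    return bits + [parity(bits)]

def frame_valid(frame):
    # A received frame is valid when data bits + parity bit sum to even
    return sum(frame) % 2 == 0

frame = uart_frame(0b10110010)
assert frame_valid(frame)
frame[3] ^= 1                  # single-bit transmission error
assert not frame_valid(frame)  # parity check catches it
```

The network replaces the `parity` call: feed it the 8 data bits (or all 9 received bits) and it plays the role of the parity generator or checker.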
## Pruned Network Details

The pruned network was extracted by identifying critical neurons through systematic ablation.

**Retained neurons:**

- Layer 1: indices [3, 7, 9, 11, 12, 16, 17, 20, 24, 27, 28]
- Layer 2: indices [8, 11, 14]

**Interpretation:**

The pruned network reveals the minimal structure for parity computation:

- L2-N0 (originally N8): always fires (bias = 30 dominates)
- L2-N1 (originally N11): parity discriminator (fires iff HW even)
- L2-N2 (originally N14): always fires (bias = 20 dominates)
- Output: computes `L2-N0 - L2-N1 + L2-N2 - 2 >= 0`, which equals `NOT(L2-N1)`

Since L2-N1 fires when the Hamming weight is even, the output fires when the Hamming weight is odd, which is exactly parity.

## Limitations

- **Fixed input size**: 8 bits only (the parametric Coq construction extends to any n)
- **Binary inputs**: expects {0, 1}, not continuous values
- **No noise margin**: Heaviside threshold at exactly 0
- **Not differentiable**: cannot be fine-tuned with gradient descent

## Files

```
tiny-parity-prover/
├── model.safetensors        # Full network weights (833 params)
├── model.py                 # Full network inference code
├── config.json              # Full network metadata
├── results.md               # Comprehensive test results
├── todo.md                  # Future work notes
├── README.md                # This file
└── pruned/
    ├── model.safetensors    # Pruned network weights (139 params)
    ├── model.py             # Pruned network inference code
    ├── config.json          # Pruned network metadata
    └── README.md            # Pruned variant description
```

## Citation

```bibtex
@software{tiny_parity_prover_2025,
  title={tiny-parity-prover: Formally Verified Neural Network for 8-bit Parity},
  author={Norton, Charles},
  url={https://huggingface.co/phanerozoic/tiny-parity-prover},
  year={2025},
  note={Weights and inference code. Proofs at github.com/CharlesCNorton/threshold-logic-verified}
}
```

## Related

- **Proof repository**: [threshold-logic-verified](https://github.com/CharlesCNorton/threshold-logic-verified) — Coq proofs, training code, full documentation
- **Coq proof files**: V0_Core.v through V11_ParametricNetwork.v
- **Key proof**: V9_FullyConstructive.v (complete algebraic proof without vm_compute)

## License

MIT