# tiny-parity-prover
Trained weights for a formally verified neural network that computes 8-bit parity. This repository contains the model artifacts; for proof development and Coq source code, see threshold-logic-verified.
## Overview

This is a ternary threshold network that computes parity(x) = x₁ ⊕ x₂ ⊕ … ⊕ x₈ for 8-bit binary inputs. The network outputs 1 if the number of set bits is odd, 0 if even.
Key properties:
- 100% accuracy on all 256 possible inputs
- Correctness proven in Coq via constructive algebraic proof
- Weights constrained to {-1, 0, +1} (ternary quantization)
- Heaviside step activation (x ≥ 0 → 1, else 0)
- Suitable for neuromorphic hardware deployment
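As a baseline for what the network must reproduce, the target function is trivial to state in plain Python (this reference implementation is illustrative, not part of the repository):

```python
def parity(bits):
    """Reference parity: 1 if an odd number of bits are set, else 0."""
    assert all(b in (0, 1) for b in bits)
    return sum(bits) % 2

print(parity([1, 0, 1, 1, 0, 0, 1, 0]))  # 0: four bits set, even
```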
## Model Variants

| Variant | Architecture | Parameters | Weights | Verification |
|---|---|---|---|---|
| Full | 8 → 32 → 16 → 1 | 833 | model.safetensors | Coq-proven |
| Pruned | 8 → 11 → 3 → 1 | 139 | pruned/model.safetensors | Coq-proven |
The pruned variant retains only the 14 neurons (11 in L1, 3 in L2) identified as critical through systematic ablation analysis. Both variants compute identical functions and are independently verified.
## Quick Start

```python
import torch
from safetensors.torch import load_file

# Load weights
weights = load_file('model.safetensors')  # or 'pruned/model.safetensors'

# Manual forward pass (Heaviside activation)
def forward(x, weights):
    x = x.float()
    x = (x @ weights['layer1.weight'].T + weights['layer1.bias'] >= 0).float()
    x = (x @ weights['layer2.weight'].T + weights['layer2.bias'] >= 0).float()
    x = (x @ weights['output.weight'].T + weights['output.bias'] >= 0).float()
    return x.squeeze(-1)

# Test
inputs = torch.tensor([[1, 0, 1, 1, 0, 0, 1, 0]], dtype=torch.float32)
output = forward(inputs, weights)
print(f"Parity of [1,0,1,1,0,0,1,0]: {int(output.item())}")  # 0 (4 bits set, even)
```
Or use the provided model classes:

```python
from model import ThresholdNetwork
model = ThresholdNetwork.from_safetensors('model.safetensors')

from pruned.model import PrunedThresholdNetwork
pruned = PrunedThresholdNetwork.from_safetensors('pruned/model.safetensors')
```
## Weight Structure

### Full Network (833 parameters)

| Tensor | Shape | Values | Description |
|---|---|---|---|
| layer1.weight | [32, 8] | {-1, 0, +1} | 256 ternary weights |
| layer1.bias | [32] | [-8, +8] | Bounded integers |
| layer2.weight | [16, 32] | {-1, 0, +1} | 512 ternary weights |
| layer2.bias | [16] | [-32, +32] | Bounded integers |
| output.weight | [1, 16] | {-1, 0, +1} | 16 ternary weights |
| output.bias | [1] | [-16, +16] | Bounded integer |
### Pruned Network (139 parameters)

| Tensor | Shape | Values |
|---|---|---|
| layer1.weight | [11, 8] | {-1, 0, +1} |
| layer1.bias | [11] | Bounded integers |
| layer2.weight | [3, 11] | {-1, 0, +1} |
| layer2.bias | [3] | Bounded integers |
| output.weight | [1, 3] | {-1, 0, +1} |
| output.bias | [1] | Bounded integer |
## Training
The network was trained using evolutionary search (genetic algorithm), not gradient descent.
Why not backpropagation? Parity is a canonical hard function for neural networks. The loss landscape is essentially flat because parity depends on a global property (bit count mod 2) rather than local features. Small weight perturbations produce negligible gradient signal until the network is already near a solution.
Evolutionary search parameters:
- Population size: 500
- Generations to solution: 10,542
- Selection: Top 20% elite
- Mutation rates: 0.05, 0.15, 0.30 (mixed)
- Fitness: Classification accuracy on all 256 inputs
The search finds a working configuration by directly optimizing discrete accuracy rather than relying on gradient information.
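To make this concrete, here is a minimal, self-contained sketch of ternary evolutionary search on a scaled-down task (3-bit parity, toy hyperparameters; the names and sizes are illustrative, not the actual training script):

```python
import random

def heaviside(z):
    return 1 if z >= 0 else 0

def forward(genome, x):
    # Toy 3 -> 4 -> 1 threshold network with ternary weights.
    W1, b1, w2, b2 = genome
    h = [heaviside(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return heaviside(sum(w * hi for w, hi in zip(w2, h)) + b2)

INPUTS = [[(i >> k) & 1 for k in range(3)] for i in range(8)]

def fitness(genome):
    # Discrete accuracy over all 8 possible 3-bit inputs -- no gradients.
    return sum(forward(genome, x) == sum(x) % 2 for x in INPUTS) / 8

def random_genome(rng):
    t = lambda: rng.choice((-1, 0, 1))
    return ([[t() for _ in range(3)] for _ in range(4)],
            [rng.randint(-3, 3) for _ in range(4)],
            [t() for _ in range(4)],
            rng.randint(-4, 4))

def mutate(genome, rng, rate=0.15):
    W1, b1, w2, b2 = genome
    tern = lambda v: rng.choice((-1, 0, 1)) if rng.random() < rate else v
    bias = lambda v, lo, hi: rng.randint(lo, hi) if rng.random() < rate else v
    return ([[tern(w) for w in row] for row in W1],
            [bias(b, -3, 3) for b in b1],
            [tern(w) for w in w2],
            bias(b2, -4, 4))

def evolve(pop_size=200, generations=100, elite_frac=0.2, seed=0):
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) == 1.0:
            break  # exact solution found
        elites = pop[:int(pop_size * elite_frac)]
        pop = elites + [mutate(rng.choice(elites), rng)
                        for _ in range(pop_size - len(elites))]
    return max(pop, key=fitness)

best = evolve()
print(f"Best accuracy: {fitness(best) * 8:.0f}/8")
```

Because fitness is exact accuracy over the full input space, the search needs no gradient signal at all, which is the point of using it for parity.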
## Formal Verification
Both network variants are proven correct in the Coq proof assistant. The verification evolved through 12 proof versions, culminating in a fully constructive algebraic proof.
Core theorem (proven for all 256 inputs):

```coq
Theorem network_correct : forall x0 x1 x2 x3 x4 x5 x6 x7,
  network x0 x1 x2 x3 x4 x5 x6 x7 = parity [x0;x1;x2;x3;x4;x5;x6;x7].
```
Proof strategy (V9_FullyConstructive.v):
The proof decomposes the critical Layer-2 neuron's pre-activation into three components:
- A: Contribution from h-function neurons (depends on a specific linear form h(x))
- B: Contribution from Hamming-weight-dependent neurons
- C: Contribution from g-function neurons (where g(x) β‘ HW(x) mod 2)
Key algebraic insight: For any Β±1 weight vector, the dot product has the same parity as the Hamming weight. This is proven constructively:
```coq
Theorem g_parity : forall x0 x1 x2 x3 x4 x5 x6 x7,
  Z.even (g_func x0 x1 x2 x3 x4 x5 x6 x7) =
  Nat.even (hamming_weight [x0;x1;x2;x3;x4;x5;x6;x7]).
```
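The insight is easy to spot-check outside Coq; the following exhaustive check over 4-bit vectors is an illustrative sketch, not part of the proof development:

```python
from itertools import product

def signed_dot(ws, xs):
    return sum(w * x for w, x in zip(ws, xs))

# Each set bit contributes +1 or -1, and both are odd, so the dot
# product and the Hamming weight always share the same parity.
for ws in product((-1, 1), repeat=4):
    for xs in product((0, 1), repeat=4):
        assert signed_dot(ws, xs) % 2 == sum(xs) % 2
print("dot-product parity == Hamming-weight parity on all 256 cases")
```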
The network implements parity by:
- Two "always-on" neurons (L2-N0, L2-N2) with large biases
- One parity discriminator (L2-N1) that fires iff Hamming weight is even
- Output wiring that negates the discriminator
Parametric generalization (V10-V11):
The algebraic foundation extends to arbitrary input size n:
```coq
Theorem dot_parity_equals_hw_parity : forall ws xs,
  length ws = length xs ->
  Z.even (signed_dot ws xs) = Nat.even (hamming_weight xs).
```
A verified n-bit parity network construction is provided using n+3 neurons total.
Full proof development: threshold-logic-verified/coq
## Test Results

### Functional Validation
| Test | Result |
|---|---|
| Exhaustive (256 inputs) | 256/256 |
| Batch inference consistency | Pass |
| Determinism (1000 runs) | Pass |
| Throughput | 18,161 inf/sec (CPU) |
### Analytical Properties
| Property | Finding |
|---|---|
| Weight sparsity | 32.9% zeros (258/784) |
| Critical L1 neurons | 11 of 32 |
| Critical L2 neurons | 3 of 16 |
| Noise sensitivity (Ο=0.05) | Accuracy drops to 51% |
| Minimum depth | 3 layers required |
Neuron ablation results:
| Neuron Removed | Accuracy Impact |
|---|---|
| L2-N8 | 256 → 128 (chance) |
| L1-N11 | 256 → 162 |
| Any redundant neuron | 256 → 256 (no change) |
Random pruning sensitivity:
| Weights Pruned | Accuracy |
|---|---|
| 0% | 256/256 |
| 10% | 138-191/256 |
| 20%+ | ~128/256 (chance) |
### Noise Robustness
The network is extremely fragile to input noise due to the hard Heaviside threshold:
| Gaussian Noise Ο | Accuracy |
|---|---|
| 0.00 | 100.0% |
| 0.05 | 51.2% |
| 0.10 | 51.2% |
| 0.20 | 48.8% |
This is inherent to threshold networks, not a defect.
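The collapse is easy to reproduce with any hard-threshold parity construction. The sketch below uses a textbook counting network (all-ones first layer, alternating-sign readout), not this repository's weights:

```python
import random

def counting_parity_net(x):
    # Hidden unit k fires iff sum(x) >= k + 1; the alternating-sign
    # readout then recovers Hamming weight mod 2.
    n = 8
    s = sum(x)
    h = [1 if s - (k + 1) >= 0 else 0 for k in range(n)]
    return 1 if sum((-1) ** k * h[k] for k in range(n)) - 1 >= 0 else 0

def accuracy(sigma, trials=500, seed=0):
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        bits = [rng.randint(0, 1) for _ in range(8)]
        noisy = [b + rng.gauss(0, sigma) for b in bits]
        correct += counting_parity_net(noisy) == sum(bits) % 2
    return correct / trials

print(accuracy(0.0))   # clean inputs: exact
print(accuracy(0.05))  # critical pre-activations sit exactly on the
                       # threshold, so tiny noise collapses toward chance
```

The fragility is structural: the decisive pre-activations evaluate to exactly 0 on clean inputs, so arbitrarily small noise flips them with probability near one half.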
### Bias Sensitivity
| Layer | Sensitive to Β±1 Perturbation |
|---|---|
| Layer 1 | 9 of 32 biases |
| Layer 2 | 1 of 16 biases |
| Output | 1 of 1 bias |
## Practical Applications
The network has been validated on standard parity-based protocols:
| Application | Test Cases | Result |
|---|---|---|
| Memory ECC parity | 100 bytes | 100% |
| RAID-5 reconstruction | 50 stripes | 100% |
| UART frame validation | 100 frames | 100% |
| XOR checksum | 100 bytes | Pass |
| LFSR keystream | 32 bits | Deterministic |
| Barcode validation | 50 codes | 100% |
| Packet checksum | 20 packets | 100% |
| Hamming (7,4) syndrome | 20 codewords | 100% |
| PS/2 keyboard codes | 20 scancodes | 100% |
| DNA codon parity | 8 codons | 100% |
## Neuromorphic Deployment
The primary use case is neuromorphic hardware where threshold networks are the native compute primitive and XOR gates cannot be directly instantiated:
| Platform | Status |
|---|---|
| Intel Loihi | Compatible (untested) |
| IBM TrueNorth | Compatible (untested) |
| BrainChip Akida | Compatible (untested) |
Resource comparison:
| Implementation | Gates/Neurons |
|---|---|
| Threshold network (pruned) | 14 neurons |
| Threshold network (full) | 49 neurons |
| XOR tree (conventional) | 7 gates |
The 2-7x neuron overhead versus a conventional 7-gate XOR tree is acceptable only on platforms where XOR gates are unavailable.
## Pruned Network Details
The pruned network was extracted by identifying critical neurons through systematic ablation:
Retained neurons:
- Layer 1: indices [3, 7, 9, 11, 12, 16, 17, 20, 24, 27, 28]
- Layer 2: indices [8, 11, 14]
Interpretation:
The pruned network reveals the minimal structure for parity computation:
- L2-N0 (originally N8): Always fires (bias=30 dominates)
- L2-N1 (originally N11): Parity discriminator (fires iff HW even)
- L2-N2 (originally N14): Always fires (bias=20 dominates)
- Output: computes `L2-N0 - L2-N1 + L2-N2 - 2 >= 0`, which equals `NOT(L2-N1)`
Since L2-N1 fires when Hamming weight is even, the output fires when Hamming weight is oddβexactly parity.
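The readout logic can be verified exhaustively over the discriminator's two states (an illustrative check using the bias and wiring described above):

```python
def pruned_readout(h0, h1, h2):
    # Output fires iff h0 - h1 + h2 - 2 >= 0.
    return 1 if h0 - h1 + h2 - 2 >= 0 else 0

# With the two always-on neurons fixed at 1, the readout is NOT(h1):
for h1 in (0, 1):
    print(h1, "->", pruned_readout(1, h1, 1))  # 0 -> 1, 1 -> 0
```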
## Limitations
- Fixed input size: 8 bits only (parametric construction in Coq extends to any n)
- Binary inputs: Expects {0, 1}, not continuous values
- No noise margin: Heaviside threshold at exactly 0
- Not differentiable: Cannot be fine-tuned with gradient descent
## Files

```
tiny-parity-prover/
├── model.safetensors        # Full network weights (833 params)
├── model.py                 # Full network inference code
├── config.json              # Full network metadata
├── results.md               # Comprehensive test results
├── todo.md                  # Future work notes
├── README.md                # This file
└── pruned/
    ├── model.safetensors    # Pruned network weights (139 params)
    ├── model.py             # Pruned network inference code
    ├── config.json          # Pruned network metadata
    └── README.md            # Pruned variant description
```
## Citation

```bibtex
@software{tiny_parity_prover_2025,
  title={tiny-parity-prover: Formally Verified Neural Network for 8-bit Parity},
  author={Norton, Charles},
  url={https://huggingface.co/phanerozoic/tiny-parity-prover},
  year={2025},
  note={Weights and inference code. Proofs at github.com/CharlesCNorton/threshold-logic-verified}
}
```
## Related
- Proof repository: threshold-logic-verified β Coq proofs, training code, full documentation
- Coq proof files: V0_Core.v through V11_ParametricNetwork.v
- Key proof: V9_FullyConstructive.v (complete algebraic proof without vm_compute)
## License

MIT