# tiny-parity-prover
Trained weights for a formally verified neural network that computes 8-bit parity. This repository contains the model artifacts; for proof development and Coq source code, see threshold-logic-verified.
## Overview

This is a ternary threshold network that computes parity(x) = x₁ ⊕ x₂ ⊕ … ⊕ x₈ for 8-bit binary inputs. The network outputs 1 if the number of set bits is odd, 0 if even.
Key properties:
- 100% accuracy on all 256 possible inputs
- Correctness proven in Coq via constructive algebraic proof
- Weights constrained to {-1, 0, +1} (ternary quantization)
- Heaviside step activation (x ≥ 0 → 1, else 0)
- Suitable for neuromorphic hardware deployment
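As a baseline for what the network must reproduce, the target function is trivial to state in plain Python (this reference implementation is illustrative, not part of the repository):

```python
def parity(bits):
    """Reference parity: 1 if an odd number of bits are set, else 0."""
    assert all(b in (0, 1) for b in bits)
    return sum(bits) % 2

print(parity([1, 0, 1, 1, 0, 0, 1, 0]))  # 0: four bits set, even
```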
## Model Variants

| Variant | Architecture | Parameters | Weights | Verification |
|---|---|---|---|---|
| Full | 8 → 32 → 16 → 1 | 833 | model.safetensors | Coq-proven |
| Pruned | 8 → 11 → 3 → 1 | 139 | pruned/model.safetensors | Coq-proven |
The pruned variant retains only the 14 neurons (11 in L1, 3 in L2) identified as critical through systematic ablation analysis. Both variants compute identical functions and are independently verified.
## Quick Start

```python
import torch
from safetensors.torch import load_file

# Load weights
weights = load_file('model.safetensors')  # or 'pruned/model.safetensors'

# Manual forward pass (Heaviside activation)
def forward(x, weights):
    x = x.float()
    x = (x @ weights['layer1.weight'].T + weights['layer1.bias'] >= 0).float()
    x = (x @ weights['layer2.weight'].T + weights['layer2.bias'] >= 0).float()
    x = (x @ weights['output.weight'].T + weights['output.bias'] >= 0).float()
    return x.squeeze(-1)

# Test
inputs = torch.tensor([[1, 0, 1, 1, 0, 0, 1, 0]], dtype=torch.float32)
output = forward(inputs, weights)
print(f"Parity of [1,0,1,1,0,0,1,0]: {int(output.item())}")  # 0 (4 bits set, even)
```
Or use the provided model classes:

```python
from model import ThresholdNetwork
model = ThresholdNetwork.from_safetensors('model.safetensors')

from pruned.model import PrunedThresholdNetwork
pruned = PrunedThresholdNetwork.from_safetensors('pruned/model.safetensors')
```
## Weight Structure

### Full Network (833 parameters)

| Tensor | Shape | Values | Description |
|---|---|---|---|
| layer1.weight | [32, 8] | {-1, 0, +1} | 256 ternary weights |
| layer1.bias | [32] | [-8, +8] | Bounded integers |
| layer2.weight | [16, 32] | {-1, 0, +1} | 512 ternary weights |
| layer2.bias | [16] | [-32, +32] | Bounded integers |
| output.weight | [1, 16] | {-1, 0, +1} | 16 ternary weights |
| output.bias | [1] | [-16, +16] | Bounded integer |
### Pruned Network (139 parameters)

| Tensor | Shape | Values |
|---|---|---|
| layer1.weight | [11, 8] | {-1, 0, +1} |
| layer1.bias | [11] | Bounded integers |
| layer2.weight | [3, 11] | {-1, 0, +1} |
| layer2.bias | [3] | Bounded integers |
| output.weight | [1, 3] | {-1, 0, +1} |
| output.bias | [1] | Bounded integer |
## Training
The network was trained using evolutionary search (genetic algorithm), not gradient descent.
Why not backpropagation? Parity is a canonical hard function for neural networks. The loss landscape is essentially flat because parity depends on a global property (bit count mod 2) rather than local features. Small weight perturbations produce negligible gradient signal until the network is already near a solution.
Evolutionary search parameters:
- Population size: 500
- Generations to solution: 10,542
- Selection: Top 20% elite
- Mutation rates: 0.05, 0.15, 0.30 (mixed)
- Fitness: Classification accuracy on all 256 inputs
The search finds a working configuration by directly optimizing discrete accuracy rather than relying on gradient information.
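To make this concrete, here is a minimal, self-contained sketch of ternary evolutionary search on a scaled-down task (3-bit parity, toy hyperparameters; the names and sizes are illustrative, not the actual training script):

```python
import random

def heaviside(z):
    return 1 if z >= 0 else 0

def forward(genome, x):
    # Toy 3 -> 4 -> 1 threshold network with ternary weights.
    W1, b1, w2, b2 = genome
    h = [heaviside(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    return heaviside(sum(w * hi for w, hi in zip(w2, h)) + b2)

INPUTS = [[(i >> k) & 1 for k in range(3)] for i in range(8)]

def fitness(genome):
    # Discrete accuracy over all 8 possible 3-bit inputs -- no gradients.
    return sum(forward(genome, x) == sum(x) % 2 for x in INPUTS) / 8

def random_genome(rng):
    t = lambda: rng.choice((-1, 0, 1))
    return ([[t() for _ in range(3)] for _ in range(4)],
            [rng.randint(-3, 3) for _ in range(4)],
            [t() for _ in range(4)],
            rng.randint(-4, 4))

def mutate(genome, rng, rate=0.15):
    W1, b1, w2, b2 = genome
    tern = lambda v: rng.choice((-1, 0, 1)) if rng.random() < rate else v
    bias = lambda v, lo, hi: rng.randint(lo, hi) if rng.random() < rate else v
    return ([[tern(w) for w in row] for row in W1],
            [bias(b, -3, 3) for b in b1],
            [tern(w) for w in w2],
            bias(b2, -4, 4))

def evolve(pop_size=200, generations=100, elite_frac=0.2, seed=0):
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        if fitness(pop[0]) == 1.0:
            break  # exact solution found
        elites = pop[:int(pop_size * elite_frac)]
        pop = elites + [mutate(rng.choice(elites), rng)
                        for _ in range(pop_size - len(elites))]
    return max(pop, key=fitness)

best = evolve()
print(f"Best accuracy: {fitness(best) * 8:.0f}/8")
```

Because fitness is exact accuracy over the full input space, the search needs no gradient signal at all, which is the point of using it for parity.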
## Formal Verification
Both network variants are proven correct in the Coq proof assistant. The verification evolved through 12 proof versions, culminating in a fully constructive algebraic proof.
Core theorem (proven for all 256 inputs):

```coq
Theorem network_correct : forall x0 x1 x2 x3 x4 x5 x6 x7,
  network x0 x1 x2 x3 x4 x5 x6 x7 = parity [x0;x1;x2;x3;x4;x5;x6;x7].
```
Proof strategy (V9_FullyConstructive.v):
The proof decomposes the critical Layer-2 neuron's pre-activation into three components:
- A: Contribution from h-function neurons (depends on a specific linear form h(x))
- B: Contribution from Hamming-weight-dependent neurons
- C: Contribution from g-function neurons (where g(x) β‘ HW(x) mod 2)
Key algebraic insight: For any Β±1 weight vector, the dot product has the same parity as the Hamming weight. This is proven constructively:
```coq
Theorem g_parity : forall x0 x1 x2 x3 x4 x5 x6 x7,
  Z.even (g_func x0 x1 x2 x3 x4 x5 x6 x7) =
  Nat.even (hamming_weight [x0;x1;x2;x3;x4;x5;x6;x7]).
```
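The insight is easy to spot-check outside Coq; the following exhaustive check over 4-bit vectors is an illustrative sketch, not part of the proof development:

```python
from itertools import product

def signed_dot(ws, xs):
    return sum(w * x for w, x in zip(ws, xs))

# Each set bit contributes +1 or -1, and both are odd, so the dot
# product and the Hamming weight always share the same parity.
for ws in product((-1, 1), repeat=4):
    for xs in product((0, 1), repeat=4):
        assert signed_dot(ws, xs) % 2 == sum(xs) % 2
print("dot-product parity == Hamming-weight parity on all 256 cases")
```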
The network implements parity by:
- Two "always-on" neurons (L2-N0, L2-N2) with large biases
- One parity discriminator (L2-N1) that fires iff Hamming weight is even
- Output wiring that negates the discriminator
Parametric generalization (V10-V11):
The algebraic foundation extends to arbitrary input size n:
```coq
Theorem dot_parity_equals_hw_parity : forall ws xs,
  length ws = length xs ->
  Z.even (signed_dot ws xs) = Nat.even (hamming_weight xs).
```
A verified n-bit parity network construction is provided using n+3 neurons total.
Full proof development: threshold-logic-verified/coq
## Test Results

### Functional Validation
| Test | Result |
|---|---|
| Exhaustive (256 inputs) | 256/256 |
| Batch inference consistency | Pass |
| Determinism (1000 runs) | Pass |
| Throughput | 18,161 inf/sec (CPU) |
### Analytical Properties
| Property | Finding |
|---|---|
| Weight sparsity | 32.9% zeros (258/784) |
| Critical L1 neurons | 11 of 32 |
| Critical L2 neurons | 3 of 16 |
| Noise sensitivity (Ο=0.05) | Accuracy drops to 51% |
| Minimum depth | 3 layers required |
Neuron ablation results:
| Neuron Removed | Accuracy Impact |
|---|---|
| L2-N8 | 256 → 128 (chance) |
| L1-N11 | 256 → 162 |
| Any redundant neuron | 256 → 256 (no change) |
Random pruning sensitivity:
| Weights Pruned | Accuracy |
|---|---|
| 0% | 256/256 |
| 10% | 138-191/256 |
| 20%+ | ~128/256 (chance) |
### Noise Robustness
The network is extremely fragile to input noise due to the hard Heaviside threshold:
| Gaussian Noise Ο | Accuracy |
|---|---|
| 0.00 | 100.0% |
| 0.05 | 51.2% |
| 0.10 | 51.2% |
| 0.20 | 48.8% |
This is inherent to threshold networks, not a defect.
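The collapse is easy to reproduce with any hard-threshold parity construction. The sketch below uses a textbook counting network (all-ones first layer, alternating-sign readout), not this repository's weights:

```python
import random

def counting_parity_net(x):
    # Hidden unit k fires iff sum(x) >= k + 1; the alternating-sign
    # readout then recovers Hamming weight mod 2.
    n = 8
    s = sum(x)
    h = [1 if s - (k + 1) >= 0 else 0 for k in range(n)]
    return 1 if sum((-1) ** k * h[k] for k in range(n)) - 1 >= 0 else 0

def accuracy(sigma, trials=500, seed=0):
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        bits = [rng.randint(0, 1) for _ in range(8)]
        noisy = [b + rng.gauss(0, sigma) for b in bits]
        correct += counting_parity_net(noisy) == sum(bits) % 2
    return correct / trials

print(accuracy(0.0))   # clean inputs: exact
print(accuracy(0.05))  # critical pre-activations sit exactly on the
                       # threshold, so tiny noise collapses toward chance
```

The fragility is structural: the decisive pre-activations evaluate to exactly 0 on clean inputs, so arbitrarily small noise flips them with probability near one half.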
### Bias Sensitivity
| Layer | Sensitive to Β±1 Perturbation |
|---|---|
| Layer 1 | 9 of 32 biases |
| Layer 2 | 1 of 16 biases |
| Output | 1 of 1 bias |
## Practical Applications
The network has been validated on standard parity-based protocols:
| Application | Test Cases | Result |
|---|---|---|
| Memory ECC parity | 100 bytes | 100% |
| RAID-5 reconstruction | 50 stripes | 100% |
| UART frame validation | 100 frames | 100% |
| XOR checksum | 100 bytes | Pass |
| LFSR keystream | 32 bits | Deterministic |
| Barcode validation | 50 codes | 100% |
| Packet checksum | 20 packets | 100% |
| Hamming (7,4) syndrome | 20 codewords | 100% |
| PS/2 keyboard codes | 20 scancodes | 100% |
| DNA codon parity | 8 codons | 100% |
## Neuromorphic Deployment
The primary use case is neuromorphic hardware where threshold networks are the native compute primitive and XOR gates cannot be directly instantiated:
| Platform | Status |
|---|---|
| Intel Loihi | Compatible (untested) |
| IBM TrueNorth | Compatible (untested) |
| BrainChip Akida | Compatible (untested) |
Resource comparison:
| Implementation | Gates/Neurons |
|---|---|
| Threshold network (pruned) | 14 neurons |
| Threshold network (full) | 49 neurons |
| XOR tree (conventional) | 7 gates |
The 2-7x neuron overhead versus a conventional 7-gate XOR tree is acceptable only on platforms where XOR gates are unavailable.
## Pruned Network Details
The pruned network was extracted by identifying critical neurons through systematic ablation:
Retained neurons:
- Layer 1: indices [3, 7, 9, 11, 12, 16, 17, 20, 24, 27, 28]
- Layer 2: indices [8, 11, 14]
Interpretation:
The pruned network reveals the minimal structure for parity computation:
- L2-N0 (originally N8): Always fires (bias=30 dominates)
- L2-N1 (originally N11): Parity discriminator (fires iff HW even)
- L2-N2 (originally N14): Always fires (bias=20 dominates)
- Output: computes `L2-N0 - L2-N1 + L2-N2 - 2 >= 0`, which equals `NOT(L2-N1)`
Since L2-N1 fires when Hamming weight is even, the output fires when Hamming weight is oddβexactly parity.
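The readout logic can be verified exhaustively over the discriminator's two states (an illustrative check using the bias and wiring described above):

```python
def pruned_readout(h0, h1, h2):
    # Output fires iff h0 - h1 + h2 - 2 >= 0.
    return 1 if h0 - h1 + h2 - 2 >= 0 else 0

# With the two always-on neurons fixed at 1, the readout is NOT(h1):
for h1 in (0, 1):
    print(h1, "->", pruned_readout(1, h1, 1))  # 0 -> 1, 1 -> 0
```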
## Limitations
- Fixed input size: 8 bits only (parametric construction in Coq extends to any n)
- Binary inputs: Expects {0, 1}, not continuous values
- No noise margin: Heaviside threshold at exactly 0
- Not differentiable: Cannot be fine-tuned with gradient descent
## Files

```
tiny-parity-prover/
├── model.safetensors        # Full network weights (833 params)
├── model.py                 # Full network inference code
├── config.json              # Full network metadata
├── results.md               # Comprehensive test results
├── todo.md                  # Future work notes
├── README.md                # This file
└── pruned/
    ├── model.safetensors    # Pruned network weights (139 params)
    ├── model.py             # Pruned network inference code
    ├── config.json          # Pruned network metadata
    └── README.md            # Pruned variant description
```
## Citation

```bibtex
@software{tiny_parity_prover_2025,
  title={tiny-parity-prover: Formally Verified Neural Network for 8-bit Parity},
  author={Norton, Charles},
  url={https://huggingface.co/phanerozoic/tiny-parity-prover},
  year={2025},
  note={Weights and inference code. Proofs at github.com/CharlesCNorton/threshold-logic-verified}
}
```
## Related
- Proof repository: threshold-logic-verified β Coq proofs, training code, full documentation
- Coq proof files: V0_Core.v through V11_ParametricNetwork.v
- Key proof: V9_FullyConstructive.v (complete algebraic proof without vm_compute)
## License

MIT