---
license: mit
tags:
- pytorch
- safetensors
- formal-verification
- coq
- parity
- threshold-network
- ternary-weights
- binary-classification
- neuromorphic
---

# tiny-parity-prover

Trained weights for a formally verified neural network that computes 8-bit parity. This repository contains the model artifacts; for the proof development and Coq source code, see [threshold-logic-verified](https://github.com/CharlesCNorton/threshold-logic-verified).

## Overview

This is a ternary threshold network that computes `parity(x) = x₀ ⊕ x₁ ⊕ ⋯ ⊕ x₇` for 8-bit binary inputs. The network outputs 1 if the number of set bits is odd, 0 if it is even.

**Key properties:**

- 100% accuracy on all 256 possible inputs
- Correctness proven in Coq via a constructive algebraic proof
- Weights constrained to {-1, 0, +1} (ternary quantization)
- Heaviside step activation (x ≥ 0 → 1, else 0)
- Suitable for neuromorphic hardware deployment

## Model Variants

| Variant | Architecture | Parameters | Weights | Verification |
|---------|--------------|------------|---------|--------------|
| Full | 8 → 32 → 16 → 1 | 833 | `model.safetensors` | Coq-proven |
| Pruned | 8 → 11 → 3 → 1 | 139 | `pruned/model.safetensors` | Coq-proven |

The pruned variant retains only the 14 neurons (11 in layer 1, 3 in layer 2) identified as critical through systematic ablation analysis. Both variants compute the identical function and are independently verified.

## Quick Start

```python
import torch
from safetensors.torch import load_file

# Load weights
weights = load_file('model.safetensors')  # or 'pruned/model.safetensors'

# Manual forward pass (Heaviside activation)
def forward(x, weights):
    x = x.float()
    x = (x @ weights['layer1.weight'].T + weights['layer1.bias'] >= 0).float()
    x = (x @ weights['layer2.weight'].T + weights['layer2.bias'] >= 0).float()
    x = (x @ weights['output.weight'].T + weights['output.bias'] >= 0).float()
    return x.squeeze(-1)

# Test
inputs = torch.tensor([[1, 0, 1, 1, 0, 0, 1, 0]], dtype=torch.float32)
output = forward(inputs, weights)
print(f"Parity of [1,0,1,1,0,0,1,0]: {int(output.item())}")  # 0 (4 bits set, even)
```

Or use the provided model classes:

```python
from model import ThresholdNetwork
model = ThresholdNetwork.from_safetensors('model.safetensors')

from pruned.model import PrunedThresholdNetwork
pruned = PrunedThresholdNetwork.from_safetensors('pruned/model.safetensors')
```

## Weight Structure

### Full Network (833 parameters)

| Tensor | Shape | Values | Description |
|--------|-------|--------|-------------|
| `layer1.weight` | [32, 8] | {-1, 0, +1} | 256 ternary weights |
| `layer1.bias` | [32] | [-8, +8] | Bounded integers |
| `layer2.weight` | [16, 32] | {-1, 0, +1} | 512 ternary weights |
| `layer2.bias` | [16] | [-32, +32] | Bounded integers |
| `output.weight` | [1, 16] | {-1, 0, +1} | 16 ternary weights |
| `output.bias` | [1] | [-16, +16] | Bounded integer |

### Pruned Network (139 parameters)

| Tensor | Shape | Values |
|--------|-------|--------|
| `layer1.weight` | [11, 8] | {-1, 0, +1} |
| `layer1.bias` | [11] | Bounded integers |
| `layer2.weight` | [3, 11] | {-1, 0, +1} |
| `layer2.bias` | [3] | Bounded integers |
| `output.weight` | [1, 3] | {-1, 0, +1} |
| `output.bias` | [1] | Bounded integer |
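
These constraints are easy to sanity-check after loading. The helpers below are hypothetical (not shipped with the repository); they are shown on synthetic tensors so the snippet is self-contained:

```python
import torch

# Hypothetical sanity checks (illustrative names, not part of the repo):
# verify a weight tensor is ternary and a bias tensor holds bounded integers.
def assert_ternary(t):
    assert torch.all((t == -1) | (t == 0) | (t == 1)), "non-ternary value"

def assert_bounded_int(t, lo, hi):
    assert torch.equal(t, t.round()), "non-integer value"
    assert bool(t.min() >= lo) and bool(t.max() <= hi), "value out of range"

# Demonstrated on synthetic tensors shaped like the pruned first layer.
w = torch.tensor([[1., 0., -1., 1., 0., 0., -1., 1.]] * 11)
b = torch.full((11,), -3.)
assert_ternary(w)
assert_bounded_int(b, -8, 8)
```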

## Training

The network was trained with evolutionary search (a genetic algorithm), not gradient descent.

**Why not backpropagation?** Parity is a canonical hard function for neural networks. The loss landscape is essentially flat because parity depends on a global property (bit count mod 2) rather than on local features. Small weight perturbations produce negligible gradient signal until the network is already near a solution.

**Evolutionary search parameters:**

- Population size: 500
- Generations to solution: 10,542
- Selection: top 20% elite
- Mutation rates: 0.05, 0.15, 0.30 (mixed)
- Fitness: classification accuracy on all 256 inputs

The search finds a working configuration by directly optimizing discrete accuracy rather than relying on gradient information.

## Formal Verification

Both network variants are proven correct in the Coq proof assistant. The verification evolved through 12 proof versions, culminating in a fully constructive algebraic proof.

**Core theorem (proven for all 256 inputs):**

```coq
Theorem network_correct : forall x0 x1 x2 x3 x4 x5 x6 x7,
  network x0 x1 x2 x3 x4 x5 x6 x7 = parity [x0;x1;x2;x3;x4;x5;x6;x7].
```

**Proof strategy (V9_FullyConstructive.v):**

The proof decomposes the critical layer-2 neuron's pre-activation into three components:

- **A**: contribution from h-function neurons (depends on a specific linear form h(x))
- **B**: contribution from Hamming-weight-dependent neurons
- **C**: contribution from g-function neurons (where g(x) ≡ HW(x) mod 2)

The key algebraic insight: for any ±1 weight vector, the dot product with a binary input has the same parity as the input's Hamming weight. This is proven constructively:

```coq
Theorem g_parity : forall x0 x1 x2 x3 x4 x5 x6 x7,
  Z.even (g_func x0 x1 x2 x3 x4 x5 x6 x7) =
  Nat.even (hamming_weight [x0;x1;x2;x3;x4;x5;x6;x7]).
```

The network implements parity by:

1. Two "always-on" neurons (L2-N0, L2-N2) with large biases
2. One parity discriminator (L2-N1) that fires iff the Hamming weight is even
3. Output wiring that negates the discriminator

**Parametric generalization (V10-V11):**

The algebraic foundation extends to arbitrary input size n:

```coq
Theorem dot_parity_equals_hw_parity : forall ws xs,
  length ws = length xs ->
  Z.even (signed_dot ws xs) = Nat.even (hamming_weight xs).
```

A verified n-bit parity network construction is provided, using n + 3 neurons in total.

Full proof development: [threshold-logic-verified/coq](https://github.com/CharlesCNorton/threshold-logic-verified/tree/main/coq)

## Test Results

### Functional Validation

| Test | Result |
|------|--------|
| Exhaustive (256 inputs) | 256/256 |
| Batch inference consistency | Pass |
| Determinism (1000 runs) | Pass |
| Throughput | 18,161 inf/sec (CPU) |
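
The exhaustive row can be reproduced with a short harness. This sketch shows the enumeration and reference side only; in a real run, the Quick Start `forward` with the loaded weights replaces one side of the comparison:

```python
import torch

# Harness sketch: enumerate all 256 eight-bit inputs and build the reference
# parity two independent ways (popcount mod 2 vs XOR-fold).
inputs = torch.tensor([[(v >> i) & 1 for i in range(8)] for v in range(256)],
                      dtype=torch.float32)
reference = inputs.sum(dim=1).long() % 2  # popcount mod 2

xor_fold = torch.zeros(256, dtype=torch.long)
for i in range(8):
    xor_fold ^= inputs[:, i].long()       # XOR-fold each bit column

assert torch.equal(reference, xor_fold)   # agree on all 256 inputs
```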

### Analytical Properties

| Property | Finding |
|----------|---------|
| Weight sparsity | 32.9% zeros (258/784) |
| Critical L1 neurons | 11 of 32 |
| Critical L2 neurons | 3 of 16 |
| Noise sensitivity (σ = 0.05) | Accuracy drops to 51% |
| Minimum depth | 3 layers required |

**Neuron ablation results:**

| Neuron Removed | Accuracy Impact |
|----------------|-----------------|
| L2-N8 | 256 → 128 (chance) |
| L1-N11 | 256 → 162 |
| Any redundant neuron | 256 → 256 (no change) |

**Random pruning sensitivity:**

| Weights Pruned | Accuracy |
|----------------|----------|
| 0% | 256/256 |
| 10% | 138-191/256 |
| 20%+ | ~128/256 (chance) |

### Noise Robustness

The network is extremely fragile to input noise because of the hard Heaviside threshold:

| Gaussian Noise σ | Accuracy |
|------------------|----------|
| 0.00 | 100.0% |
| 0.05 | 51.2% |
| 0.10 | 51.2% |
| 0.20 | 48.8% |

This fragility is inherent to hard-threshold networks, not a defect of this model.

### Bias Sensitivity

| Layer | Sensitive to ±1 Perturbation |
|-------|------------------------------|
| Layer 1 | 9 of 32 biases |
| Layer 2 | 1 of 16 biases |
| Output | 1 of 1 bias |

## Practical Applications

The network has been validated on standard parity-based protocols:

| Application | Test Cases | Result |
|-------------|------------|--------|
| Memory ECC parity | 100 bytes | 100% |
| RAID-5 reconstruction | 50 stripes | 100% |
| UART frame validation | 100 frames | 100% |
| XOR checksum | 100 bytes | Pass |
| LFSR keystream | 32 bits | Deterministic |
| Barcode validation | 50 codes | 100% |
| Packet checksum | 20 packets | 100% |
| Hamming (7,4) syndrome | 20 codewords | 100% |
| PS/2 keyboard codes | 20 scancodes | 100% |
| DNA codon parity | 8 codons | 100% |
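
As an illustration of the first row, here is what a byte-level even-parity memory check looks like in plain Python. This is a sketch of the protocol, not the repository's test code; in the actual validation the stored parity bit comes from the network:

```python
import random

random.seed(1)

def parity8(bits):
    """Reference parity: 1 iff an odd number of bits is set."""
    p = 0
    for b in bits:
        p ^= b
    return p

# Even-parity memory check: store each byte with a parity bit chosen so the
# 9-bit word has even total parity; any single flipped bit is then detectable.
for _ in range(100):
    byte = [random.randint(0, 1) for _ in range(8)]
    stored_parity = parity8(byte)                     # makes total parity even
    assert parity8(byte + [stored_parity]) == 0       # clean read: check passes
    corrupted = list(byte)
    corrupted[random.randrange(8)] ^= 1               # single bit flip
    assert parity8(corrupted + [stored_parity]) == 1  # flip is detected
```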

## Neuromorphic Deployment

The primary use case is neuromorphic hardware, where threshold networks are the native compute primitive and XOR gates cannot be directly instantiated:

| Platform | Status |
|----------|--------|
| Intel Loihi | Compatible (untested) |
| IBM TrueNorth | Compatible (untested) |
| BrainChip Akida | Compatible (untested) |

**Resource comparison:**

| Implementation | Gates/Neurons |
|----------------|---------------|
| Threshold network (pruned) | 14 neurons |
| Threshold network (full) | 49 neurons |
| XOR tree (conventional) | 7 gates |

The 70-140x overhead versus XOR gates is acceptable only when XOR is unavailable.

## Pruned Network Details

The pruned network was extracted by identifying critical neurons through systematic ablation:

**Retained neurons:**

- Layer 1: indices [3, 7, 9, 11, 12, 16, 17, 20, 24, 27, 28]
- Layer 2: indices [8, 11, 14]

**Interpretation:**

The pruned network reveals the minimal structure for parity computation:

- L2-N0 (originally N8): always fires (bias = 30 dominates)
- L2-N1 (originally N11): parity discriminator (fires iff HW is even)
- L2-N2 (originally N14): always fires (bias = 20 dominates)
- Output: computes `L2-N0 - L2-N1 + L2-N2 - 2 >= 0`, which equals `NOT(L2-N1)`

Since L2-N1 fires when the Hamming weight is even, the output fires when the Hamming weight is odd, which is exactly parity.

## Limitations

- **Fixed input size**: 8 bits only (the parametric Coq construction extends to any n)
- **Binary inputs**: expects {0, 1}, not continuous values
- **No noise margin**: Heaviside threshold sits at exactly 0
- **Not differentiable**: cannot be fine-tuned with gradient descent

## Files

```
tiny-parity-prover/
├── model.safetensors      # Full network weights (833 params)
├── model.py               # Full network inference code
├── config.json            # Full network metadata
├── results.md             # Comprehensive test results
├── todo.md                # Future work notes
├── README.md              # This file
└── pruned/
    ├── model.safetensors  # Pruned network weights (139 params)
    ├── model.py           # Pruned network inference code
    ├── config.json        # Pruned network metadata
    └── README.md          # Pruned variant description
```

## Citation

```bibtex
@software{tiny_parity_prover_2025,
  title={tiny-parity-prover: Formally Verified Neural Network for 8-bit Parity},
  author={Norton, Charles},
  url={https://huggingface.co/phanerozoic/tiny-parity-prover},
  year={2025},
  note={Weights and inference code. Proofs at github.com/CharlesCNorton/threshold-logic-verified}
}
```

## Related

- **Proof repository**: [threshold-logic-verified](https://github.com/CharlesCNorton/threshold-logic-verified) (Coq proofs, training code, full documentation)
- **Coq proof files**: V0_Core.v through V11_ParametricNetwork.v
- **Key proof**: V9_FullyConstructive.v (complete algebraic proof without vm_compute)

## License

MIT