---
license: mit
tags:
- pytorch
- safetensors
- formal-verification
- coq
- parity
- threshold-network
- ternary-weights
- binary-classification
- neuromorphic
---

# tiny-parity-prover

Trained weights for a formally verified neural network that computes 8-bit parity. This repository contains the model artifacts; for the proof development and Coq source code, see [threshold-logic-verified](https://github.com/CharlesCNorton/threshold-logic-verified).

## Overview

This is a ternary threshold network that computes `parity(x) = x₀ ⊕ x₁ ⊕ ⋯ ⊕ x₇` for 8-bit binary inputs. The network outputs 1 if the number of set bits is odd, 0 if it is even.

**Key properties:**

- 100% accuracy on all 256 possible inputs
- Correctness proven in Coq via a constructive algebraic proof
- Weights constrained to {-1, 0, +1} (ternary quantization)
- Heaviside step activation (x ≥ 0 → 1, else 0)
- Suitable for neuromorphic hardware deployment

## Model Variants

| Variant | Architecture | Parameters | Weights | Verification |
|---------|--------------|------------|---------|--------------|
| Full | 8 → 32 → 16 → 1 | 833 | `model.safetensors` | Coq-proven |
| Pruned | 8 → 11 → 3 → 1 | 139 | `pruned/model.safetensors` | Coq-proven |

The pruned variant retains only the 14 neurons (11 in layer 1, 3 in layer 2) identified as critical through systematic ablation analysis. Both variants compute the identical function and are independently verified.

## Quick Start

```python
import torch
from safetensors.torch import load_file

# Load weights
weights = load_file('model.safetensors')  # or 'pruned/model.safetensors'

# Manual forward pass (Heaviside activation)
def forward(x, weights):
    x = x.float()
    x = (x @ weights['layer1.weight'].T + weights['layer1.bias'] >= 0).float()
    x = (x @ weights['layer2.weight'].T + weights['layer2.bias'] >= 0).float()
    x = (x @ weights['output.weight'].T + weights['output.bias'] >= 0).float()
    return x.squeeze(-1)

# Test
inputs = torch.tensor([[1, 0, 1, 1, 0, 0, 1, 0]], dtype=torch.float32)
output = forward(inputs, weights)
print(f"Parity of [1,0,1,1,0,0,1,0]: {int(output.item())}")  # 0 (4 bits set, even)
```

Or use the provided model classes:

```python
from model import ThresholdNetwork
model = ThresholdNetwork.from_safetensors('model.safetensors')

from pruned.model import PrunedThresholdNetwork
pruned = PrunedThresholdNetwork.from_safetensors('pruned/model.safetensors')
```

## Weight Structure

### Full Network (833 parameters)

| Tensor | Shape | Values | Description |
|--------|-------|--------|-------------|
| `layer1.weight` | [32, 8] | {-1, 0, +1} | 256 ternary weights |
| `layer1.bias` | [32] | [-8, +8] | Bounded integers |
| `layer2.weight` | [16, 32] | {-1, 0, +1} | 512 ternary weights |
| `layer2.bias` | [16] | [-32, +32] | Bounded integers |
| `output.weight` | [1, 16] | {-1, 0, +1} | 16 ternary weights |
| `output.bias` | [1] | [-16, +16] | Bounded integer |

### Pruned Network (139 parameters)

| Tensor | Shape | Values |
|--------|-------|--------|
| `layer1.weight` | [11, 8] | {-1, 0, +1} |
| `layer1.bias` | [11] | Bounded integers |
| `layer2.weight` | [3, 11] | {-1, 0, +1} |
| `layer2.bias` | [3] | Bounded integers |
| `output.weight` | [1, 3] | {-1, 0, +1} |
| `output.bias` | [1] | Bounded integer |
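
These constraints are easy to sanity-check after loading. The helpers below are hypothetical (not shipped with the repository); they are shown on synthetic tensors so the snippet is self-contained:

```python
import torch

# Hypothetical sanity checks (illustrative names, not part of the repo):
# verify a weight tensor is ternary and a bias tensor holds bounded integers.
def assert_ternary(t):
    assert torch.all((t == -1) | (t == 0) | (t == 1)), "non-ternary value"

def assert_bounded_int(t, lo, hi):
    assert torch.equal(t, t.round()), "non-integer value"
    assert bool(t.min() >= lo) and bool(t.max() <= hi), "value out of range"

# Demonstrated on synthetic tensors shaped like the pruned first layer.
w = torch.tensor([[1., 0., -1., 1., 0., 0., -1., 1.]] * 11)
b = torch.full((11,), -3.)
assert_ternary(w)
assert_bounded_int(b, -8, 8)
```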

## Training

The network was trained with evolutionary search (a genetic algorithm), not gradient descent.

**Why not backpropagation?** Parity is a canonical hard function for neural networks. The loss landscape is essentially flat because parity depends on a global property (bit count mod 2) rather than on local features. Small weight perturbations produce negligible gradient signal until the network is already near a solution.

**Evolutionary search parameters:**

- Population size: 500
- Generations to solution: 10,542
- Selection: top 20% elite
- Mutation rates: 0.05, 0.15, 0.30 (mixed)
- Fitness: classification accuracy on all 256 inputs

The search finds a working configuration by directly optimizing discrete accuracy rather than relying on gradient information.

## Formal Verification

Both network variants are proven correct in the Coq proof assistant. The verification evolved through 12 proof versions, culminating in a fully constructive algebraic proof.

**Core theorem (proven for all 256 inputs):**

```coq
Theorem network_correct : forall x0 x1 x2 x3 x4 x5 x6 x7,
  network x0 x1 x2 x3 x4 x5 x6 x7 = parity [x0;x1;x2;x3;x4;x5;x6;x7].
```

**Proof strategy (V9_FullyConstructive.v):**

The proof decomposes the critical layer-2 neuron's pre-activation into three components:

- **A**: contribution from h-function neurons (depends on a specific linear form h(x))
- **B**: contribution from Hamming-weight-dependent neurons
- **C**: contribution from g-function neurons (where g(x) ≡ HW(x) mod 2)

The key algebraic insight: for any ±1 weight vector, the dot product with a binary input has the same parity as the input's Hamming weight. This is proven constructively:

```coq
Theorem g_parity : forall x0 x1 x2 x3 x4 x5 x6 x7,
  Z.even (g_func x0 x1 x2 x3 x4 x5 x6 x7) =
  Nat.even (hamming_weight [x0;x1;x2;x3;x4;x5;x6;x7]).
```

The network implements parity by:

1. Two "always-on" neurons (L2-N0, L2-N2) with large biases
2. One parity discriminator (L2-N1) that fires iff the Hamming weight is even
3. Output wiring that negates the discriminator

**Parametric generalization (V10-V11):**

The algebraic foundation extends to arbitrary input size n:

```coq
Theorem dot_parity_equals_hw_parity : forall ws xs,
  length ws = length xs ->
  Z.even (signed_dot ws xs) = Nat.even (hamming_weight xs).
```

A verified n-bit parity network construction is provided, using n + 3 neurons in total.

Full proof development: [threshold-logic-verified/coq](https://github.com/CharlesCNorton/threshold-logic-verified/tree/main/coq)

## Test Results

### Functional Validation

| Test | Result |
|------|--------|
| Exhaustive (256 inputs) | 256/256 |
| Batch inference consistency | Pass |
| Determinism (1000 runs) | Pass |
| Throughput | 18,161 inf/sec (CPU) |
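
The exhaustive row can be reproduced with a short harness. This sketch shows the enumeration and reference side only; in a real run, the Quick Start `forward` with the loaded weights replaces one side of the comparison:

```python
import torch

# Harness sketch: enumerate all 256 eight-bit inputs and build the reference
# parity two independent ways (popcount mod 2 vs XOR-fold).
inputs = torch.tensor([[(v >> i) & 1 for i in range(8)] for v in range(256)],
                      dtype=torch.float32)
reference = inputs.sum(dim=1).long() % 2  # popcount mod 2

xor_fold = torch.zeros(256, dtype=torch.long)
for i in range(8):
    xor_fold ^= inputs[:, i].long()       # XOR-fold each bit column

assert torch.equal(reference, xor_fold)   # agree on all 256 inputs
```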

### Analytical Properties

| Property | Finding |
|----------|---------|
| Weight sparsity | 32.9% zeros (258/784) |
| Critical L1 neurons | 11 of 32 |
| Critical L2 neurons | 3 of 16 |
| Noise sensitivity (σ = 0.05) | Accuracy drops to 51% |
| Minimum depth | 3 layers required |

**Neuron ablation results:**

| Neuron Removed | Accuracy Impact |
|----------------|-----------------|
| L2-N8 | 256 → 128 (chance) |
| L1-N11 | 256 → 162 |
| Any redundant neuron | 256 → 256 (no change) |

**Random pruning sensitivity:**

| Weights Pruned | Accuracy |
|----------------|----------|
| 0% | 256/256 |
| 10% | 138-191/256 |
| 20%+ | ~128/256 (chance) |

### Noise Robustness

The network is extremely fragile to input noise because of the hard Heaviside threshold:

| Gaussian Noise σ | Accuracy |
|------------------|----------|
| 0.00 | 100.0% |
| 0.05 | 51.2% |
| 0.10 | 51.2% |
| 0.20 | 48.8% |

This fragility is inherent to hard-threshold networks, not a defect of this model.

### Bias Sensitivity

| Layer | Sensitive to ±1 Perturbation |
|-------|------------------------------|
| Layer 1 | 9 of 32 biases |
| Layer 2 | 1 of 16 biases |
| Output | 1 of 1 bias |

## Practical Applications

The network has been validated on standard parity-based protocols:

| Application | Test Cases | Result |
|-------------|------------|--------|
| Memory ECC parity | 100 bytes | 100% |
| RAID-5 reconstruction | 50 stripes | 100% |
| UART frame validation | 100 frames | 100% |
| XOR checksum | 100 bytes | Pass |
| LFSR keystream | 32 bits | Deterministic |
| Barcode validation | 50 codes | 100% |
| Packet checksum | 20 packets | 100% |
| Hamming (7,4) syndrome | 20 codewords | 100% |
| PS/2 keyboard codes | 20 scancodes | 100% |
| DNA codon parity | 8 codons | 100% |
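
As an illustration of the first row, here is what a byte-level even-parity memory check looks like in plain Python. This is a sketch of the protocol, not the repository's test code; in the actual validation the stored parity bit comes from the network:

```python
import random

random.seed(1)

def parity8(bits):
    """Reference parity: 1 iff an odd number of bits is set."""
    p = 0
    for b in bits:
        p ^= b
    return p

# Even-parity memory check: store each byte with a parity bit chosen so the
# 9-bit word has even total parity; any single flipped bit is then detectable.
for _ in range(100):
    byte = [random.randint(0, 1) for _ in range(8)]
    stored_parity = parity8(byte)                     # makes total parity even
    assert parity8(byte + [stored_parity]) == 0       # clean read: check passes
    corrupted = list(byte)
    corrupted[random.randrange(8)] ^= 1               # single bit flip
    assert parity8(corrupted + [stored_parity]) == 1  # flip is detected
```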

## Neuromorphic Deployment

The primary use case is neuromorphic hardware, where threshold networks are the native compute primitive and XOR gates cannot be directly instantiated:

| Platform | Status |
|----------|--------|
| Intel Loihi | Compatible (untested) |
| IBM TrueNorth | Compatible (untested) |
| BrainChip Akida | Compatible (untested) |

**Resource comparison:**

| Implementation | Gates/Neurons |
|----------------|---------------|
| Threshold network (pruned) | 14 neurons |
| Threshold network (full) | 49 neurons |
| XOR tree (conventional) | 7 gates |

The 70-140x overhead versus XOR gates is acceptable only when XOR is unavailable.

## Pruned Network Details

The pruned network was extracted by identifying critical neurons through systematic ablation:

**Retained neurons:**

- Layer 1: indices [3, 7, 9, 11, 12, 16, 17, 20, 24, 27, 28]
- Layer 2: indices [8, 11, 14]

**Interpretation:**

The pruned network reveals the minimal structure for parity computation:

- L2-N0 (originally N8): always fires (bias = 30 dominates)
- L2-N1 (originally N11): parity discriminator (fires iff HW is even)
- L2-N2 (originally N14): always fires (bias = 20 dominates)
- Output: computes `L2-N0 - L2-N1 + L2-N2 - 2 >= 0`, which equals `NOT(L2-N1)`

Since L2-N1 fires when the Hamming weight is even, the output fires when the Hamming weight is odd, which is exactly parity.

## Limitations

- **Fixed input size**: 8 bits only (the parametric Coq construction extends to any n)
- **Binary inputs**: expects {0, 1}, not continuous values
- **No noise margin**: Heaviside threshold sits at exactly 0
- **Not differentiable**: cannot be fine-tuned with gradient descent

## Files

```
tiny-parity-prover/
├── model.safetensors      # Full network weights (833 params)
├── model.py               # Full network inference code
├── config.json            # Full network metadata
├── results.md             # Comprehensive test results
├── todo.md                # Future work notes
├── README.md              # This file
└── pruned/
    ├── model.safetensors  # Pruned network weights (139 params)
    ├── model.py           # Pruned network inference code
    ├── config.json        # Pruned network metadata
    └── README.md          # Pruned variant description
```

## Citation

```bibtex
@software{tiny_parity_prover_2025,
  title={tiny-parity-prover: Formally Verified Neural Network for 8-bit Parity},
  author={Norton, Charles},
  url={https://huggingface.co/phanerozoic/tiny-parity-prover},
  year={2025},
  note={Weights and inference code. Proofs at github.com/CharlesCNorton/threshold-logic-verified}
}
```

## Related

- **Proof repository**: [threshold-logic-verified](https://github.com/CharlesCNorton/threshold-logic-verified) (Coq proofs, training code, full documentation)
- **Coq proof files**: V0_Core.v through V11_ParametricNetwork.v
- **Key proof**: V9_FullyConstructive.v (complete algebraic proof without vm_compute)

## License

MIT