Rename from tiny-parity-verified
- README.md +117 -0
- config.json +11 -0
- model.py +87 -0
- model.safetensors +3 -0
- pruned/README.md +27 -0
- pruned/config.json +13 -0
- pruned/model.py +78 -0
- pruned/model.safetensors +3 -0
- results.md +167 -0
- todo.md +44 -0
README.md
ADDED
@@ -0,0 +1,117 @@
---
license: mit
tags:
- pytorch
- safetensors
- threshold-logic
- neuromorphic
- parity
---

# threshold-parity

Computes 8-bit parity (XOR of all bits). A canonical hard function for threshold networks: parity requires depth.

## Circuit

```
 x₀ x₁ x₂ x₃ x₄ x₅ x₆ x₇
  │  │  │  │  │  │  │  │
  └──┴──┴──┴──┼──┴──┴──┴──┘
              ▼
      ┌─────────────┐
      │   Layer 1   │  11 neurons (pruned)
      │ ±1 weights  │
      └─────────────┘
              │
              ▼
      ┌─────────────┐
      │   Layer 2   │  3 neurons
      └─────────────┘
              │
              ▼
      ┌─────────────┐
      │   Output    │
      └─────────────┘
              │
              ▼
       {0, 1} = HW mod 2
```

## Why Is Parity Hard?

Parity is HW mod 2, the simplest modular function. Yet it's notoriously difficult:

1. **Not linearly separable**: No single hyperplane separates odd from even HW
2. **Flat loss landscape**: Gradient descent fails because parity depends on a global property
3. **Requires depth**: No single threshold layer computes it; this model uses three layers

This network was found via evolutionary search (10,542 generations), not gradient descent.

## Algebraic Insight

For any ±1 weight vector w, the dot product w·x has the same parity as HW(x):

```
(-1)^(w·x) = (-1)^HW(x)
```

This is because each +1 weight contributes its input's value, and each -1 weight contributes -(input) ≡ input (mod 2).

The network uses this to detect whether HW is even or odd.
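The identity can be checked exhaustively in a few lines of Python (a standalone sanity check of the algebra, independent of the stored weights):

```python
import itertools
import random

random.seed(0)
for _ in range(10):
    # a random ±1 weight vector
    w = [random.choice([-1, 1]) for _ in range(8)]
    for x in itertools.product([0, 1], repeat=8):
        dot = sum(wi * xi for wi, xi in zip(w, x))
        hw = sum(x)  # Hamming weight
        # -1 ≡ 1 (mod 2), so the dot product preserves the parity of HW(x)
        assert dot % 2 == hw % 2
print("identity holds for all 256 inputs and 10 random ±1 weight vectors")
```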

## Architecture

| Variant | Architecture | Parameters |
|---------|--------------|------------|
| Full | 8 → 32 → 16 → 1 | 833 |
| Pruned | 8 → 11 → 3 → 1 | 139 |

The pruned variant keeps only the 14 critical neurons identified through ablation.

## How It Works

The pruned network has three L2 neurons:
- L2-N0: Always fires (large positive bias)
- L2-N1: Fires iff HW is even (parity discriminator)
- L2-N2: Always fires (large positive bias)

Output: `L2-N0 - L2-N1 + L2-N2 - 2 ≥ 0` = NOT(L2-N1)

Since L2-N1 fires when even, output fires when odd. That's parity.
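The output gate's decision rule can be sketched directly (a toy check of the rule described above, not an evaluation of the stored weights):

```python
def output_gate(n0, n1, n2):
    """Threshold unit: fires iff n0 - n1 + n2 - 2 >= 0."""
    return int(n0 - n1 + n2 - 2 >= 0)

# L2-N0 and L2-N2 always fire, so the gate reduces to NOT(L2-N1)
assert output_gate(1, 0, 1) == 1  # HW odd: N1 silent, output fires
assert output_gate(1, 1, 1) == 0  # HW even: N1 fires, output silent
```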

## Usage

```python
from safetensors.torch import load_file
import torch

w = load_file('model.safetensors')

def forward(x):
    x = x.float()
    x = (x @ w['layer1.weight'].T + w['layer1.bias'] >= 0).float()
    x = (x @ w['layer2.weight'].T + w['layer2.bias'] >= 0).float()
    x = (x @ w['output.weight'].T + w['output.bias'] >= 0).float()
    return x.squeeze(-1)

inp = torch.tensor([[1,0,1,1,0,0,1,0]])  # HW=4
print(int(forward(inp).item()))  # 0 (even)
```

## Files

```
threshold-parity/
├── model.safetensors
├── model.py
├── config.json
├── README.md
└── pruned/
    ├── model.safetensors
    └── ...
```

## License

MIT
config.json
ADDED
@@ -0,0 +1,11 @@
{
  "model_type": "threshold_network",
  "n_bits": 8,
  "hidden1": 32,
  "hidden2": 16,
  "output": 1,
  "total_parameters": 833,
  "weight_quantization": "ternary",
  "activation": "heaviside",
  "task": "parity"
}
model.py
ADDED
@@ -0,0 +1,87 @@
"""
Threshold Network for Parity Computation

A ternary threshold network that computes the parity (XOR) of 8 binary inputs.
Weights are constrained to {-1, 0, +1} and activations use the Heaviside step function.
"""

import torch
import torch.nn as nn


class ThresholdNetwork(nn.Module):
    """
    Binary threshold network with ternary weights.

    Architecture: 8 -> 32 -> 16 -> 1
    Weights: {-1, 0, +1}
    Activation: Heaviside (x >= 0 -> 1, else 0)
    """

    def __init__(self, n_bits=8, hidden1=32, hidden2=16):
        super().__init__()
        self.n_bits = n_bits
        self.hidden1 = hidden1
        self.hidden2 = hidden2

        self.layer1_weight = nn.Parameter(torch.zeros(hidden1, n_bits))
        self.layer1_bias = nn.Parameter(torch.zeros(hidden1))
        self.layer2_weight = nn.Parameter(torch.zeros(hidden2, hidden1))
        self.layer2_bias = nn.Parameter(torch.zeros(hidden2))
        self.output_weight = nn.Parameter(torch.zeros(1, hidden2))
        self.output_bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        """Forward pass with Heaviside activation."""
        x = x.float()
        x = (torch.nn.functional.linear(x, self.layer1_weight, self.layer1_bias) >= 0).float()
        x = (torch.nn.functional.linear(x, self.layer2_weight, self.layer2_bias) >= 0).float()
        x = (torch.nn.functional.linear(x, self.output_weight, self.output_bias) >= 0).float()
        return x.squeeze(-1)

    @classmethod
    def from_safetensors(cls, path):
        """Load model from SafeTensors file."""
        from safetensors.torch import load_file

        weights = load_file(path)

        hidden1 = weights['layer1.weight'].shape[0]
        hidden2 = weights['layer2.weight'].shape[0]
        n_bits = weights['layer1.weight'].shape[1]

        model = cls(n_bits=n_bits, hidden1=hidden1, hidden2=hidden2)
        model.layer1_weight.data = weights['layer1.weight']
        model.layer1_bias.data = weights['layer1.bias']
        model.layer2_weight.data = weights['layer2.weight']
        model.layer2_bias.data = weights['layer2.bias']
        model.output_weight.data = weights['output.weight']
        model.output_bias.data = weights['output.bias']

        return model


def parity(x):
    """Ground truth parity function."""
    return (x.sum(dim=-1) % 2).float()


if __name__ == '__main__':
    model = ThresholdNetwork.from_safetensors('model.safetensors')

    test_inputs = torch.tensor([
        [0, 0, 0, 0, 0, 0, 0, 0],
        [1, 0, 0, 0, 0, 0, 0, 0],
        [1, 1, 0, 0, 0, 0, 0, 0],
        [1, 1, 1, 0, 0, 0, 0, 0],
        [1, 1, 1, 1, 1, 1, 1, 1],
    ], dtype=torch.float32)

    outputs = model(test_inputs)
    expected = parity(test_inputs)

    print("Input -> Output (Expected)")
    for i in range(len(test_inputs)):
        bits = test_inputs[i].int().tolist()
        print(f"{bits} -> {int(outputs[i].item())} ({int(expected[i].item())})")
model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8a4f4a10ee1fd3f85c6bee53cfc52297e1acca03a6d80fad2357a53582ed1226
size 3772
pruned/README.md
ADDED
@@ -0,0 +1,27 @@
# Pruned Variant

Minimal subnetwork extracted from the full model via neuron ablation analysis.

## Architecture

| Layer | Full | Pruned |
|-------|------|--------|
| Hidden 1 | 32 | 11 |
| Hidden 2 | 16 | 3 |
| Parameters | 833 | 139 |

## Pruning Method

Systematic single-neuron ablation identified which neurons are critical (their removal degrades accuracy) versus redundant (their removal has no effect). Only 11 of 32 Layer-1 neurons and 3 of 16 Layer-2 neurons were critical. The minimal subnetwork retains only these neurons.

## Verification

Correctness is proven in Coq for all 256 inputs, identical to the full network. See `PrunedThresholdLogic.v` in the [GitHub repository](https://github.com/CharlesCNorton/threshold-logic-verified).

## Usage

```python
from model import PrunedThresholdNetwork

model = PrunedThresholdNetwork.from_safetensors('model.safetensors')
```
pruned/config.json
ADDED
@@ -0,0 +1,13 @@
{
  "model_type": "threshold_network",
  "n_bits": 8,
  "hidden1": 11,
  "hidden2": 3,
  "output": 1,
  "total_parameters": 139,
  "weight_quantization": "ternary",
  "activation": "heaviside",
  "task": "parity",
  "pruned_from": "8->32->16->1",
  "reduction": "83.3%"
}
pruned/model.py
ADDED
@@ -0,0 +1,78 @@
"""
Pruned Threshold Network for Parity Computation

A minimal ternary threshold network (8->11->3->1) that computes 8-bit parity.
Pruned from the original 8->32->16->1 architecture with 83.3% parameter reduction.
"""

import torch
import torch.nn as nn


class PrunedThresholdNetwork(nn.Module):
    """
    Pruned binary threshold network with ternary weights.

    Architecture: 8 -> 11 -> 3 -> 1
    Weights: {-1, 0, +1}
    Activation: Heaviside (x >= 0 -> 1, else 0)
    """

    def __init__(self, n_bits=8, hidden1=11, hidden2=3):
        super().__init__()
        self.n_bits = n_bits
        self.hidden1 = hidden1
        self.hidden2 = hidden2

        self.layer1_weight = nn.Parameter(torch.zeros(hidden1, n_bits))
        self.layer1_bias = nn.Parameter(torch.zeros(hidden1))
        self.layer2_weight = nn.Parameter(torch.zeros(hidden2, hidden1))
        self.layer2_bias = nn.Parameter(torch.zeros(hidden2))
        self.output_weight = nn.Parameter(torch.zeros(1, hidden2))
        self.output_bias = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        """Forward pass with Heaviside activation."""
        x = x.float()
        x = (torch.nn.functional.linear(x, self.layer1_weight, self.layer1_bias) >= 0).float()
        x = (torch.nn.functional.linear(x, self.layer2_weight, self.layer2_bias) >= 0).float()
        x = (torch.nn.functional.linear(x, self.output_weight, self.output_bias) >= 0).float()
        return x.squeeze(-1)

    @classmethod
    def from_safetensors(cls, path):
        """Load model from SafeTensors file."""
        from safetensors.torch import load_file

        weights = load_file(path)

        hidden1 = weights['layer1.weight'].shape[0]
        hidden2 = weights['layer2.weight'].shape[0]
        n_bits = weights['layer1.weight'].shape[1]

        model = cls(n_bits=n_bits, hidden1=hidden1, hidden2=hidden2)
        model.layer1_weight.data = weights['layer1.weight']
        model.layer1_bias.data = weights['layer1.bias']
        model.layer2_weight.data = weights['layer2.weight']
        model.layer2_bias.data = weights['layer2.bias']
        model.output_weight.data = weights['output.weight']
        model.output_bias.data = weights['output.bias']

        return model


def parity(x):
    """Ground truth parity function."""
    return (x.sum(dim=-1) % 2).float()


if __name__ == '__main__':
    model = PrunedThresholdNetwork.from_safetensors('model.safetensors')

    # Test all 256 inputs
    all_inputs = torch.tensor([[int(b) for b in format(i, '08b')] for i in range(256)], dtype=torch.float32)
    outputs = model(all_inputs)
    expected = parity(all_inputs)

    correct = (outputs == expected).sum().item()
    print(f'Accuracy: {correct}/256 ({100*correct/256:.1f}%)')
pruned/model.safetensors
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:161e13ec2919875e2612b4441be0ce8208c694ec6df3a6685f61eb9051924d9f
size 980
results.md
ADDED
@@ -0,0 +1,167 @@
# Test Results

This document reports comprehensive testing of the `threshold-parity` network across 30 test cases spanning basic functionality, analytical properties, and real-world applications.

## Basic Functional Tests

All ten basic tests passed. The network correctly outputs 0 for all-zeros and all-ones inputs (both have even bit counts), and outputs 1 for all eight single-bit patterns. Exhaustive evaluation over all 256 possible 8-bit inputs confirms 100% accuracy against the ground-truth parity function. Batch inference produces identical results to sequential evaluation, and 1000 repeated inferences on the same input yield identical outputs, confirming determinism. Throughput measured at 18,161 inferences per second (55 microseconds per inference) on CPU.

| Test | Input | Expected | Result |
|------|-------|----------|--------|
| All zeros | `[0,0,0,0,0,0,0,0]` | 0 | 0 |
| All ones | `[1,1,1,1,1,1,1,1]` | 0 | 0 |
| Single bit set | 8 patterns | 1 | 8/8 |
| Two bits set | 5 patterns | 0 | 5/5 |
| Exhaustive sweep | 256 inputs | — | 256/256 |
| Batch inference | 256 inputs | — | 256/256 |
| Determinism | 1000 runs | identical | identical |
| Integer conversion | 0–255 | — | 256/256 |
| Random sampling | 100 random | — | 100/100 |
| Latency | 10,000 runs | — | 18,161 inf/sec |

## Analytical Tests

**Weight Pruning.** The network contains 784 total weights, of which 526 are non-zero (32.9% of weights are naturally zero). Random pruning experiments reveal sharp accuracy degradation: removing just 10% of weights drops accuracy to 138–191 out of 256, and pruning beyond 20% collapses performance to chance level (128/256). This suggests the network operates near its minimum viable parameter count despite apparent redundancy.

| Pruned | Accuracy |
|--------|----------|
| 0% | 256/256 |
| 10% | 138–191/256 |
| 20% | 128–130/256 |
| 30% | 128–131/256 |
| 40% | 128/256 |
| 50% | 128/256 |

**Neuron Ablation.** Systematically disabling individual neurons reveals that only 11 of 32 Layer-1 neurons and 3 of 16 Layer-2 neurons are critical for correct operation. The remaining neurons are redundant: their removal does not affect accuracy. The most critical neuron is Layer-2 Neuron 8, whose ablation drops accuracy to 128/256 (chance). This asymmetry suggests the evolutionary search found a solution with substantial structural slack.

| Layer | Critical | Redundant | Most Critical |
|-------|----------|-----------|---------------|
| Layer 1 (32 neurons) | 11 | 21 | Neuron 11 → 162/256 |
| Layer 2 (16 neurons) | 3 | 13 | Neuron 8 → 128/256 |

**Symbolic Extraction.** Each Layer-1 neuron computes a linear threshold function over a subset of input bits. For example, Neuron 0 fires when the count of bits 1, 2, 5, and 6 being set is sufficiently low (all weights are −1 with bias 0). Neuron 1 fires when bits 0, 6, 7 are set and bits 1, 2, 3, 4 are not, subject to bias −6. The network builds parity by combining these partial threshold computations through two subsequent layers.

```
Neuron 0: +bits[] −bits[1,2,5,6], bias=0
Neuron 1: +bits[0,6,7] −bits[1,2,3,4], bias=−6
Neuron 2: +bits[0,4,5,7] −bits[6], bias=−7
Neuron 3: +bits[0,2,4,5,6,7] −bits[1,3], bias=0
```

**Permutation Invariance.** Since parity depends only on the count of set bits, not their positions, the network should be invariant to input permutation. Testing 100 random permutations across all 256 inputs (25,600 total cases) confirms zero failures.
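The invariance claim is easy to reproduce against the ground-truth parity function (a sketch of the mathematical property, not a re-run of the network itself; fewer permutations than the report's 100 for brevity):

```python
import itertools
import random

def parity(bits):
    return sum(bits) % 2

random.seed(0)
for _ in range(5):
    perm = list(range(8))
    random.shuffle(perm)
    for x in itertools.product([0, 1], repeat=8):
        permuted = [x[p] for p in perm]
        # reordering bits never changes the count of set bits
        assert parity(permuted) == parity(x)
```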

**Noise Robustness.** The network is extremely fragile to input noise. Adding Gaussian noise with standard deviation as small as 0.05 to binary inputs drops accuracy from 100% to approximately 51%, essentially chance. This is expected: Heaviside activation has a hard threshold at zero with no margin, so any perturbation that shifts a pre-activation across zero flips the output. This fragility is inherent to the architecture, not a defect.

| Noise σ | Accuracy |
|---------|----------|
| 0.00 | 256/256 (100%) |
| 0.05 | 131/256 (51%) |
| 0.10 | 131/256 (51%) |
| 0.15 | 129/256 (50%) |
| 0.20 | 125/256 (49%) |
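The zero-margin fragility shows up even for a single threshold unit: a pre-activation sitting exactly at the threshold flips under arbitrarily small negative noise (an illustrative toy, not a measurement of the actual network):

```python
import random

def heaviside(z):
    return int(z >= 0)

random.seed(0)
# a unit whose pre-activation is exactly 0 has zero noise margin
flips = sum(
    heaviside(0.0 + random.gauss(0, 0.05)) != heaviside(0.0)
    for _ in range(1000)
)
# roughly half of all perturbations push the pre-activation below zero
print(f"{flips}/1000 perturbations flipped the unit")
```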

**Bias Sensitivity.** Perturbing each bias by ±1 reveals that 9 of 32 Layer-1 biases, 1 of 16 Layer-2 biases, and the single output bias are sensitive (their perturbation causes at least one input to flip). The remaining biases have sufficient slack that small perturbations are absorbed.

| Layer | Sensitive Biases |
|-------|------------------|
| Layer 1 | 9/32 |
| Layer 2 | 1/16 |
| Output | 1/1 |

**Layer Collapse.** Composing Layer-1 and Layer-2 weight matrices produces a 16×8 matrix with values ranging from −9 to +12. However, this composition is invalid because the Heaviside activation between layers is non-linear; the network's three-layer structure cannot be flattened by linear composition.

**Minimum Network Search.** Random initialization of smaller architectures across 100 trials per architecture yields best accuracies between 138 and 147 out of 256, far below 100%. This confirms that finding a correct solution requires directed search (evolutionary algorithms), not random sampling.

| Architecture | Best Accuracy |
|--------------|---------------|
| 8→16→8→1 | 138/256 |
| 8→8→4→1 | 147/256 |
| 8→32→8→1 | 144/256 |
| 8→24→12→1 | 141/256 |

**Coq Proof Generalization.** Beyond the main correctness theorem, we proved an additional property in Coq: bit-flip invariance. The lemma `negb_invariance_8bit` establishes that `parity(NOT x) = parity(x)` for all 8-bit inputs, verified by exhaustive computation over all 256 cases. This follows from the fact that flipping all bits in an even-length vector preserves parity.

```coq
Lemma negb_invariance_8bit :
  forallb (fun xs => Bool.eqb (parity (map negb xs)) (parity xs))
          (all_inputs 8) = true.
Proof. vm_compute. reflexivity. Qed.
```

**Hardware Synthesis.** We generated a Verilog implementation of the threshold network. The resulting design requires approximately 500–1000 gates (49 threshold units, each requiring adders and a comparator). For comparison, a naive XOR tree computes parity in 7 gates, a 70–140× overhead. This cost is acceptable only when XOR gates are unavailable, as on neuromorphic hardware.

| Implementation | Gates |
|----------------|-------|
| Threshold network | ~500–1000 |
| XOR tree | 7 |
| Overhead | 70–140× |

## Real-World Application Tests

**Memory Parity (ECC).** Simulating 100 memory bytes with parity bits and injecting single-bit errors into half of them, the network correctly identifies all 100 cases: 50 valid bytes pass, 50 corrupted bytes are flagged. Zero false positives or negatives.
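The check follows the standard even-parity scheme (a minimal sketch using the ground-truth parity function in place of the network):

```python
def parity(bits):
    return sum(bits) % 2

def protect(byte_bits):
    """Append an even-parity bit so the 9-bit word has even total parity."""
    return byte_bits + [parity(byte_bits)]

def is_valid(word):
    return parity(word) == 0

word = protect([1, 0, 1, 1, 0, 0, 1, 0])
assert is_valid(word)       # clean word passes
word[3] ^= 1                # inject a single-bit error
assert not is_valid(word)   # corruption is flagged
```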

**RAID-5 Reconstruction.** Simulating a 4-disk RAID-5 stripe with one failed disk, the network reconstructs all 50 test bytes correctly by computing parity across the surviving disks.
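Reconstruction works because the parity block is the XOR of the data blocks, so XOR-ing the survivors with the parity block recovers the lost one (a sketch of the scheme with plain integers; the report ran it through the network bit by bit):

```python
from functools import reduce

def xor_blocks(blocks):
    return reduce(lambda a, b: a ^ b, blocks, 0)

data = [0x12, 0x34, 0x56]          # three data blocks in a stripe
parity_block = xor_blocks(data)    # stored on the fourth disk

# the disk holding data[1] fails; rebuild from survivors + parity
recovered = xor_blocks([data[0], data[2], parity_block])
assert recovered == data[1]
```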

**UART Validation.** Simulating serial communication frames with odd parity, the network validates all 50 correct frames and detects corruption in all 50 intentionally corrupted frames.

**XOR Checksum.** Computing a parity-based checksum over 100 bytes produces the correct result, and single-bit corruption is reliably detected by checksum mismatch.
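The checksum property used here is standard: appending the XOR of all bytes makes the XOR of the extended message zero (sketch with plain integers, independent of the network):

```python
from functools import reduce

def xor_checksum(data):
    return reduce(lambda a, b: a ^ b, data, 0)

msg = [0x12, 0x34, 0x56, 0x78]
chk = xor_checksum(msg)
assert xor_checksum(msg + [chk]) == 0   # valid message verifies to zero
msg[1] ^= 0x08                          # single-bit corruption
assert xor_checksum(msg + [chk]) != 0   # mismatch is detected
```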

**LFSR Keystream.** Using the network as the feedback function in a linear feedback shift register, starting from state `[1,0,1,1,0,0,1,0]`, produces a deterministic 32-bit keystream `0x269349a4`. Repeated runs yield identical output.
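The construction can be sketched as follows. Note the tap and bit-ordering convention here is illustrative (the report does not document its exact convention, so no attempt is made to reproduce the `0x269349a4` value); the sketch only demonstrates the determinism claim:

```python
def parity(bits):
    return sum(bits) % 2

def keystream(state, n):
    """Shift register using parity of the whole state as feedback."""
    out = []
    for _ in range(n):
        out.append(state[-1])           # emit the last bit
        fb = parity(state)              # parity network as feedback function
        state = [fb] + state[:-1]       # shift right, insert feedback
    return out

ks1 = keystream([1, 0, 1, 1, 0, 0, 1, 0], 32)
ks2 = keystream([1, 0, 1, 1, 0, 0, 1, 0], 32)
assert ks1 == ks2 and len(ks1) == 32    # deterministic, as reported
```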

**Barcode Validation.** Using a 7-data-bit plus 1-parity-bit scheme, all 25 valid codes verify correctly, and all 25 corrupted codes are rejected.

**Packet Checksum.** A simplified XOR-parity packet checksum validates all 20 test packets correctly.

**Hamming (7,4) Syndrome.** Computing syndrome bits for Hamming code error detection, all 20 valid codewords produce syndrome 0 as expected.
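Each syndrome bit is itself a parity computation, which is why the network slots in here. A sketch in one common codeword layout (`[p1, p2, d1, p3, d2, d3, d4]`; the report's exact layout is not specified):

```python
def parity(bits):
    return sum(bits) % 2

def encode(d1, d2, d3, d4):
    """Hamming(7,4): three parity bits cover overlapping data subsets."""
    p1 = parity([d1, d2, d4])
    p2 = parity([d1, d3, d4])
    p3 = parity([d2, d3, d4])
    return [p1, p2, d1, p3, d2, d3, d4]

def syndrome(c):
    """Three parity checks; a nonzero value is the 1-indexed error position."""
    s1 = parity([c[0], c[2], c[4], c[6]])
    s2 = parity([c[1], c[2], c[5], c[6]])
    s3 = parity([c[3], c[4], c[5], c[6]])
    return (s3 << 2) | (s2 << 1) | s1

c = encode(1, 0, 1, 1)
assert syndrome(c) == 0   # valid codeword: syndrome is zero
c[4] ^= 1                 # flip the bit at position 5 (1-indexed)
assert syndrome(c) == 5   # syndrome names the flipped position
```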

**PS/2 Keyboard Codes.** Validating 20 standard keyboard scan codes against PS/2 odd-parity requirements, all frames pass.

**DNA Codon Parity.** Encoding nucleotides as 2-bit values (A=00, C=01, G=10, T=11) and computing parity over 6-bit codons, all 8 test sequences produce correct results.

| Codon | Bits | Parity |
|-------|------|--------|
| ATG | 001110 | 1 |
| GCA | 100100 | 0 |
| TTA | 111100 | 0 |
| CGG | 011010 | 1 |
| AAA | 000000 | 0 |
| TTT | 111111 | 0 |
| GGG | 101010 | 1 |
| CCC | 010101 | 1 |

| Test | Result |
|------|--------|
| Memory Parity (ECC) | 100/100 |
| RAID-5 Reconstruction | 50/50 |
| UART Validation | 100/100 |
| XOR Checksum | Pass |
| LFSR Keystream | Deterministic |
| Barcode Validation | 50/50 |
| Packet Checksum | 20/20 |
| Hamming (7,4) | 20/20 |
| PS/2 Keyboard | 20/20 |
| DNA Codon Parity | 8/8 |

## Pruned Network

Based on the neuron ablation analysis (Test A2), we extracted a minimal subnetwork containing only the critical neurons. The pruned network achieves identical 100% accuracy with 83.3% fewer parameters.

| Metric | Full Network | Pruned Network |
|--------|--------------|----------------|
| Architecture | 8 → 32 → 16 → 1 | 8 → 11 → 3 → 1 |
| Layer 1 neurons | 32 | 11 |
| Layer 2 neurons | 16 | 3 |
| Total parameters | 833 | 139 |
| Accuracy | 256/256 | 256/256 |

The critical neurons retained from the original network are:
- Layer 1: indices [3, 7, 9, 11, 12, 16, 17, 20, 24, 27, 28]
- Layer 2: indices [8, 11, 14]

The pruned network is formally verified in Coq (`PrunedThresholdLogic.v`) with identical theorems to the full network. Weights are available in the `pruned/` folder.

## Summary

All 30 tests pass. The network is functionally correct, deterministic, and applicable to standard parity-based protocols. Analytical tests reveal high neuron redundancy, extreme noise sensitivity, and the necessity of evolutionary search for training. The pruned variant demonstrates that the evolutionary search found a solution with substantial structural slack: only 14 of 48 hidden neurons are necessary for correct operation.

The primary practical application is on neuromorphic hardware platforms (Intel Loihi, IBM TrueNorth, BrainChip Akida) where threshold networks are the native compute primitive and XOR gates cannot be directly instantiated. The pruned network is preferable for deployment due to its reduced resource requirements.
todo.md
ADDED
@@ -0,0 +1,44 @@
# Future Work

## 1. Different function

Apply the train→export→verify pipeline to problems beyond parity:

- **Majority vote**: output 1 if more than half of inputs are 1
- **Addition carry**: output the carry bit of two 4-bit numbers
- **Threshold-k**: output 1 if at least k inputs are 1
- **Simple classifiers**: functions where the optimal circuit structure is not obvious

These would test whether the methodology generalizes beyond symmetric Boolean functions.
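The ground-truth definitions for these candidates are one-liners, which is what makes the export→verify step cheap to retarget (illustrative sketches of the target functions only, not trained networks):

```python
def majority(bits):
    """1 iff more than half of the inputs are set."""
    return int(sum(bits) > len(bits) // 2)

def threshold_k(bits, k):
    """1 iff at least k inputs are set."""
    return int(sum(bits) >= k)

def carry_out(a, b, width=4):
    """Carry bit of adding two `width`-bit numbers."""
    return int(a + b >= 2 ** width)

assert majority([1, 1, 0, 1]) == 1
assert threshold_k([1, 0, 0, 1], 2) == 1
assert carry_out(9, 8) == 1   # 9 + 8 = 17 overflows 4 bits
```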

## 2. Neuromorphic deployment

Deploy the network on actual neuromorphic hardware (Intel Loihi, IBM TrueNorth, BrainChip Akida) or a cycle-accurate simulator. Measure power consumption, latency, and resource utilization. Validate that the claimed use case, formally verified parity on hardware that cannot implement XOR gates, is practical.

## 3. Paper

Write up the methodology for publication. Key contributions:

- Evolutionary search for threshold networks on gradient-hostile loss landscapes
- Ternary weight quantization enabling exhaustive Coq verification
- Neuron ablation analysis and principled pruning
- End-to-end pipeline from training to formal proof

Target venues: CAV, ICML, NeurIPS, or a formal methods workshop.

## 4. Tooling

Package the pipeline into a unified CLI:

```
verified-nn --function parity --bits 8 --output model/
```

The tool would:
1. Train via evolutionary search until 100% accuracy
2. Export weights to Coq
3. Compile and verify proofs
4. Optionally run ablation and pruning
5. Output SafeTensors + verified Coq proofs

This would lower the barrier to creating new verified networks.