
# Test Results

This document reports comprehensive testing of the tiny-parity-prover threshold network across 30 test cases spanning basic functionality, analytical properties, and real-world applications.

## Basic Functional Tests

All ten basic tests passed. The network correctly outputs 0 for the all-zeros and all-ones inputs (both have even bit counts) and outputs 1 for all eight single-bit patterns. Exhaustive evaluation over all 256 possible 8-bit inputs confirms 100% accuracy against the ground-truth parity function. Batch inference produces results identical to sequential evaluation, and 1000 repeated inferences on the same input yield identical outputs, confirming determinism. Throughput was measured at 18,161 inferences per second (about 55 microseconds per inference) on CPU.

| Test | Input | Expected | Result |
|---|---|---|---|
| All zeros | [0,0,0,0,0,0,0,0] | 0 | 0 |
| All ones | [1,1,1,1,1,1,1,1] | 0 | 0 |
| Single bit set | 8 patterns | 1 | 8/8 |
| Two bits set | 5 patterns | 0 | 5/5 |
| Exhaustive sweep | 256 inputs | – | 256/256 |
| Batch inference | 256 inputs | – | 256/256 |
| Determinism | 1000 runs | identical | identical |
| Integer conversion | 0–255 | – | 256/256 |
| Random sampling | 100 random | – | 100/100 |
| Latency | 10,000 runs | – | 18,161 inf/sec |
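
The sweep and determinism checks are easy to reproduce. Below is a minimal harness sketch; the `predict` callable is a stand-in for the network's inference entry point (whose actual name this report does not specify), and the ground-truth `parity` function doubles as a smoke test.

```python
from itertools import product

def parity(bits):
    """Ground-truth parity: 1 iff an odd number of bits are set."""
    return sum(bits) % 2

def exhaustive_sweep(predict):
    """Count correct predictions over all 256 possible 8-bit inputs."""
    return sum(predict(list(bits)) == parity(bits)
               for bits in product([0, 1], repeat=8))

def is_deterministic(predict, bits, runs=1000):
    """Repeat inference on one input and confirm the outputs never vary."""
    return len({predict(list(bits)) for _ in range(runs)}) == 1

# Smoke test with the ground truth standing in for the network:
assert exhaustive_sweep(parity) == 256
assert is_deterministic(parity, [1, 0, 1, 1, 0, 0, 1, 0])
```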

## Analytical Tests

**Weight Pruning.** The network contains 784 weights, of which 526 are non-zero (258 zeros, or 32.9% natural sparsity). Random pruning experiments reveal sharp accuracy degradation: removing just 10% of the weights drops accuracy to 138–191 out of 256, and pruning beyond 20% collapses performance to chance level (128/256). This suggests that, at the level of individual weights, the network has little slack to exploit, even though the neuron-level ablation below reveals substantial structural redundancy.

| Weights pruned | Accuracy |
|---|---|
| 0% | 256/256 |
| 10% | 138–191/256 |
| 20% | 128–130/256 |
| 30% | 128–131/256 |
| 40% | 128/256 |
| 50% | 128/256 |
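
A sketch of the pruning experiment, assuming the network is stored as a list of `(W, b)` pairs with a step-at-zero activation (a representation this report does not pin down); the placeholder weights below are random, not the evolved ones, so their baseline accuracy is already near chance:

```python
import numpy as np
from itertools import product

def heaviside_net(layers, x):
    """Forward pass: each layer is (W, b); the activation is step(z >= 0)."""
    for W, b in layers:
        x = (W @ x + b >= 0).astype(float)
    return int(x[0])

def prune_random(layers, fraction, rng):
    """Zero a random fraction of the non-zero weights across all layers."""
    pruned = [(W.copy(), b) for W, b in layers]
    coords = [(li, r, c) for li, (W, _) in enumerate(pruned)
              for r, c in zip(*np.nonzero(W))]
    for j in rng.choice(len(coords), int(fraction * len(coords)), replace=False):
        li, r, c = coords[j]
        pruned[li][0][r, c] = 0.0
    return pruned

def accuracy(layers):
    """Correct outputs over all 256 inputs against ground-truth parity."""
    return sum(heaviside_net(layers, np.array(bits, dtype=float)) == sum(bits) % 2
               for bits in product([0, 1], repeat=8))

rng = np.random.default_rng(0)
# Random placeholder weights in the full 8-32-16-1 shape:
layers = [(rng.integers(-1, 2, (32, 8)).astype(float), np.zeros(32)),
          (rng.integers(-1, 2, (16, 32)).astype(float), np.zeros(16)),
          (rng.integers(-1, 2, (1, 16)).astype(float), np.zeros(1))]
print(accuracy(prune_random(layers, 0.10, rng)))
```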

**Neuron Ablation.** Systematically disabling individual neurons reveals that only 11 of 32 Layer-1 neurons and 3 of 16 Layer-2 neurons are critical for correct operation. The remaining neurons are redundant; removing any one of them leaves accuracy unchanged. The most critical neuron is Layer-2 Neuron 8, whose ablation drops accuracy to 128/256 (chance). This asymmetry suggests the evolutionary search found a solution with substantial structural slack.

| Layer | Critical | Redundant | Most critical |
|---|---|---|---|
| Layer 1 (32 neurons) | 11 | 21 | Neuron 11 → 162/256 |
| Layer 2 (16 neurons) | 3 | 13 | Neuron 8 → 128/256 |
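
The ablation loop itself is simple; a sketch under the same assumed `(W, b)`-list representation, disabling a hidden neuron by clamping it so it can never fire:

```python
import numpy as np

def ablate(layers, layer_idx, neuron_idx):
    """Return a copy of the network with one hidden neuron silenced:
    its incoming weights are zeroed and its bias is pushed below threshold,
    so under step(z >= 0) the neuron's output is always 0."""
    out = [(W.copy(), b.copy()) for W, b in layers]
    out[layer_idx][0][neuron_idx, :] = 0.0
    out[layer_idx][1][neuron_idx] = -1.0
    return out

# Usage with the accuracy() helper above: a neuron counts as "critical"
# when accuracy(ablate(layers, li, ni)) < 256.
```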

**Symbolic Extraction.** Each Layer-1 neuron computes a linear threshold function over a subset of input bits. For example, Neuron 0 (weight −1 on bits 1, 2, 5, and 6, bias 0) fires only when sufficiently few of those bits are set; Neuron 1 fires when bits 0, 6, and 7 are set and bits 1 through 4 are not, subject to a bias of −6. The network builds parity by combining these partial threshold computations through the two subsequent layers.

```text
Neuron 0: +bits[] −bits[1,2,5,6], bias=0
Neuron 1: +bits[0,6,7] −bits[1,2,3,4], bias=−6
Neuron 2: +bits[0,4,5,7] −bits[6], bias=−7
Neuron 3: +bits[0,2,4,5,6,7] −bits[1,3], bias=0
```
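
Under a unit-weight, step-at-zero reading (an assumption; the repository's exact sign and threshold conventions may differ), each extracted rule is a one-liner:

```python
def threshold_rule(pos, neg, bias, bits):
    """Fires when the sum over pos minus the sum over neg, plus bias, is >= 0."""
    z = sum(bits[i] for i in pos) - sum(bits[j] for j in neg) + bias
    return int(z >= 0)

# Neuron 0 (-1 weights on bits 1, 2, 5, 6; bias 0) fires only when
# none of those four bits is set:
assert threshold_rule([], [1, 2, 5, 6], 0, [0] * 8) == 1
assert threshold_rule([], [1, 2, 5, 6], 0, [0, 1, 0, 0, 0, 0, 0, 0]) == 0
```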

**Permutation Invariance.** Since parity depends only on the count of set bits, not on their positions, the network should be invariant to any permutation of its inputs. Testing 100 random permutations across all 256 inputs (25,600 cases in total) confirms zero failures.
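
A ground-truth sketch of the invariance check (the real test would call the network instead of `parity`):

```python
import random
from itertools import product

def parity(bits):
    return sum(bits) % 2

def permutation_invariant(predict, n_perms=100, seed=0):
    """Verify predict(perm(x)) == predict(x) for random permutations
    applied to every one of the 256 inputs: 25,600 cases in total."""
    rng = random.Random(seed)
    for _ in range(n_perms):
        perm = rng.sample(range(8), 8)
        if any(predict([bits[i] for i in perm]) != predict(list(bits))
               for bits in product([0, 1], repeat=8)):
            return False
    return True

assert permutation_invariant(parity)  # the ground truth is trivially invariant
```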

**Noise Robustness.** The network is extremely fragile to input noise. Adding Gaussian noise with a standard deviation as small as 0.05 to the binary inputs drops accuracy from 100% to approximately 51%, essentially chance. This is expected: the Heaviside activation has a hard threshold at zero with no margin, so any perturbation that shifts a pre-activation across zero flips the output. This fragility is inherent to the architecture, not a defect.

| Noise σ | Accuracy |
|---|---|
| 0.00 | 256/256 (100%) |
| 0.05 | 131/256 (51%) |
| 0.10 | 131/256 (51%) |
| 0.15 | 129/256 (50%) |
| 0.20 | 125/256 (49%) |
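
A sketch of the noise experiment, reusing the assumed `(W, b)` representation from the pruning sketch above:

```python
import numpy as np
from itertools import product

def noisy_accuracy(layers, sigma, seed=0):
    """Accuracy over all 256 inputs with Gaussian noise added to the bits.
    Any perturbation that pushes a pre-activation across zero flips a
    Heaviside unit, so even sigma = 0.05 can cascade to the output."""
    rng = np.random.default_rng(seed)
    correct = 0
    for bits in product([0, 1], repeat=8):
        x = np.array(bits, dtype=float) + rng.normal(0.0, sigma, size=8)
        for W, b in layers:
            x = (W @ x + b >= 0).astype(float)
        correct += int(x[0]) == sum(bits) % 2
    return correct
```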

**Bias Sensitivity.** Perturbing each bias by ±1 reveals that 9 of 32 Layer-1 biases, 1 of 16 Layer-2 biases, and the single output bias are sensitive: perturbing any of them causes at least one input to flip. The remaining biases have enough slack to absorb small perturbations.

| Layer | Sensitive biases |
|---|---|
| Layer 1 | 9/32 |
| Layer 2 | 1/16 |
| Output | 1/1 |

**Layer Collapse.** Composing the Layer-1 and Layer-2 weight matrices produces a 16×8 matrix with values ranging from −9 to +12. The composition is invalid, however, because the Heaviside activation between the layers is non-linear: the collapsed network no longer computes parity. The network's depth therefore cannot be reduced by merging its layers linearly.
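
A two-input toy makes the point concrete; XOR is exactly 2-bit parity, and the weights below are illustrative, not taken from the network:

```python
import numpy as np
from itertools import product

H = lambda z: (z >= 0).astype(int)   # Heaviside step

W1, b1 = np.array([[1, 1], [1, 1]]), np.array([-1, -2])  # OR unit, AND unit
W2, b2 = np.array([[1, -2]]), np.array([-1])             # OR AND-NOT AND = XOR

for x in product([0, 1], repeat=2):
    x = np.array(x)
    deep = H(W2 @ H(W1 @ x + b1) + b2)[0]        # step between the layers
    flat = H((W2 @ W1) @ x + (W2 @ b1 + b2))[0]  # layers merged linearly
    print(tuple(x), deep, flat)
# deep reproduces XOR on all four inputs; flat gets (0,0) and (1,1) wrong,
# because the inner Heaviside cannot be absorbed into a single linear map.
```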

**Minimum Network Search.** Randomly initializing smaller architectures, with 100 trials per architecture, yields best accuracies between 138 and 147 out of 256, far below 100%. This confirms that finding a correct solution requires directed search (evolutionary algorithms), not random sampling.

| Architecture | Best accuracy |
|---|---|
| 8→16→8→1 | 138/256 |
| 8→8→4→1 | 147/256 |
| 8→32→8→1 | 144/256 |
| 8→24→12→1 | 141/256 |
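
A sketch of the random-search baseline; the integer weight and bias ranges are arbitrary assumptions, since the report does not state the initialization distribution:

```python
import numpy as np
from itertools import product

def random_search(sizes, trials=100, seed=0):
    """Best all-256-inputs accuracy over randomly initialized threshold
    nets of the given architecture, e.g. sizes = (8, 16, 8, 1)."""
    rng = np.random.default_rng(seed)
    inputs = np.array(list(product([0, 1], repeat=sizes[0])), dtype=float)
    targets = inputs.sum(axis=1) % 2
    best = 0
    for _ in range(trials):
        x = inputs.T
        for n_in, n_out in zip(sizes, sizes[1:]):
            W = rng.integers(-2, 3, (n_out, n_in))   # assumed weight range
            b = rng.integers(-4, 5, n_out)           # assumed bias range
            x = (W @ x + b[:, None] >= 0).astype(float)
        best = max(best, int((x[0] == targets).sum()))
    return best

print(random_search((8, 16, 8, 1)))
```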

**Coq Proof Generalization.** Beyond the main correctness theorem, we proved an additional property in Coq: bit-flip invariance. The lemma `negb_invariance_8bit` establishes that parity(NOT x) = parity(x) for all 8-bit inputs, verified by exhaustive computation over all 256 cases. This follows from the fact that flipping all bits of an even-length vector preserves parity: if x has k set bits, NOT x has 8 − k, and 8 − k ≡ k (mod 2).

```coq
Lemma negb_invariance_8bit :
  forallb (fun xs => Bool.eqb (parity (map negb xs)) (parity xs))
          (all_inputs 8) = true.
Proof. vm_compute. reflexivity. Qed.
```

**Hardware Synthesis.** We generated a Verilog implementation of the threshold network. The resulting design requires approximately 500–1000 gates (49 threshold units, each needing adders and a comparator). For comparison, a naive XOR tree computes 8-bit parity in 7 gates, a 70–140× overhead. This cost is acceptable only where XOR gates are unavailable, as on neuromorphic hardware.

| Implementation | Gates |
|---|---|
| Threshold network | ~500–1000 |
| XOR tree | 7 |
| Overhead | 70–140× |

## Real-World Application Tests

**Memory Parity (ECC).** Simulating 100 memory bytes with parity bits and injecting single-bit errors into half of them, the network correctly classifies all 100 cases: the 50 valid bytes pass and the 50 corrupted bytes are flagged, with zero false positives or false negatives.

**RAID-5 Reconstruction.** Simulating a 4-disk RAID-5 stripe with one failed disk, the network reconstructs all 50 test bytes correctly by computing parity across the surviving disks.
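
Reconstruction reduces to XOR-ing the surviving disks' bytes, which is per-bit parity; a minimal sketch with made-up byte values:

```python
from functools import reduce

def raid5_reconstruct(surviving_bytes):
    """Rebuild a lost disk's byte as the XOR of all surviving bytes;
    bit by bit, this is exactly the parity function the network computes."""
    return reduce(lambda a, b: a ^ b, surviving_bytes)

d0, d1, d2 = 0x5A, 0x3C, 0xF0       # three data disks
p = d0 ^ d1 ^ d2                    # parity disk for the stripe
assert raid5_reconstruct([d0, d2, p]) == d1   # disk 1 failed and is recovered
```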

**UART Validation.** Simulating serial communication frames with odd parity, the network validates all 50 correct frames and detects corruption in all 50 intentionally corrupted frames.

**XOR Checksum.** Computing a parity-based checksum over 100 bytes produces the correct result, and single-bit corruption is reliably detected by checksum mismatch.

**LFSR Keystream.** Using the network as the feedback function of a linear feedback shift register, starting from state [1,0,1,1,0,0,1,0], produces a deterministic 32-bit keystream, 0x269349a4. Repeated runs yield identical output.
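
A sketch of the construction, using ground-truth parity of the full register as the feedback function; the shift and output conventions here are assumptions, so this sketch does not claim to reproduce the exact keystream value above:

```python
def lfsr_keystream(state, nbits=32):
    """Shift register whose feedback bit is the parity of the whole state
    (the network's job); emits the last register bit each step."""
    state = list(state)
    out = 0
    for _ in range(nbits):
        feedback = sum(state) % 2          # 8-bit parity as feedback
        out = (out << 1) | state[-1]       # emit the output tap
        state = [feedback] + state[:-1]    # shift right, insert feedback
    return out

print(hex(lfsr_keystream([1, 0, 1, 1, 0, 0, 1, 0])))  # deterministic by construction
```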

**Barcode Validation.** Using a 7-data-bit plus 1-parity-bit scheme, all 25 valid codes verify correctly, and all 25 corrupted codes are rejected.

**Packet Checksum.** A simplified XOR-parity packet checksum validates all 20 test packets correctly.

**Hamming (7,4) Syndrome.** Computing syndrome bits for Hamming code error detection, all 20 valid codewords produce syndrome 0 as expected.
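
Each syndrome bit is itself a parity computation over a subset of codeword positions, which is where the network substitutes for XOR; a ground-truth sketch using the standard positional (7,4) layout:

```python
def hamming74_syndrome(c):
    """Syndrome of a 7-bit codeword [p1, p2, d1, p3, d2, d3, d4]
    (positions 1..7 are c[0]..c[6]); 0 means no error, otherwise
    the value names the flipped position."""
    s1 = (c[0] + c[2] + c[4] + c[6]) % 2   # parity over positions 1,3,5,7
    s2 = (c[1] + c[2] + c[5] + c[6]) % 2   # parity over positions 2,3,6,7
    s3 = (c[3] + c[4] + c[5] + c[6]) % 2   # parity over positions 4,5,6,7
    return 4 * s3 + 2 * s2 + s1

valid = [0, 1, 1, 0, 0, 1, 1]              # encodes data bits 1,0,1,1
assert hamming74_syndrome(valid) == 0
corrupted = valid.copy()
corrupted[4] ^= 1                          # flip position 5
assert hamming74_syndrome(corrupted) == 5  # syndrome points at position 5
```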

**PS/2 Keyboard Codes.** Validating 20 standard keyboard scan codes against PS/2 odd-parity requirements, all frames pass.

**DNA Codon Parity.** Encoding nucleotides as 2-bit values (A=00, C=01, G=10, T=11) and computing parity over the resulting 6-bit codons, all 8 test sequences produce correct results.

| Codon | Bits | Parity |
|---|---|---|
| ATG | 001110 | 1 |
| GCA | 100100 | 0 |
| TTA | 111100 | 0 |
| CGG | 011010 | 1 |
| AAA | 000000 | 0 |
| TTT | 111111 | 0 |
| GGG | 101010 | 1 |
| CCC | 010101 | 1 |
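
Padding the 6 codon bits with two zeros to fill the network's 8-bit input leaves parity unchanged (an assumption about the test harness); the encoding itself is a ground-truth one-liner:

```python
NUC = {'A': (0, 0), 'C': (0, 1), 'G': (1, 0), 'T': (1, 1)}

def codon_parity(codon):
    """Encode a 3-letter codon as 6 bits (2 per nucleotide), take parity."""
    bits = [b for nt in codon for b in NUC[nt]]
    return sum(bits) % 2

assert codon_parity('ATG') == 1   # 001110 has three set bits
assert codon_parity('GCA') == 0   # 100100 has two set bits
```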

| Test | Result |
|---|---|
| Memory Parity (ECC) | 100/100 |
| RAID-5 Reconstruction | 50/50 |
| UART Validation | 100/100 |
| XOR Checksum | Pass |
| LFSR Keystream | Deterministic |
| Barcode Validation | 50/50 |
| Packet Checksum | 20/20 |
| Hamming (7,4) | 20/20 |
| PS/2 Keyboard | 20/20 |
| DNA Codon Parity | 8/8 |

## Pruned Network

Based on the neuron ablation analysis above, we extracted a minimal subnetwork containing only the critical neurons. The pruned network achieves the same 100% accuracy with 83.3% fewer parameters.

| Metric | Full network | Pruned network |
|---|---|---|
| Architecture | 8 → 32 → 16 → 1 | 8 → 11 → 3 → 1 |
| Layer 1 neurons | 32 | 11 |
| Layer 2 neurons | 16 | 3 |
| Total parameters | 833 | 139 |
| Accuracy | 256/256 | 256/256 |

The critical neurons retained from the original network are:

- Layer 1: indices [3, 7, 9, 11, 12, 16, 17, 20, 24, 27, 28]
- Layer 2: indices [8, 11, 14]

The pruned network is formally verified in Coq (`PrunedThresholdLogic.v`) with theorems identical to the full network's. Weights are available in the `pruned/` folder.

## Summary

All 30 tests pass. The network is functionally correct, deterministic, and applicable to standard parity-based protocols. The analytical tests reveal high neuron redundancy, extreme noise sensitivity, and the necessity of evolutionary search for training. The pruned variant demonstrates that the evolutionary search found a solution with substantial structural slack: only 14 of 48 hidden neurons are necessary for correct operation.

The primary practical application is on neuromorphic hardware platforms (Intel Loihi, IBM TrueNorth, BrainChip Akida) where threshold networks are the native compute primitive and XOR gates cannot be directly instantiated. The pruned network is preferable for deployment due to its reduced resource requirements.