|
|
---
|
|
|
license: mit
|
|
|
tags:
|
|
|
- pytorch
|
|
|
- safetensors
|
|
|
- formal-verification
|
|
|
- coq
|
|
|
- mod3
|
|
|
- modular-arithmetic
|
|
|
- threshold-network
|
|
|
- neuromorphic
|
|
|
---
|
|
|
|
|
|
# tiny-mod3-prover
|
|
|
|
|
|
Formally verified neural network that computes the MOD-3 function (Hamming weight mod 3) on 8-bit inputs. This repository contains the model artifacts; for proof development and Coq source code, see [mod3-verified](https://github.com/CharlesCNorton/mod3-verified).
|
|
|
|
|
|
## Overview
|
|
|
|
|
|
This is a threshold network that computes `mod3(x) = HW(x) mod 3` for 8-bit binary inputs, where HW denotes Hamming weight (number of set bits). The network outputs 0, 1, or 2 corresponding to the three residue classes.
|
|
|
|
|
|
**Key properties:**
|
|
|
- 100% accuracy on all 256 possible inputs
|
|
|
- Correctness proven in Coq via constructive algebraic proof
|
|
|
- Weights constrained to integers (many ternary)
|
|
|
- Heaviside step activation (x β₯ 0 β 1, else 0)
|
|
|
- First formally verified threshold circuit for MOD-m where m > 2
|
|
|
|
|
|
## Architecture
|
|
|
|
|
|
| Layer | Neurons | Function |
|
|
|
|-------|---------|----------|
|
|
|
| Input | 8 | Binary input bits |
|
|
|
| Hidden 1 | 9 | Thermometer encoding (HW β₯ k) |
|
|
|
| Hidden 2 | 2 | MOD-3 detection |
|
|
|
| Output | 3 | Classification (one-hot) |
|
|
|
|
|
|
**Total: 14 neurons, 110 parameters**
|
|
|
|
|
|
## Quick Start
|
|
|
|
|
|
```python
|
|
|
import torch
|
|
|
from safetensors.torch import load_file
|
|
|
|
|
|
# Load weights
|
|
|
weights = load_file('model.safetensors')
|
|
|
|
|
|
# Manual forward pass (Heaviside activation)
|
|
|
def forward(x, weights):
|
|
|
x = x.float()
|
|
|
x = (x @ weights['layer1.weight'].T + weights['layer1.bias'] >= 0).float()
|
|
|
x = (x @ weights['layer2.weight'].T + weights['layer2.bias'] >= 0).float()
|
|
|
out = x @ weights['output.weight'].T + weights['output.bias']
|
|
|
return out.argmax(dim=-1)
|
|
|
|
|
|
# Test
|
|
|
inputs = torch.tensor([[1, 0, 1, 1, 0, 0, 1, 0]], dtype=torch.float32)
|
|
|
output = forward(inputs, weights)
|
|
|
print(f"MOD-3 of [1,0,1,1,0,0,1,0]: {output.item()}") # 1 (4 bits set, 4 mod 3 = 1)
|
|
|
```
|
|
|
|
|
|
## Weight Structure
|
|
|
|
|
|
| Tensor | Shape | Values | Description |
|
|
|
|--------|-------|--------|-------------|
|
|
|
| `layer1.weight` | [9, 8] | All 1s | Thermometer encoding |
|
|
|
| `layer1.bias` | [9] | [0, -1, ..., -8] | Threshold at HW β₯ k |
|
|
|
| `layer2.weight` | [2, 9] | [0,1,1,-2,1,1,-2,1,1] | MOD-3 detection |
|
|
|
| `layer2.bias` | [2] | [-1, -2] | Class thresholds |
|
|
|
| `output.weight` | [3, 2] | Various | Classification |
|
|
|
| `output.bias` | [3] | [0, -1, -1] | Output thresholds |
|
|
|
|
|
|
## Algebraic Insight
|
|
|
|
|
|
For parity (MOD-2), the key insight was that Β±1 dot products preserve Hamming weight parity because (-1) β‘ 1 (mod 2).
|
|
|
|
|
|
For MOD-3, the insight is different. Using weights `(1, 1, -2)` repeated on the thermometer encoding produces partial sums that cycle through `(0, 1, 2, 0, 1, 2, ...)`:
|
|
|
|
|
|
```
|
|
|
HW=0: cumsum = 0 β 0 mod 3
|
|
|
HW=1: cumsum = 1 β 1 mod 3
|
|
|
HW=2: cumsum = 2 β 2 mod 3
|
|
|
HW=3: cumsum = 0 β 0 mod 3 (reset: 1+1-2=0)
|
|
|
HW=4: cumsum = 1 β 1 mod 3
|
|
|
HW=5: cumsum = 2 β 2 mod 3
|
|
|
HW=6: cumsum = 0 β 0 mod 3 (reset)
|
|
|
HW=7: cumsum = 1 β 1 mod 3
|
|
|
HW=8: cumsum = 2 β 2 mod 3
|
|
|
```
|
|
|
|
|
|
The pattern `1 + 1 + (-2) = 0` causes the cumulative sum to reset every 3 steps, tracking HW mod 3.
|
|
|
|
|
|
This generalizes to MOD-m: use weights `(1, 1, ..., 1, 1-m)` with `m-1` ones before the `1-m` term.
|
|
|
|
|
|
## Formal Verification
|
|
|
|
|
|
The network is proven correct in the Coq proof assistant with three independent proofs:
|
|
|
|
|
|
**1. Exhaustive verification:**
|
|
|
```coq
|
|
|
Theorem network_correct_exhaustive : verify_all = true.
|
|
|
Proof. vm_compute. reflexivity. Qed.
|
|
|
```
|
|
|
|
|
|
**2. Constructive verification (case analysis):**
|
|
|
```coq
|
|
|
Theorem network_correct_constructive : forall x0 x1 x2 x3 x4 x5 x6 x7,
|
|
|
predict [x0; x1; x2; x3; x4; x5; x6; x7] =
|
|
|
mod3 [x0; x1; x2; x3; x4; x5; x6; x7].
|
|
|
```
|
|
|
|
|
|
**3. Algebraic verification:**
|
|
|
```coq
|
|
|
Theorem cumsum_eq_mod3 : forall k,
|
|
|
(k <= 8)%nat -> cumsum k = Z.of_nat (Nat.modulo k 3).
|
|
|
|
|
|
Theorem network_algebraic_correct : forall h,
|
|
|
(h <= 8)%nat ->
|
|
|
classify (Z.geb (cumsum h - 1) 0) (Z.geb (cumsum h - 2) 0) = Nat.modulo h 3.
|
|
|
```
|
|
|
|
|
|
All proofs are axiom-free ("Closed under the global context").
|
|
|
|
|
|
## MOD-3 Distribution
|
|
|
|
|
|
For 8-bit inputs (256 total):
|
|
|
|
|
|
| Class | Count | Hamming Weights |
|
|
|
|-------|-------|-----------------|
|
|
|
| 0 | 85 | 0, 3, 6 |
|
|
|
| 1 | 86 | 1, 4, 7 |
|
|
|
| 2 | 85 | 2, 5, 8 |
|
|
|
|
|
|
## Training
|
|
|
|
|
|
The parametric construction was derived algebraically, not discovered through training.
|
|
|
|
|
|
Evolutionary search was attempted (as with parity) but consistently plateaued at 247/256 (96.5%) accuracy across multiple seeds. We were unable to discover the (1,1,-2) weight pattern through training, so we used algebraic construction instead.
|
|
|
|
|
|
## Comparison to Parity
|
|
|
|
|
|
| Property | Parity (MOD-2) | MOD-3 |
|
|
|
|----------|----------------|-------|
|
|
|
| Output classes | 2 | 3 |
|
|
|
| Key weights | Β±1 (any) | (1,1,-2) specific |
|
|
|
| Training | Evolutionary (10K gen) | Algebraic construction |
|
|
|
| Neurons | 14 (pruned) | 14 |
|
|
|
| Parameters | 139 (pruned) | 110 |
|
|
|
| Algebraic insight | (-1) β‘ 1 (mod 2) | 1+1-2 = 0 (reset) |
|
|
|
|
|
|
## Limitations
|
|
|
|
|
|
- **Fixed input size**: 8 bits only (algebraic construction extends to any n)
|
|
|
- **Binary inputs**: Expects {0, 1}, not continuous values
|
|
|
- **No noise margin**: Heaviside threshold at exactly 0
|
|
|
- **Not differentiable**: Cannot be fine-tuned with gradient descent
|
|
|
- **Training gap**: Our evolutionary search achieved only 96.5%; we used algebraic construction instead
|
|
|
|
|
|
## Files
|
|
|
|
|
|
```
|
|
|
mod3-verified/
|
|
|
βββ model.safetensors # Network weights (110 params)
|
|
|
βββ model.py # Inference code
|
|
|
βββ config.json # Model metadata
|
|
|
βββ README.md # This file
|
|
|
```
|
|
|
|
|
|
## Citation
|
|
|
|
|
|
```bibtex
|
|
|
@software{tiny_mod3_prover_2025,
|
|
|
title={tiny-mod3-prover: Formally Verified Threshold Network for MOD-3},
|
|
|
author={Norton, Charles},
|
|
|
url={https://huggingface.co/phanerozoic/tiny-mod3-prover},
|
|
|
year={2025},
|
|
|
note={First verified threshold circuit for modular counting beyond parity}
|
|
|
}
|
|
|
```
|
|
|
|
|
|
## Related
|
|
|
|
|
|
- **Proof repository**: [mod3-verified](https://github.com/CharlesCNorton/mod3-verified) β Coq proofs, training attempts, full documentation
|
|
|
- **Parity network**: [tiny-parity-prover](https://huggingface.co/phanerozoic/tiny-parity-prover) β Verified MOD-2 (parity) network
|
|
|
- **Parity proofs**: [threshold-logic-verified](https://github.com/CharlesCNorton/threshold-logic-verified) β Original parity verification project
|
|
|
|
|
|
## License
|
|
|
|
|
|
MIT
|
|
|
|