File size: 6,523 Bytes
7cfe40b
 
 
 
 
 
 
 
 
 
 
 
 
4498680
7cfe40b
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
0ff8da5
7cfe40b
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
0ff8da5
7cfe40b
 
 
 
 
 
 
 
 
 
 
 
 
 
4498680
 
7cfe40b
4498680
7cfe40b
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
---

license: mit
tags:
- pytorch
- safetensors
- formal-verification
- coq
- mod3
- modular-arithmetic
- threshold-network
- neuromorphic
---


# tiny-mod3-prover

Formally verified neural network that computes the MOD-3 function (Hamming weight mod 3) on 8-bit inputs. This repository contains the model artifacts; for proof development and Coq source code, see [mod3-verified](https://github.com/CharlesCNorton/mod3-verified).

## Overview

This is a threshold network that computes `mod3(x) = HW(x) mod 3` for 8-bit binary inputs, where HW denotes Hamming weight (number of set bits). The network outputs 0, 1, or 2 corresponding to the three residue classes.

**Key properties:**
- 100% accuracy on all 256 possible inputs
- Correctness proven in Coq via constructive algebraic proof
- Weights constrained to integers (many ternary)
- Heaviside step activation (x β‰₯ 0 β†’ 1, else 0)
- First formally verified threshold circuit for MOD-m where m > 2

## Architecture

| Layer | Neurons | Function |
|-------|---------|----------|
| Input | 8 | Binary input bits |
| Hidden 1 | 9 | Thermometer encoding (HW β‰₯ k) |
| Hidden 2 | 2 | MOD-3 detection |
| Output | 3 | Classification (one-hot) |

**Total: 14 neurons, 110 parameters**

## Quick Start

```python

import torch

from safetensors.torch import load_file



# Load weights

weights = load_file('model.safetensors')



# Manual forward pass (Heaviside activation)

def forward(x, weights):

    x = x.float()

    x = (x @ weights['layer1.weight'].T + weights['layer1.bias'] >= 0).float()

    x = (x @ weights['layer2.weight'].T + weights['layer2.bias'] >= 0).float()

    out = x @ weights['output.weight'].T + weights['output.bias']

    return out.argmax(dim=-1)



# Test

inputs = torch.tensor([[1, 0, 1, 1, 0, 0, 1, 0]], dtype=torch.float32)

output = forward(inputs, weights)

print(f"MOD-3 of [1,0,1,1,0,0,1,0]: {output.item()}")  # 1 (4 bits set, 4 mod 3 = 1)

```

## Weight Structure

| Tensor | Shape | Values | Description |
|--------|-------|--------|-------------|
| `layer1.weight` | [9, 8] | All 1s | Thermometer encoding |
| `layer1.bias` | [9] | [0, -1, ..., -8] | Threshold at HW β‰₯ k |
| `layer2.weight` | [2, 9] | [0,1,1,-2,1,1,-2,1,1] | MOD-3 detection |
| `layer2.bias` | [2] | [-1, -2] | Class thresholds |
| `output.weight` | [3, 2] | Various | Classification |
| `output.bias` | [3] | [0, -1, -1] | Output thresholds |

## Algebraic Insight

For parity (MOD-2), the key insight was that Β±1 dot products preserve Hamming weight parity because (-1) ≑ 1 (mod 2).

For MOD-3, the insight is different. Using weights `(1, 1, -2)` repeated on the thermometer encoding produces partial sums that cycle through `(0, 1, 2, 0, 1, 2, ...)`:

```

HW=0: cumsum = 0  β†’  0 mod 3

HW=1: cumsum = 1  β†’  1 mod 3

HW=2: cumsum = 2  β†’  2 mod 3

HW=3: cumsum = 0  β†’  0 mod 3  (reset: 1+1-2=0)

HW=4: cumsum = 1  β†’  1 mod 3

HW=5: cumsum = 2  β†’  2 mod 3

HW=6: cumsum = 0  β†’  0 mod 3  (reset)

HW=7: cumsum = 1  β†’  1 mod 3

HW=8: cumsum = 2  β†’  2 mod 3

```

The pattern `1 + 1 + (-2) = 0` causes the cumulative sum to reset every 3 steps, tracking HW mod 3.

This generalizes to MOD-m: use weights `(1, 1, ..., 1, 1-m)` with `m-1` ones before the `1-m` term.

## Formal Verification

The network is proven correct in the Coq proof assistant with three independent proofs:

**1. Exhaustive verification:**
```coq

Theorem network_correct_exhaustive : verify_all = true.

Proof. vm_compute. reflexivity. Qed.

```

**2. Constructive verification (case analysis):**
```coq

Theorem network_correct_constructive : forall x0 x1 x2 x3 x4 x5 x6 x7,

  predict [x0; x1; x2; x3; x4; x5; x6; x7] =

  mod3 [x0; x1; x2; x3; x4; x5; x6; x7].

```

**3. Algebraic verification:**
```coq

Theorem cumsum_eq_mod3 : forall k,

  (k <= 8)%nat -> cumsum k = Z.of_nat (Nat.modulo k 3).



Theorem network_algebraic_correct : forall h,

  (h <= 8)%nat ->

  classify (Z.geb (cumsum h - 1) 0) (Z.geb (cumsum h - 2) 0) = Nat.modulo h 3.

```

All proofs are axiom-free ("Closed under the global context").

## MOD-3 Distribution

For 8-bit inputs (256 total):

| Class | Count | Hamming Weights |
|-------|-------|-----------------|
| 0 | 85 | 0, 3, 6 |
| 1 | 86 | 1, 4, 7 |
| 2 | 85 | 2, 5, 8 |

## Training

The parametric construction was derived algebraically, not discovered through training.

Evolutionary search was attempted (as with parity) but consistently plateaued at 247/256 (96.5%) accuracy across multiple seeds. We were unable to discover the (1,1,-2) weight pattern through training, so we used algebraic construction instead.

## Comparison to Parity

| Property | Parity (MOD-2) | MOD-3 |
|----------|----------------|-------|
| Output classes | 2 | 3 |
| Key weights | Β±1 (any) | (1,1,-2) specific |
| Training | Evolutionary (10K gen) | Algebraic construction |
| Neurons | 14 (pruned) | 14 |
| Parameters | 139 (pruned) | 110 |
| Algebraic insight | (-1) ≑ 1 (mod 2) | 1+1-2 = 0 (reset) |

## Limitations

- **Fixed input size**: 8 bits only (algebraic construction extends to any n)
- **Binary inputs**: Expects {0, 1}, not continuous values
- **No noise margin**: Heaviside threshold at exactly 0
- **Not differentiable**: Cannot be fine-tuned with gradient descent
- **Training gap**: Our evolutionary search achieved only 96.5%; we used algebraic construction instead

## Files

```

mod3-verified/

β”œβ”€β”€ model.safetensors    # Network weights (110 params)

β”œβ”€β”€ model.py             # Inference code

β”œβ”€β”€ config.json          # Model metadata

└── README.md            # This file

```

## Citation

```bibtex

@software{tiny_mod3_prover_2025,

  title={tiny-mod3-prover: Formally Verified Threshold Network for MOD-3},

  author={Norton, Charles},

  url={https://huggingface.co/phanerozoic/tiny-mod3-prover},

  year={2025},

  note={First verified threshold circuit for modular counting beyond parity}

}

```

## Related

- **Proof repository**: [mod3-verified](https://github.com/CharlesCNorton/mod3-verified) β€” Coq proofs, training attempts, full documentation
- **Parity network**: [tiny-parity-prover](https://huggingface.co/phanerozoic/tiny-parity-prover) β€” Verified MOD-2 (parity) network
- **Parity proofs**: [threshold-logic-verified](https://github.com/CharlesCNorton/threshold-logic-verified) β€” Original parity verification project

## License

MIT