XllentAI/modular_arithmetic

@@ -17,8 +17,8 @@ multiplication tables. Entry for the
 [Modular Arithmetic Challenge](https://github.com/SAIRcompetition/modular-arithmetic-challenge).
 - **Saturates tiers 1–4** (all primes `< 2³²`): tiers 1–3 = 100%, tier 4 = 99%
-- **Tier 5** (33–64-bit primes) = 0.64 on the public benchmark
-- **overall_accuracy 0.473**, `highest_tier_above_90 = 4`
+- **Tier 5** (33–64-bit primes) = 0.74 on the public benchmark
+- **overall_accuracy 0.483**, `highest_tier_above_90 = 4`
 - Verifiably **generalises to primes never seen in training** (held-out-prime validation
   accuracy tracks training accuracy — no memorisation gap)
@@ -52,7 +52,7 @@ holds the prime:
 |---|---|---|---|---|---|---|
 | `weights16.pt` | 16-bit | `< 2¹⁶` | 1–3 | 4096 / 4 | ~50M | tiers 1–3 = 1.00 |
 | `weights32.pt` | 32-bit | `< 2³²` | 4 | 6144 / 4 | ~114M | tier 4 = 0.99 |
-| `weights64.pt` | 64-bit | `< 2⁶⁴` | 5 | 4096 / 7, residual | ~236M | tier 5 = 0.64 |
+| `weights64.pt` | 64-bit | `< 2⁶⁴` | 5 | 4096 / 7, residual | ~236M | tier 5 = 0.74 |
 The 64-bit cell needs **depth and residual connections** the narrower cells do not: a 64-bit
 modular Horner step hides two long carry chains (the `2t + bit·b` addition and the
@@ -106,7 +106,7 @@ cell is *at* the floor. The capability therefore resides in the trained paramete
 |---|---|---|---|---|---|---|
 | tier 3 (16-bit cell) | 1.00 | 1.00 | 0.98 | 0.74 | 0.06 | 0.00 |
 | tier 4 (32-bit cell) | 0.99 | 0.99 | 0.86 | 0.04 | 0.02 | 0.00 |
-| tier 5 (64-bit cell) | 0.64 | 0.57 | 0.41 | 0.01 | 0.01 | 0.00 |
+| tier 5 (64-bit cell) | 0.74 | 0.71 | 0.46 | 0.01 | 0.01 | 0.00 |
 Generalisation against memorisation: 10% of primes at each bit-width were held out of
 training entirely; chain accuracy on them matches the training primes.
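
For readers unfamiliar with the "modular Horner step" named in the README context above, here is a minimal exact-arithmetic sketch of the recurrence built on the `2t + bit·b` addition. It is a reference illustration only, not the repository's learned cell. The function name `horner_mulmod`, the most-significant-bit-first order, and the reduction by conditional subtraction are assumptions; the diff context cuts off before naming the second carry chain.

```python
def horner_mulmod(a: int, b: int, p: int, bits: int = 64) -> int:
    """Return (a * b) % p by consuming one bit of `a` per step (hypothetical helper)."""
    if p < 2:
        raise ValueError("modulus must be at least 2")
    if not 0 <= a < (1 << bits):
        raise ValueError("multiplier does not fit in the given bit width")
    b %= p  # keep the addend below p so the bound below holds
    t = 0   # running accumulator, always kept in [0, p)
    for i in reversed(range(bits)):  # assumed order: most significant bit first
        bit = (a >> i) & 1
        t = 2 * t + bit * b  # the `2t + bit·b` addition from the README
        # Assumed reduction: since t < p and b < p before the step,
        # the sum is below 3p, so two conditional subtractions suffice.
        if t >= p:
            t -= p
        if t >= p:
            t -= p
    return t


if __name__ == "__main__":
    p = 4294967311  # an arbitrary modulus just above 2**32, used only for the demo
    a, b = 123456789123, 987654321987
    print(horner_mulmod(a, b, p) == (a * b) % p)
```

Each iteration involves one wide addition and one wide comparison against `p`, which is one plausible reading of why the step's cost grows with bit width.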