CharlesCNorton committed 19 days ago

Commit

ef040f2

1 Parent(s): e5bdb82

Add --ternary flag to quantize.py; rewrite buffer gates as +/-1 weights

quantize.py --ternary rewrites single-input weight=+/-2 identity buffers
(SHL/SHR/ROL/ROR bit gates, stack data buffers, RET address buffers,
flag buffers) as weight=+/-1 with bias adjusted to preserve the
Heaviside output for binary inputs. With H(0) = 1, H(2x - 1) and
H(x - 1) agree on x in {0, 1}, so the rewrite is exact.
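
A minimal sketch of the rewrite described above, using plain Python integers instead of tensors. It assumes the step convention H(0) = 1 (matching the `>= 0` tests in quantize.py); the helper names `heaviside`, `truth_table`, and `rewrite_buffer` are hypothetical and the candidate order is a simplification, not the exact list used by the commit.

```python
def heaviside(z):
    """Step function with H(0) = 1 (assumed convention)."""
    return 1 if z >= 0 else 0


def truth_table(w, b):
    """Outputs of the single-input gate H(w*x + b) for x = 0 and x = 1."""
    return (heaviside(b), heaviside(w + b))


def rewrite_buffer(w, b, search=range(-3, 4)):
    """Return (sign(w), b') with the same truth table, or None if none fits."""
    sgn = 1 if w > 0 else -1
    target = truth_table(w, b)
    # Try the original bias first, then small integers by increasing magnitude.
    for b_new in [b] + sorted(search, key=abs):
        if truth_table(sgn, b_new) == target:
            return sgn, b_new
    return None


print(rewrite_buffer(2, -1))   # identity buffer H(2x - 1): bias can stay
print(rewrite_buffer(-2, 1))   # inverting buffer H(-2x + 1): bias must move
```

For a binary input only the pair (H(b), H(w + b)) is observable, so any (sign(w), b') with the same pair is an exact replacement.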

After the pass on the canonical 32-bit/64KB build, 174 buffer gates
become ternary and 183 weight tensors remain non-ternary. The remainder
are positional comparators (8/16-bit single-layer, byte-level cascade
gates, division-stage comparators) plus a handful of hand-constructed
modular arithmetic gates. Fully ternarizing those requires bit-cascading
them in build.py rather than at quantization time.
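
To illustrate why a bias adjustment cannot help here, the sketch below builds a single-layer "a >= b" threshold gate with positional weights. This layout (weights +2^i on the bits of a and -2^i on the bits of b) is an assumption about what "positional comparator" means in this repository, and the function names are hypothetical.

```python
def ge_gate_weights(bits):
    """Weights for H(sum 2^i a_i - sum 2^i b_i), i.e. a >= b in one layer."""
    pos = [2 ** i for i in range(bits)]
    return pos + [-p for p in pos]


def eval_gate(weights, inputs, bias=0):
    """Threshold gate with H(0) = 1."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total >= 0 else 0


def to_bits(n, bits):
    """Little-endian bit list of n."""
    return [(n >> i) & 1 for i in range(bits)]


w8 = ge_gate_weights(8)
non_ternary = [w for w in w8 if abs(w) > 1]
print(f"largest |weight| = {max(abs(w) for w in w8)}, "
      f"non-ternary entries = {len(non_ternary)}")
print(eval_gate(w8, to_bits(200, 8) + to_bits(199, 8)))
```

Under this assumed layout the gate has many inputs with distinct magnitudes, so no single-bias substitution reproduces it with weights in {-1, 0, 1}; the comparison would have to be decomposed bit by bit.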

The metadata field weight_quantization records 'ternary' (no
violations) or 'ternary_partial' (some remain). --strict makes the
quantizer fail when any weight is still non-ternary.
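
A downstream tool could check the recorded status without loading any tensors. The sketch below reads the safetensors header directly with the standard library (an 8-byte little-endian length followed by a JSON header whose optional `__metadata__` key holds string metadata). The function names are hypothetical, and it assumes the file is a real safetensors file rather than a Git LFS pointer.

```python
import json
import struct


def read_safetensors_metadata(path):
    """Return the `__metadata__` dict from a safetensors header, or {}."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))  # little-endian u64
        header = json.loads(f.read(header_len))
    return header.get("__metadata__", {})


def ternary_status(path):
    """Return 'ternary', 'ternary_partial', or None if the field is absent."""
    return read_safetensors_metadata(path).get("weight_quantization")
```

For example, `ternary_status("neural_computer.safetensors")` would return `None` for a file that was never quantized with `--ternary`.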

All 18 prebuilt variants and the canonical have been re-quantized with
--ternary. eval_all.py reports 100% fitness across all 18; the CPU
program suite still passes 7/7.

Files changed (21)

README.md +4 -0
neural_computer.safetensors +2 -2
quantize.py +130 -8
variants/neural_alu16.safetensors +2 -2
variants/neural_alu32.safetensors +2 -2
variants/neural_alu8.safetensors +2 -2
variants/neural_computer16.safetensors +2 -2
variants/neural_computer16_reduced.safetensors +2 -2
variants/neural_computer16_registers.safetensors +2 -2
variants/neural_computer16_scratchpad.safetensors +2 -2
variants/neural_computer16_small.safetensors +2 -2
variants/neural_computer32.safetensors +2 -2
variants/neural_computer32_reduced.safetensors +2 -2
variants/neural_computer32_registers.safetensors +2 -2
variants/neural_computer32_scratchpad.safetensors +2 -2
variants/neural_computer32_small.safetensors +2 -2
variants/neural_computer8.safetensors +2 -2
variants/neural_computer8_reduced.safetensors +2 -2
variants/neural_computer8_registers.safetensors +2 -2
variants/neural_computer8_scratchpad.safetensors +2 -2
variants/neural_computer8_small.safetensors +2 -2

README.md CHANGED

@@ -249,10 +249,14 @@ The quantizer is also available standalone:
 python quantize.py path/to/file.safetensors           # in-place
 python quantize.py variants/                          # whole directory
 python quantize.py model.safetensors -o quantized.safetensors
+python quantize.py file.safetensors --ternary         # push toward {-1, 0, 1} weights
+python quantize.py file.safetensors --ternary --strict  # error if any weight is non-ternary
 ```
 Most tensors fit in `int8`; comparator weights and a few wide single-layer threshold gates use `int16` or `int32`. The eval pipeline promotes weights to `float32` on load, so integer storage is exact and transparent.
+**Ternary mode.** With `--ternary`, the quantizer also rewrites single-input `weight=±2` identity buffers (SHL/SHR/ROL/ROR bit gates, stack data buffers, RET address buffers, flag buffers) as `weight=±1` with bias adjusted to preserve the heaviside output for binary inputs (`H(2x - 1) ≡ H(x - 1)` etc.). After this pass the canonical model has 174 buffer gates rewritten and 183 weight tensors remaining non-ternary, all of which are positional comparators (8/16-bit single-layer, byte-level cascade gates, division-stage comparators) and a handful of hand-constructed modular arithmetic gates. Fully ternarizing those requires bit-cascading them in `build.py`, which is a structural change rather than a quantization pass. The metadata field `weight_quantization` records `ternary` (clean) or `ternary_partial` (some violations remain).
 ---
 ## Verification

neural_computer.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c87e96ed650bf30fd25350714a75b9bdc84651077774b2900b69b9cf647a0748
-size 21777922

 version https://git-lfs.github.com/spec/v1
+oid sha256:e9c7424d5643c17ac22bee3930d85976e68b395ae45b146f2bb61318aff38c9f
+size 21777962

quantize.py CHANGED

@@ -12,11 +12,22 @@ This is a packaging optimization, not a precision change: the eval
 pipeline already promotes weights to float32 on load, so integer
 storage is exact.
 Usage:
     python quantize.py path/to/file.safetensors                      # in-place
     python quantize.py path/to/file.safetensors -o out.safetensors   # to new file
     python quantize.py variants/                                      # whole directory in place
     python quantize.py variants/ -o variants_int/                     # whole directory to new dir
 """
 from __future__ import annotations
@@ -68,10 +79,74 @@ def _min_signed_int_dtype(tensor: torch.Tensor) -> torch.dtype:
     return torch.int64
 def quantize_tensors(
-    tensors: Dict[str, torch.Tensor]
-) -> Tuple[Dict[str, torch.Tensor], Dict[str, int], Tuple[int, int]]:
-    """Quantize a dict of tensors. Returns (new_tensors, dtype_counts, (bytes_before, bytes_after))."""
     new_tensors: Dict[str, torch.Tensor] = {}
     counts: Dict[str, int] = {"int8": 0, "int16": 0, "int32": 0, "int64": 0,
                               "manifest_kept": 0, "skipped": 0}
@@ -101,10 +176,11 @@ def quantize_tensors(
         bytes_after += cast.numel() * cast.element_size()
         counts[str(target).replace("torch.", "")] += 1
-    return new_tensors, counts, (bytes_before, bytes_after)
-def quantize_file(in_path: Path, out_path: Path, verbose: bool = False) -> Dict:
     file_before = in_path.stat().st_size
     tensors: Dict[str, torch.Tensor] = {}
     metadata: Dict[str, str] = {}
@@ -116,10 +192,36 @@ def quantize_file(in_path: Path, out_path: Path, verbose: bool = False) -> Dict:
             # clone so the source mmap can be released before we write
             tensors[name] = f.get_tensor(name).clone()
-    new_tensors, counts, (before, after) = quantize_tensors(tensors)
     # Drop the original mmap-backed tensors before writing in-place.
     del tensors
     out_path.parent.mkdir(parents=True, exist_ok=True)
     save_file(new_tensors, str(out_path), metadata=metadata or None)
     file_after = out_path.stat().st_size
@@ -131,6 +233,8 @@ def quantize_file(in_path: Path, out_path: Path, verbose: bool = False) -> Dict:
         "tensor_bytes_after": after,
         "file_size_before": file_before,
         "file_size_after": file_after,
     }
@@ -149,6 +253,11 @@ def _print_summary(label: str, info: Dict) -> None:
         f"({ratio_t:.2f}x)"
     )
     print(f"      {bucket_str}")
 def main() -> int:
@@ -157,6 +266,13 @@ def main() -> int:
     parser.add_argument("-o", "--output", type=Path, default=None,
                         help="output file or directory (default: in-place)")
     parser.add_argument("-v", "--verbose", action="store_true")
     args = parser.parse_args()
     inputs = []
@@ -185,10 +301,16 @@ def main() -> int:
     total_before = 0
     total_after = 0
-    print(f"Quantizing {len(inputs)} file(s)\n")
     for src, dst in zip(inputs, outputs):
-        info = quantize_file(src, dst, verbose=args.verbose)
         _print_summary(src.name, info)
         total_before += info["file_size_before"]
         total_after += info["file_size_after"]

 pipeline already promotes weights to float32 on load, so integer
 storage is exact.
+The --ternary flag also rewrites single-input weight=+/-2 identity
+buffers (SHL/SHR/ROL/ROR bit gates, stack data buffers, RET address
+buffers, flag buffers) to weight=+/-1 with bias adjusted as needed to
+preserve heaviside output for binary inputs. After this pass every
+weight tensor in the file lies in {-1, 0, 1} except for positional
+comparators and a few hand-constructed modular arithmetic circuits
+(see the violation report); fully ternarizing those requires bit-
+cascading in build.py.
 Usage:
     python quantize.py path/to/file.safetensors                      # in-place
     python quantize.py path/to/file.safetensors -o out.safetensors   # to new file
     python quantize.py variants/                                      # whole directory in place
     python quantize.py variants/ -o variants_int/                     # whole directory to new dir
+    python quantize.py file.safetensors --ternary                     # try ternary weights
+    python quantize.py file.safetensors --ternary --strict            # error if any weight non-ternary
 """
 from __future__ import annotations
     return torch.int64
+def _ternarize_buffers(
+    tensors: Dict[str, torch.Tensor],
+) -> Tuple[Dict[str, torch.Tensor], Dict]:
+    """Rewrite single-input weight=+-2 identity buffers as weight=+-1 with
+    bias adjusted to preserve heaviside output for binary inputs.
+    For a single-input gate H(w*x + b) with x in {0, 1}, the only thing
+    that matters is the pair (H(b), H(w + b)). Pick the first integer bias
+    b' from a short candidate list (original bias first) such that
+    (H(b'), H(sgn + b')) matches, with sgn = sign(w).
+    Returns (new_tensors, stats). stats has 'fixed' and 'failed_names'.
+    """
+    new_tensors = dict(tensors)
+    fixed = 0
+    failed_names = []
+    weight_keys = [k for k in tensors if k.endswith(".weight")]
+    for wkey in weight_keys:
+        w = tensors[wkey]
+        wf = w.float() if w.dtype.is_floating_point else w.to(torch.float64).float()
+        if (wf.abs() <= 1.0).all():
+            continue  # already ternary
+        gate = wkey[: -len(".weight")]
+        bkey = gate + ".bias"
+        # Single-input weight=+-2 buffer with single bias
+        if (
+            wf.numel() == 1
+            and abs(wf.item()) == 2.0
+            and bkey in tensors
+            and tensors[bkey].numel() == 1
+        ):
+            w_val = wf.item()
+            b_val = float(tensors[bkey].float().item())
+            sgn = 1.0 if w_val > 0 else -1.0
+            x0_target = 1 if b_val >= 0 else 0
+            x1_target = 1 if (w_val + b_val) >= 0 else 0
+            chosen = None
+            # Prefer keeping the bias unchanged when possible
+            for b_new in [int(b_val), int(b_val) - 1, -1, 0, -2, 1, -3, 2]:
+                x0 = 1 if b_new >= 0 else 0
+                x1 = 1 if (sgn + b_new) >= 0 else 0
+                if x0 == x0_target and x1 == x1_target:
+                    chosen = b_new
+                    break
+            if chosen is not None:
+                new_tensors[wkey] = torch.tensor([sgn], dtype=torch.float64)
+                new_tensors[bkey] = torch.tensor([float(chosen)], dtype=torch.float64)
+                fixed += 1
+                continue
+        failed_names.append(wkey)
+    return new_tensors, {"fixed": fixed, "failed_names": failed_names}
 def quantize_tensors(
+    tensors: Dict[str, torch.Tensor],
+    ternary: bool = False,
+) -> Tuple[Dict[str, torch.Tensor], Dict[str, int], Tuple[int, int], Dict]:
+    """Quantize a dict of tensors. Returns
+    (new_tensors, dtype_counts, (bytes_before, bytes_after), ternary_stats)."""
+    ternary_stats: Dict = {"applied": False, "fixed": 0, "failed_names": []}
+    if ternary:
+        tensors, ternary_stats = _ternarize_buffers(tensors)
+        ternary_stats["applied"] = True
     new_tensors: Dict[str, torch.Tensor] = {}
     counts: Dict[str, int] = {"int8": 0, "int16": 0, "int32": 0, "int64": 0,
                               "manifest_kept": 0, "skipped": 0}
         bytes_after += cast.numel() * cast.element_size()
         counts[str(target).replace("torch.", "")] += 1
+    return new_tensors, counts, (bytes_before, bytes_after), ternary_stats
+def quantize_file(in_path: Path, out_path: Path, verbose: bool = False,
+                  ternary: bool = False, strict_ternary: bool = False) -> Dict:
     file_before = in_path.stat().st_size
     tensors: Dict[str, torch.Tensor] = {}
     metadata: Dict[str, str] = {}
             # clone so the source mmap can be released before we write
             tensors[name] = f.get_tensor(name).clone()
+    new_tensors, counts, (before, after), tstats = quantize_tensors(tensors, ternary=ternary)
+    # Audit final ternary status (count of weight tensors with |w| > 1)
+    final_nonternary = []
+    for k, v in new_tensors.items():
+        if not k.endswith(".weight"):
+            continue
+        if k.startswith("manifest."):
+            continue
+        vf = v.float() if v.dtype.is_floating_point else v.to(torch.float64).float()
+        if (vf.abs() > 1.0).any():
+            final_nonternary.append(k)
+    if ternary and strict_ternary and final_nonternary:
+        raise ValueError(
+            f"--strict failed: {len(final_nonternary)} weight tensors are not "
+            f"ternary after transformation; first: {final_nonternary[:5]}"
+        )
     # Drop the original mmap-backed tensors before writing in-place.
     del tensors
     out_path.parent.mkdir(parents=True, exist_ok=True)
+    if ternary:
+        # Note ternary mode in metadata so downstream tools can see it
+        if metadata is None:
+            metadata = {}
+        metadata = dict(metadata)
+        metadata["weight_quantization"] = (
+            "ternary_partial" if final_nonternary else "ternary"
+        )
     save_file(new_tensors, str(out_path), metadata=metadata or None)
     file_after = out_path.stat().st_size
         "tensor_bytes_after": after,
         "file_size_before": file_before,
         "file_size_after": file_after,
+        "ternary": tstats,
+        "final_nonternary": final_nonternary,
     }
         f"({ratio_t:.2f}x)"
     )
     print(f"      {bucket_str}")
+    if info.get("ternary", {}).get("applied"):
+        ts = info["ternary"]
+        nt = info["final_nonternary"]
+        print(f"      ternary: {ts['fixed']} buffer gates rewritten; "
+              f"{len(nt)} weight tensors remain non-ternary")
 def main() -> int:
     parser.add_argument("-o", "--output", type=Path, default=None,
                         help="output file or directory (default: in-place)")
     parser.add_argument("-v", "--verbose", action="store_true")
+    parser.add_argument("--ternary", action="store_true",
+                        help="Rewrite single-input weight=+/-2 buffers as +/-1 to push toward "
+                             "ternary {-1, 0, 1} weights and report remaining violations")
+    parser.add_argument("--strict", action="store_true",
+                        help="With --ternary, fail if any weight tensor is still non-ternary")
+    parser.add_argument("--report-violations", type=int, default=0, metavar="N",
+                        help="Print first N non-ternary weight tensor names per file")
     args = parser.parse_args()
     inputs = []
     total_before = 0
     total_after = 0
+    print(f"Quantizing {len(inputs)} file(s)" + (" (ternary mode)" if args.ternary else "") + "\n")
     for src, dst in zip(inputs, outputs):
+        info = quantize_file(src, dst, verbose=args.verbose,
+                             ternary=args.ternary, strict_ternary=args.strict)
         _print_summary(src.name, info)
+        if args.report_violations and info.get("final_nonternary"):
+            for name in info["final_nonternary"][: args.report_violations]:
+                print(f"        non-ternary: {name}")
+            if len(info["final_nonternary"]) > args.report_violations:
+                print(f"        ... and {len(info['final_nonternary']) - args.report_violations} more")
         total_before += info["file_size_before"]
         total_after += info["file_size_after"]

variants/neural_alu16.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:424b192a426ea3ccb96d753dcebc032814e01e3d0b15ae8633d832608bdcf7ef
-size 11473981

 version https://git-lfs.github.com/spec/v1
+oid sha256:493f6f679b78e0d3d15a187dcd9a733b9bd8f51b8c5f4065ff68d3ea2aa351f6
+size 11474021

variants/neural_alu32.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:aa26c2943200c5ed07358bf0cff09da1ecf9f3058786681790458b829a95e663
-size 13258620

 version https://git-lfs.github.com/spec/v1
+oid sha256:a5907e430443b0deb48aa666b15da6ac6e57006367868282aac6dcbe19d28bde
+size 13258660

variants/neural_alu8.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:706d0c80f76613b0a9c72f5e3f8bc526bf5d638e9f0e73dd046b769db0d37cf7
-size 10688461

 version https://git-lfs.github.com/spec/v1
+oid sha256:136684d2ebdd3f54c73dccc2794e7e09bc265c670a0660ba03093a1386478582
+size 10688501

variants/neural_computer16.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9debd616a7493699a87fe6640ac9a9cc15c4658eae2b167fb8d6e10c61668b27
-size 19974859

 version https://git-lfs.github.com/spec/v1
+oid sha256:9d9ccd2154b44ece7b39ed37c03149ab38f7d840c24d3296c8b427e1217ae2f3
+size 19974899

variants/neural_computer16_reduced.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4e52801828ef7434aa24fe039aa8ef0b4d739efe93fc32235d59a0e8b8fc0d58
-size 12163595

 version https://git-lfs.github.com/spec/v1
+oid sha256:5dc5fcb06f90d70173d26556f7ab8105f55c0f8be479f742a1c4d12668cc8116
+size 12163635

variants/neural_computer16_registers.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:b4f76c0b906a0beeb27b4ecae262690c53a7ad61c48bd5bb18cd3a531435ae73
-size 11560755

 version https://git-lfs.github.com/spec/v1
+oid sha256:254c05ec2b2e9aac83eb1944d711db20202f26baa5291a5e9bd020e1ed3c713c
+size 11560795

variants/neural_computer16_scratchpad.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:12ef91ad2cb1490a96563ff3cc103b0bf92641bc19d987bfd66539a26c6a75c1
-size 11641459

 version https://git-lfs.github.com/spec/v1
+oid sha256:370b35f3b1c4a290bfbb64fdfce824259348def722decfaf7f22816f0d3fcc68
+size 11641499

variants/neural_computer16_small.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7d6f48f1c10f67c9292be41ea575e4ce1302d0e0fd16c8e9071dfb19d204ea1c
-size 11760299

 version https://git-lfs.github.com/spec/v1
+oid sha256:bc54a3aa13383f738e6f2109b96df9cfa7e6bd6869d5c5aa08422c3af523e383
+size 11760339

variants/neural_computer32.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c87e96ed650bf30fd25350714a75b9bdc84651077774b2900b69b9cf647a0748
-size 21777922

 version https://git-lfs.github.com/spec/v1
+oid sha256:313594c493c124733f70b81a73813eb7e242143df2dc3c9b800fa7f1de57dc3b
+size 21777962

variants/neural_computer32_reduced.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5569a85aa93c0f7e2ffd67e9ccb5cfb5cdea9b8b881780e1c2d8566f1aef6455
-size 13966650

 version https://git-lfs.github.com/spec/v1
+oid sha256:836ea61f879ead37a5274edb0346ad50af92b99451555a4cc94c11bf65c237bf
+size 13966690

variants/neural_computer32_registers.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9d5adaf93705e72be7ceb91fe241a7a4c5d0e456b93322022a8e5697c2838828
-size 13363818

 version https://git-lfs.github.com/spec/v1
+oid sha256:e6443f13914c22775003fda952bad6bec7638efadde620e4e094c66d2268e72f
+size 13363858

variants/neural_computer32_scratchpad.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5b6713adeab730ba5f605507a6e65320bc71d221146d57da4af8faa29c807d5e
-size 13444514

 version https://git-lfs.github.com/spec/v1
+oid sha256:f24b2425ee50bf65746714ece769b0ddca32f13c29641a2350c9686a4c187289
+size 13444554

variants/neural_computer32_small.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:57d3c29a466a38c9f03e60d223b5f1c4543dd6a2c0b9607183952bfd0a283b21
-size 13563370

 version https://git-lfs.github.com/spec/v1
+oid sha256:6d037affb4b59c0c2f0c8814fe2fc75b78e2508068b520c01714e0e4f82447ca
+size 13563410

variants/neural_computer8.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a225796853866f88d04d1568a81324526a53b6eda6c373cef101f99a3cae162f
-size 19180163

 version https://git-lfs.github.com/spec/v1
+oid sha256:4c25ab253d41866fd627a63bb1d6350c5869f4f6f86dd04e3773fab63595d277
+size 19180203

variants/neural_computer8_reduced.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:10b2a79f5347593bc1c6d8a850308b0e6f37e316427ea546ff77aa1ccc6a8118
-size 11368899

 version https://git-lfs.github.com/spec/v1
+oid sha256:ea0532e900fad00d62c3c01f9ee8a820f0504e71857c18c4b743499f02b3b1b3
+size 11368939

variants/neural_computer8_registers.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:4715734c39b883d6af2ddbcbd848df11e39e4e3fab9913b91afaf8709b8ac88c
-size 10766059

 version https://git-lfs.github.com/spec/v1
+oid sha256:e0f47640d477e167094e90647fdbcb2b9d5b63337f3394b3b5cd0e94229496c9
+size 10766099

variants/neural_computer8_scratchpad.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8d32afb2aed1fa557d1579b9be351fbfd4d7eab8370a26d5181359dfbd75ab64
-size 10846763

 version https://git-lfs.github.com/spec/v1
+oid sha256:fd4d75e83a351b2609c3bc65cd90668228f490ea83c9fb93d86c9b6fbde5ab72
+size 10846803

variants/neural_computer8_small.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:5374476d6c8fed45da1249488865b3ec03ae76289272a1eb28c30d6855001f71
-size 10965603

 version https://git-lfs.github.com/spec/v1
+oid sha256:eaba41825a68678021fafeced6c21b49b602117ae091ea91744190c2301e8088
+size 10965643