CharlesCNorton committed on
Commit e5bdb82 · 1 Parent(s): 6e3b69a

Quantize weights to per-tensor minimum signed integer dtype

quantize.py loads a safetensors file (or a directory of them), normalizes
half-integer biases (e.g. arithmetic.asr8bit.bit*.bias = -0.5 is floored
to -1, which leaves H semantics unchanged for binary inputs), and casts
each tensor to the smallest signed integer dtype that exactly represents
its values.
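The flooring claim can be checked independently of the repo's code; a minimal plain-Python sketch, assuming the usual threshold convention H(z) = 1 iff z >= 0:

```python
def heaviside(z: float) -> int:
    # threshold-gate activation: fires when the pre-activation is >= 0
    return 1 if z >= 0 else 0

# With binary inputs and integer weights, every pre-activation sum s is an
# integer, and s - 0.5 >= 0 exactly when s - 1 >= 0 (both mean s >= 1).
for s in range(-4, 5):
    assert heaviside(s - 0.5) == heaviside(s - 1)
```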

Distribution of dtypes in a typical full build: ~71% of tensors fit int8
(weights of magnitude <= 1 or 2, biases down to about -16), ~29% need int16
(byte-level comparator weights of +-128 and the 16-bit single-layer
comparator), and a handful of wide single-layer comparators in the float
and 32-bit-divisor circuits stay at int32 or int64.
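The bucketing rule behind these numbers is simple; a hypothetical sketch (not the repo's implementation) of picking the smallest signed dtype that covers a tensor's value range:

```python
def min_signed_dtype(lo: int, hi: int) -> str:
    # smallest signed dtype whose range [-2^(b-1), 2^(b-1) - 1] covers [lo, hi]
    for name, bits in (("int8", 8), ("int16", 16), ("int32", 32), ("int64", 64)):
        if -(1 << (bits - 1)) <= lo and hi <= (1 << (bits - 1)) - 1:
            return name
    raise OverflowError("range exceeds int64")

print(min_signed_dtype(-2, 2))      # -> 'int8'  (typical gate weights)
print(min_signed_dtype(-128, 128))  # -> 'int16' (+128 exceeds int8's max of 127)
```

Note the asymmetry of two's complement is exactly why the +-128 comparator weights spill into int16: -128 fits int8 but +128 does not.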

Tensor data shrinks 4x (34 MB -> 8.6 MB on the canonical 64KB build).
Total file size drops more modestly because the safetensors header carries
a fixed-size JSON entry per tensor, so the smaller variants are
header-bound. Across the 18 prebuilt variants, total disk drops from
340 MB to 242 MB.
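The header-bound effect follows from the safetensors layout (an 8-byte little-endian length prefix, then a JSON header with one entry per tensor). A self-contained sketch using a synthetic blob rather than a real checkpoint; the tensor names are made up:

```python
import io
import json
import struct

def header_bytes(f) -> int:
    # safetensors layout: u64-LE header length, then that many bytes of JSON
    (n,) = struct.unpack("<Q", f.read(8))
    return 8 + n

# Synthetic file of 200 one-byte int8 tensors: each JSON entry costs far
# more than the single byte of tensor data it describes.
header = json.dumps(
    {f"gate.{i}.bias": {"dtype": "I8", "shape": [1], "data_offsets": [i, i + 1]}
     for i in range(200)}
).encode()
blob = struct.pack("<Q", len(header)) + header + bytes(200)

overhead = header_bytes(io.BytesIO(blob))
print(overhead, len(blob))  # the header dominates the file size
```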

The eval pipeline already calls .float() on load, so integer storage is
exact and transparent. All 18 quantized variants pass eval_all.py at
100% fitness; the CPU program suite still passes 7/7.
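Exactness holds because every integer of magnitude up to 2^24 is exactly representable in float32, far above anything these tensors store. A quick plain-Python check, independent of the eval code:

```python
import struct

def to_f32(v: int) -> float:
    # round-trip through a 4-byte IEEE-754 float, mimicking a float32 cast
    return struct.unpack("<f", struct.pack("<f", float(v)))[0]

# every value range the quantizer emits casts back exactly
for v in (-(1 << 15), -128, -16, -1, 0, 1, 127, (1 << 15) - 1, 1 << 24):
    assert to_f32(v) == v
```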

build_all.py runs quantize as a standard post-build step.

README.md CHANGED
@@ -241,7 +241,17 @@ To regenerate every named variant in one pass:
 python build_all.py
 ```
 
-This populates `variants/` with all 18 builds and runs `eval.py` on each as a sanity check.
+This populates `variants/` with all 18 builds, quantizes each one to the smallest signed integer dtype that exactly represents its weights (~4× reduction in tensor data, with file size dominated by the safetensors header on the smaller profiles), and runs `eval.py` on each as a sanity check.
+
+The quantizer is also available standalone:
+
+```bash
+python quantize.py path/to/file.safetensors                    # in-place
+python quantize.py variants/                                   # whole directory
+python quantize.py model.safetensors -o quantized.safetensors
+```
+
+Most tensors fit in `int8`; comparator weights and a few wide single-layer threshold gates use `int16` or `int32`. The eval pipeline promotes weights to `float32` on load, so integer storage is exact and transparent.
 
 ---
 
@@ -437,7 +447,8 @@ Loss components: BCE on output bits, BCE on extracted A and B bits (2× weight),
 neural_computer.safetensors   canonical model (32-bit, 64 KB, ~8.47M params)
 variants/                     18 prebuilt configurations
 build.py                      generator (one safetensors per invocation)
-build_all.py                  builds and verifies every named profile
+build_all.py                  builds, quantizes, and verifies every named profile
+quantize.py                   casts each tensor to its minimum signed integer dtype
 eval.py                       gate-level fitness suite + reference CPU runtime
 eval_all.py                   variant-agnostic gate-level harness
 cpu_programs.py               assembler + program suite for CPU-level validation
build_all.py CHANGED
@@ -63,6 +63,24 @@ def build_variant(bits: int, profile: str) -> Path:
     return out
 
 
+def quantize_variant(path: Path) -> tuple[int, int]:
+    """Run quantize.py on a built variant. Returns (bytes_before, bytes_after)."""
+    rc, log = run([sys.executable, str(ROOT / "quantize.py"), str(path)], timeout=300)
+    if rc != 0:
+        raise RuntimeError(f"quantize failed for {path.name}:\n{log[-800:]}")
+    # parse the "file X.X MB -> Y.Y MB" line
+    for line in log.splitlines():
+        if "file" in line and "->" in line and path.name in line:
+            try:
+                parts = line.split("file")[1].split("->")
+                before = float(parts[0].strip().split()[0]) * 1e6
+                after = float(parts[1].strip().split()[0]) * 1e6
+                return int(before), int(after)
+            except Exception:
+                pass
+    return 0, 0
+
+
 def measure_variant(path: Path) -> dict:
     """Read tensor count, params, manifest values from the variant."""
     with safe_open(str(path), framework="pt") as f:
@@ -130,6 +148,9 @@ def main() -> None:
         try:
             path = build_variant(bits, profile)
             bt = time.time() - t0
+            pre_q_meta = measure_variant(path)
+            # Quantize in-place; weights are integer-valued so this is exact.
+            qb, qa = quantize_variant(path)
             meta = measure_variant(path)
             ev = eval_variant(path, device="cpu", timeout=900)
             rows.append({
@@ -140,7 +161,9 @@
                 **{k: ev[k] for k in ("fitness", "total_tests", "status", "elapsed_s")},
                 "log_tail": ev["log_tail"] if ev["status"] != "PASS" else "",
             })
-            print(f" built in {bt:.1f}s size={meta['size_mb']:.1f}MB"
+            q_ratio = qb / qa if qa else 1.0
+            print(f" built in {bt:.1f}s size={pre_q_meta['size_mb']:.1f}MB -> "
+                  f"{meta['size_mb']:.1f}MB after quant ({q_ratio:.2f}x)"
                   f" params={meta['params']:,} tensors={meta['tensors']:,}")
             print(f" eval: fitness={ev['fitness']} tests={ev['total_tests']}"
                   f" status={ev['status']} ({ev['elapsed_s']:.1f}s)")
neural_computer.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:18f4f3420fb307d90ea7a8fe356c196a59d7a0f2ed4ec57679d87b209a7fec22
-size 47693920
+oid sha256:c87e96ed650bf30fd25350714a75b9bdc84651077774b2900b69b9cf647a0748
+size 21777922
quantize.py ADDED
@@ -0,0 +1,205 @@
+"""
+Quantize threshold-computer safetensors to the minimum signed integer
+dtype that exactly represents each tensor.
+
+Weights and biases in this library are integer-valued by construction,
+with one historical exception: a handful of legacy buffer gates use a
+bias of -0.5 (e.g. arithmetic.asr8bit.bit*.bias). For binary inputs,
+H(x - 0.5) and H(x - 1) are identical, so those biases are floored to
+-1 before casting.
+
+This is a packaging optimization, not a precision change: the eval
+pipeline already promotes weights to float32 on load, so integer
+storage is exact.
+
+Usage:
+    python quantize.py path/to/file.safetensors                    # in-place
+    python quantize.py path/to/file.safetensors -o out.safetensors # to new file
+    python quantize.py variants/                                   # whole directory in place
+    python quantize.py variants/ -o variants_int/                  # whole directory to new dir
+"""
+
+from __future__ import annotations
+
+import argparse
+import sys
+from pathlib import Path
+from typing import Dict, Tuple
+
+import torch
+from safetensors import safe_open
+from safetensors.torch import save_file
+
+DTYPES = [
+    (torch.int8, -(1 << 7), (1 << 7) - 1),
+    (torch.int16, -(1 << 15), (1 << 15) - 1),
+    (torch.int32, -(1 << 31), (1 << 31) - 1),
+    (torch.int64, None, None),  # always fits
+]
+
+
+def _normalize_to_int(tensor: torch.Tensor) -> torch.Tensor:
+    """Return a tensor with strictly integer values, floored from any
+    half-integer values. Floor (not round) because a -0.5 bias must
+    become -1 (not 0) to preserve H(x + bias) for binary x."""
+    if not tensor.dtype.is_floating_point:
+        return tensor.to(torch.float64)  # promote for range checks
+    tf = tensor.to(torch.float64)
+    rounded = tf.round()
+    if torch.equal(rounded, tf):
+        return tf
+    doubled = tf * 2.0
+    if torch.equal(doubled.round(), doubled):
+        return torch.floor(tf)
+    raise ValueError(
+        f"tensor has non-half-integer values; range "
+        f"[{tf.min().item()}, {tf.max().item()}]"
+    )
+
+
+def _min_signed_int_dtype(tensor: torch.Tensor) -> torch.dtype:
+    if tensor.numel() == 0:
+        return torch.int8
+    lo = int(tensor.min().item())
+    hi = int(tensor.max().item())
+    for dtype, lo_lim, hi_lim in DTYPES:
+        if lo_lim is None or (lo_lim <= lo and hi <= hi_lim):
+            return dtype
+    return torch.int64
+
+
+def quantize_tensors(
+    tensors: Dict[str, torch.Tensor]
+) -> Tuple[Dict[str, torch.Tensor], Dict[str, int], Tuple[int, int]]:
+    """Quantize a dict of tensors. Returns (new_tensors, dtype_counts, (bytes_before, bytes_after))."""
+    new_tensors: Dict[str, torch.Tensor] = {}
+    counts: Dict[str, int] = {"int8": 0, "int16": 0, "int32": 0, "int64": 0,
+                              "manifest_kept": 0, "skipped": 0}
+    bytes_before = 0
+    bytes_after = 0
+
+    for name, t in tensors.items():
+        bytes_before += t.numel() * t.element_size()
+
+        if name.startswith("manifest."):
+            new_tensors[name] = t
+            counts["manifest_kept"] += 1
+            bytes_after += t.numel() * t.element_size()
+            continue
+
+        try:
+            normalized = _normalize_to_int(t)
+        except ValueError:
+            new_tensors[name] = t
+            counts["skipped"] += 1
+            bytes_after += t.numel() * t.element_size()
+            continue
+
+        target = _min_signed_int_dtype(normalized)
+        cast = normalized.to(target)
+        new_tensors[name] = cast
+        bytes_after += cast.numel() * cast.element_size()
+        counts[str(target).replace("torch.", "")] += 1
+
+    return new_tensors, counts, (bytes_before, bytes_after)
+
+
+def quantize_file(in_path: Path, out_path: Path, verbose: bool = False) -> Dict:
+    file_before = in_path.stat().st_size
+    tensors: Dict[str, torch.Tensor] = {}
+    metadata: Dict[str, str] = {}
+    with safe_open(str(in_path), framework="pt") as f:
+        meta = f.metadata()
+        if meta:
+            metadata = dict(meta)
+        for name in f.keys():
+            # clone so the source mmap can be released before we write
+            tensors[name] = f.get_tensor(name).clone()
+
+    new_tensors, counts, (before, after) = quantize_tensors(tensors)
+    # Drop the original mmap-backed tensors before writing in-place.
+    del tensors
+    out_path.parent.mkdir(parents=True, exist_ok=True)
+    save_file(new_tensors, str(out_path), metadata=metadata or None)
+
+    file_after = out_path.stat().st_size
+    return {
+        "in_path": str(in_path),
+        "out_path": str(out_path),
+        "tensor_counts": counts,
+        "tensor_bytes_before": before,
+        "tensor_bytes_after": after,
+        "file_size_before": file_before,
+        "file_size_after": file_after,
+    }
+
+
+def _print_summary(label: str, info: Dict) -> None:
+    cb = info["tensor_bytes_before"]
+    ca = info["tensor_bytes_after"]
+    fb = info["file_size_before"]
+    fa = info["file_size_after"]
+    counts = info["tensor_counts"]
+    bucket_str = " ".join(f"{k}={v}" for k, v in counts.items() if v)
+    ratio_t = cb / ca if ca else 1.0
+    ratio_f = fb / fa if fa else 1.0
+    print(
+        f" {label}: file {fb / 1e6:6.1f} MB -> {fa / 1e6:6.1f} MB "
+        f"({ratio_f:.2f}x); tensor data {cb / 1e6:6.1f} MB -> {ca / 1e6:6.1f} MB "
+        f"({ratio_t:.2f}x)"
+    )
+    print(f" {bucket_str}")
+
+
+def main() -> int:
+    parser = argparse.ArgumentParser(description="Quantize safetensors to min signed int dtype")
+    parser.add_argument("input", type=Path, help=".safetensors file or directory of files")
+    parser.add_argument("-o", "--output", type=Path, default=None,
+                        help="output file or directory (default: in-place)")
+    parser.add_argument("-v", "--verbose", action="store_true")
+    args = parser.parse_args()
+
+    inputs = []
+    if args.input.is_dir():
+        inputs = sorted(p for p in args.input.glob("*.safetensors"))
+    elif args.input.is_file():
+        inputs = [args.input]
+    else:
+        print(f"not found: {args.input}", file=sys.stderr)
+        return 2
+
+    if not inputs:
+        print(f"no .safetensors files under {args.input}", file=sys.stderr)
+        return 2
+
+    if args.output is None:
+        outputs = inputs  # in-place
+    elif args.output.suffix == ".safetensors":
+        if len(inputs) != 1:
+            print("output is a single file but input is a directory; pass a directory output", file=sys.stderr)
+            return 2
+        outputs = [args.output]
+    else:
+        args.output.mkdir(parents=True, exist_ok=True)
+        outputs = [args.output / p.name for p in inputs]
+
+    total_before = 0
+    total_after = 0
+    print(f"Quantizing {len(inputs)} file(s)\n")
+    for src, dst in zip(inputs, outputs):
+        info = quantize_file(src, dst, verbose=args.verbose)
+        _print_summary(src.name, info)
+        total_before += info["file_size_before"]
+        total_after += info["file_size_after"]
+
+    print()
+    print("=" * 76)
+    print(
+        f"Total: {total_before / 1e6:.1f} MB -> {total_after / 1e6:.1f} MB "
+        f"({total_before / max(total_after, 1):.2f}x reduction)"
+    )
+    return 0
+
+
+if __name__ == "__main__":
+    sys.exit(main())
variants/neural_alu16.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:f702736cd85124aac22602bf44617698309c03739a254b338409df87e22344c9
-size 12434484
+oid sha256:424b192a426ea3ccb96d753dcebc032814e01e3d0b15ae8633d832608bdcf7ef
+size 11473981
variants/neural_alu32.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:8c6761fa0366a19cdb9abb7c1c72f53b3a3a07032056b6d17dbed4131cc5e21d
-size 14378864
+oid sha256:aa26c2943200c5ed07358bf0cff09da1ecf9f3058786681790458b829a95e663
+size 13258620
variants/neural_alu8.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:246546bba4668a80a81e32b115d883d57b6b49bdfe8254034090089d5bf168cf
-size 11561076
+oid sha256:706d0c80f76613b0a9c72f5e3f8bc526bf5d638e9f0e73dd046b769db0d37cf7
+size 10688461
variants/neural_computer16.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d2daa9ab42ab63534010e363adbb3423502ebbe94a4b354797c25dece5eb5948
-size 45730164
+oid sha256:9debd616a7493699a87fe6640ac9a9cc15c4658eae2b167fb8d6e10c61668b27
+size 19974859
variants/neural_computer16_reduced.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:a1808dd34084e68120bccd277310749e047c357274440901baf2b01ca64e9e41
-size 14640476
+oid sha256:4e52801828ef7434aa24fe039aa8ef0b4d739efe93fc32235d59a0e8b8fc0d58
+size 12163595
variants/neural_computer16_registers.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c7487bbfe4da343bb2072c190e33b7861b452c947efd424a068927c413595049
-size 12534076
+oid sha256:b4f76c0b906a0beeb27b4ecae262690c53a7ad61c48bd5bb18cd3a531435ae73
+size 11560755
variants/neural_computer16_scratchpad.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:58dfbdf0a987c1675a68439d86a57a7631a7657ea60b6b0d3e568dfdeee88f2e
-size 12704876
+oid sha256:12ef91ad2cb1490a96563ff3cc103b0bf92641bc19d987bfd66539a26c6a75c1
+size 11641459
variants/neural_computer16_small.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:759920dcb38a340ee31f4d116df4983322258b796ac5d6021f7ca165986f5f5b
-size 13104212
+oid sha256:7d6f48f1c10f67c9292be41ea575e4ce1302d0e0fd16c8e9071dfb19d204ea1c
+size 11760299
variants/neural_computer32.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:18f4f3420fb307d90ea7a8fe356c196a59d7a0f2ed4ec57679d87b209a7fec22
-size 47693920
+oid sha256:c87e96ed650bf30fd25350714a75b9bdc84651077774b2900b69b9cf647a0748
+size 21777922
variants/neural_computer32_reduced.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:51e14c8819de3402881ce2ffe3cdd7e94a801c038c6ef8495110144e9348e2e7
-size 16604104
+oid sha256:5569a85aa93c0f7e2ffd67e9ccb5cfb5cdea9b8b881780e1c2d8566f1aef6455
+size 13966650
variants/neural_computer32_registers.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:13ac4eb1c793a6331a2ecfa13d3372edc9f4649163883244847ca6616062de05
-size 14497800
+oid sha256:9d5adaf93705e72be7ceb91fe241a7a4c5d0e456b93322022a8e5697c2838828
+size 13363818
variants/neural_computer32_scratchpad.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:d8c389b1730cc297f40944815aebda1b1a71b79bff738e9da767c82609e9d9bd
-size 14668512
+oid sha256:5b6713adeab730ba5f605507a6e65320bc71d221146d57da4af8faa29c807d5e
+size 13444514
variants/neural_computer32_small.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:c3c67a2047c0cf7370e802727b9be51d8b7185dedfe108409968f0d838157e04
-size 15067856
+oid sha256:57d3c29a466a38c9f03e60d223b5f1c4543dd6a2c0b9607183952bfd0a283b21
+size 13563370
variants/neural_computer8.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:acde9e66a5bae870b5684ddc8592a206f00b518e088e90965a73bfa35274ba2a
-size 44846164
+oid sha256:a225796853866f88d04d1568a81324526a53b6eda6c373cef101f99a3cae162f
+size 19180163
variants/neural_computer8_reduced.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7e318727316bfb34f82cdc4a2b627d9f8475c3282cab67a6424ba642350dc823
-size 13756476
+oid sha256:10b2a79f5347593bc1c6d8a850308b0e6f37e316427ea546ff77aa1ccc6a8118
+size 11368899
variants/neural_computer8_registers.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:7b2c49af2b18786699351235d4d051afd7452e17616f0f06a87b3e5e9820da66
-size 11649932
+oid sha256:4715734c39b883d6af2ddbcbd848df11e39e4e3fab9913b91afaf8709b8ac88c
+size 10766059
variants/neural_computer8_scratchpad.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:40fe6db0454dd6ba33072a18f6c81ed1463830b270b708b9ae45f976e32cfc50
-size 11820860
+oid sha256:8d32afb2aed1fa557d1579b9be351fbfd4d7eab8370a26d5181359dfbd75ab64
+size 10846763
variants/neural_computer8_small.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:547aef648729c49dc106c14d05bfcdf12a6f1aca5de5b7d1c475fce65aef1373
-size 12220204
+oid sha256:5374476d6c8fed45da1249488865b3ec03ae76289272a1eb28c30d6855001f71
+size 10965603