CharlesCNorton committed
Commit d15242d · 1 Parent(s): df088a9

Add order of operations circuit (arithmetic.expr_add_mul)

Computes A + (B × C) with correct precedence (multiply before add).

Circuit structure:
- 64 AND gates for B[bit] AND C[stage] masks
- 7 accumulator stages with shifted addition for shift-add multiply
- 8 full adders for final A + result

build.py: add_expr_add_mul(), infer_expr_add_mul_inputs()
eval.py: _test_expr_add_mul() with 73 test cases
Examples: 5 + 3 × 2 = 11, 10 + 4 × 3 = 22
Fitness 1.000000
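The shift-add scheme described above can be summarized as a plain-Python reference model (an illustrative sketch, not code from this repo; the function name is hypothetical):

```python
def expr_add_mul_ref(a: int, b: int, c: int, bits: int = 8) -> int:
    """Reference model for A + (B * C): shift-add multiply, then add, wrapping to 8 bits."""
    mask_all = (1 << bits) - 1
    acc = 0
    # Shift-add multiply: for each set bit of C, add B shifted by that position.
    # Only the low 8 bits are kept at each stage, as in the circuit's accumulators.
    for stage in range(bits):
        if (c >> stage) & 1:
            acc = (acc + (b << stage)) & mask_all
    # Final ripple-carry add of A, wrapping on overflow.
    return (a + acc) & mask_all

# Examples from the commit message:
# expr_add_mul_ref(5, 3, 2)  -> 11
# expr_add_mul_ref(10, 4, 3) -> 22
```

Because taking the low 8 bits at every stage commutes with addition mod 256, this matches `(a + b * c) & 0xFF` for all operands.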

Files changed (4)
  1. README.md +5 -5
  2. build.py +179 -1
  3. eval.py +154 -0
  4. neural_computer.safetensors +2 -2
README.md CHANGED
@@ -457,18 +457,18 @@ The interface generalizes to **all** 65,536 8-bit additions once trained—no me
 
 ### Extension Roadmap
 
-1. **Order of operations (5 + 3 × 2 = 11)** — Parse expression into tree, evaluate depth-first. MUL before ADD. Requires either: (a) expression parser producing evaluation order, or (b) learned routing that implicitly respects precedence.
+1. **Parenthetical expressions ((5 + 3) × 2 = 16)** — Explicit grouping overrides precedence. Parser must recognize parens and build correct tree. Evaluation proceeds innermost-out. Adds complexity to extraction layer.
 
-2. **Parenthetical expressions ((5 + 3) × 2 = 16)** — Explicit grouping overrides precedence. Parser must recognize parens and build correct tree. Evaluation proceeds innermost-out. Adds complexity to extraction layer.
+2. **16-bit operations (0-65535)** — Chain two 8-bit circuits with carry propagation. ADD16: low = ADD8(A_lo, B_lo), high = ADD8(A_hi, B_hi, carry_out). MUL16: four partial products + shift-add. Doubles operand extraction width.
 
-3. **16-bit operations (0-65535)** — Chain two 8-bit circuits with carry propagation. ADD16: low = ADD8(A_lo, B_lo), high = ADD8(A_hi, B_hi, carry_out). MUL16: four partial products + shift-add. Doubles operand extraction width.
-
-4. **Floating point arithmetic** — IEEE 754-style with separate circuits for mantissa and exponent. ADD: align exponents, add mantissas, renormalize. MUL: add exponents, multiply mantissas. Requires sign handling, overflow detection, and rounding logic.
+3. **Floating point arithmetic** — IEEE 754-style with separate circuits for mantissa and exponent. ADD: align exponents, add mantissas, renormalize. MUL: add exponents, multiply mantissas. Requires sign handling, overflow detection, and rounding logic.
 
 ### Completed Extensions
 
 - **3-operand addition (15 + 27 + 33 = 75)** — `arithmetic.add3_8bit` chains two 8-bit ripple carry stages. 16 full adders, 144 gates, 240 test cases verified.
 
+- **Order of operations (5 + 3 × 2 = 11)** — `arithmetic.expr_add_mul` computes A + (B × C) using shift-add multiplication then addition. 64 AND gates + 64 full adders, 73 test cases verified.
+
 ---
 
 ## Files
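Roadmap item 2's ADD16 decomposition (low = ADD8 of the low bytes, high = ADD8 of the high bytes plus the carry out) can be sketched as a minimal reference model (illustrative helper names, not repo code):

```python
def add8(a: int, b: int, carry_in: int = 0):
    """8-bit add returning (sum, carry_out); mirrors one ripple-carry block."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8

def add16(a: int, b: int) -> int:
    """ADD16 via two chained 8-bit adds: low bytes first, carry into the high bytes."""
    lo, carry = add8(a & 0xFF, b & 0xFF)
    hi, _ = add8((a >> 8) & 0xFF, (b >> 8) & 0xFF, carry)  # final carry wraps
    return (hi << 8) | lo
```

The only new wiring beyond two ADD8 circuits is routing the low block's carry-out into the high block's carry-in.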
build.py CHANGED
@@ -259,6 +259,56 @@ def add_full_adder(tensors: Dict[str, torch.Tensor], prefix: str) -> None:
     add_gate(tensors, f"{prefix}.carry_or", [1.0, 1.0], [-1.0])
 
 
+def add_expr_add_mul(tensors: Dict[str, torch.Tensor]) -> None:
+    """Add expression circuit for A + B × C (order of operations).
+
+    Computes A + (B × C) where multiplication has higher precedence.
+
+    Structure:
+    - Stage 1: Multiply B × C using shift-add algorithm
+      - 8 mask stages: mask[i] = B AND C[i] (8 AND gates each, shifted)
+      - 7 accumulator adders to sum masked values
+    - Stage 2: Add A to multiplication result (8-bit ripple carry)
+
+    Inputs: $a[0-7], $b[0-7], $c[0-7] (MSB-first, 8-bit each)
+    Output: 8-bit result of A + (B × C), wrapping on overflow
+
+    Total: 64 AND gates + 7×8 full adders (mul) + 8 full adders (add) = ~640 gates
+    """
+    prefix = "arithmetic.expr_add_mul"
+
+    # Stage 1: Multiply B × C using shift-add.
+    # For each bit i of C, AND all bits of B with C[i]; this yields partial
+    # products that are shifted by i positions.
+
+    # Mask AND gates: mask[stage][bit] = B[bit] AND C[stage]
+    # These compute B & (C[i] ? 0xFF : 0x00) for each bit of C.
+    for stage in range(8):
+        for bit in range(8):
+            add_gate(tensors, f"{prefix}.mul.mask.s{stage}.b{bit}", [1.0, 1.0], [-2.0])
+
+    # Accumulator adders for shift-add multiplication:
+    #   stage 0:    acc = mask0 (no adder needed, just the masked value)
+    #   stages 1-7: acc = acc + (mask[i] << i)
+    # The shift is handled at wiring time by connecting different bit positions;
+    # only the low 8 bits are kept at each stage for the 8-bit result.
+    for stage in range(1, 8):  # 7 accumulator adders
+        for bit in range(8):
+            add_full_adder(tensors, f"{prefix}.mul.acc.s{stage}.fa{bit}")
+
+    # Stage 2: Add A to the multiplication result.
+    for bit in range(8):
+        add_full_adder(tensors, f"{prefix}.add.fa{bit}")
+
+
 def add_add3(tensors: Dict[str, torch.Tensor]) -> None:
     """Add 3-operand 8-bit adder circuit.
 
@@ -649,6 +699,126 @@ def infer_ripplecarry_inputs(gate: str, prefix: str, bits: int, reg: SignalRegis
     return []
 
 
+def infer_expr_add_mul_inputs(gate: str, reg: SignalRegistry) -> List[int]:
+    """Infer inputs for A + B × C expression circuit (order of operations).
+
+    Circuit structure:
+    - Mask stage: mask.s[stage].b[bit] = B[bit] AND C[stage]
+    - Accumulator stages 1-7: acc.s[stage] = acc.s[stage-1] + (mask.s[stage] << stage)
+    - Final add: result = A + acc.s7
+
+    Bit ordering: MSB-first externally, LSB-first internally (fa0 = LSB, fa7 = MSB),
+    so $x[7] = bit 0 (LSB) and $x[0] = bit 7 (MSB).
+    """
+    prefix = "arithmetic.expr_add_mul"
+
+    # Register all inputs
+    for i in range(8):
+        reg.register(f"$a[{i}]")
+        reg.register(f"$b[{i}]")
+        reg.register(f"$c[{i}]")
+
+    # Mask AND gates: mask.s[stage].b[bit] = B[bit] AND C[stage]
+    if '.mul.mask.' in gate:
+        m = re.search(r'\.s(\d+)\.b(\d+)', gate)
+        if m:
+            stage = int(m.group(1))
+            bit = int(m.group(2))
+            # MSB-first: $b[7-bit] is bit position 'bit', $c[7-stage] is stage position 'stage'
+            b_input = reg.get_id(f"$b[{7-bit}]")
+            c_input = reg.get_id(f"$c[{7-stage}]")
+            return [b_input, c_input]
+        return []
+
+    # Accumulator adders: acc.s[stage].fa[bit]
+    if '.mul.acc.' in gate:
+        m = re.search(r'\.s(\d+)\.fa(\d+)\.', gate)
+        if not m:
+            return []
+        stage = int(m.group(1))  # 1-7
+        bit = int(m.group(2))    # 0-7
+
+        # A input: previous stage output
+        if stage == 1:
+            # First accumulator: A = mask.s0.b[bit] (AND gate output)
+            a_input = reg.register(f"{prefix}.mul.mask.s0.b{bit}")
+        else:
+            # Later stages: A = previous accumulator sum
+            a_input = reg.register(f"{prefix}.mul.acc.s{stage-1}.fa{bit}.ha2.sum.layer2")
+
+        # B input: (mask.s[stage] << stage)[bit]
+        # A left shift by 'stage' positions means bit positions 0..stage-1 get 0
+        # and bit position 'bit' gets mask.s[stage].b[bit-stage].
+        if bit < stage:
+            b_input = reg.get_id("#0")
+        else:
+            b_input = reg.register(f"{prefix}.mul.mask.s{stage}.b{bit-stage}")
+
+        # Carry input
+        if bit == 0:
+            cin = reg.get_id("#0")
+        else:
+            cin = reg.register(f"{prefix}.mul.acc.s{stage}.fa{bit-1}.carry_or")
+
+        fa_prefix = f"{prefix}.mul.acc.s{stage}.fa{bit}"
+
+        if '.ha1.sum.layer1' in gate:
+            return [a_input, b_input]
+        if '.ha1.sum.layer2' in gate:
+            return [reg.register(f"{fa_prefix}.ha1.sum.layer1.or"), reg.register(f"{fa_prefix}.ha1.sum.layer1.nand")]
+        if '.ha1.carry' in gate and '.layer' not in gate:
+            return [a_input, b_input]
+        if '.ha2.sum.layer1' in gate:
+            return [reg.register(f"{fa_prefix}.ha1.sum.layer2"), cin]
+        if '.ha2.sum.layer2' in gate:
+            return [reg.register(f"{fa_prefix}.ha2.sum.layer1.or"), reg.register(f"{fa_prefix}.ha2.sum.layer1.nand")]
+        if '.ha2.carry' in gate and '.layer' not in gate:
+            return [reg.register(f"{fa_prefix}.ha1.sum.layer2"), cin]
+        if '.carry_or' in gate:
+            return [reg.register(f"{fa_prefix}.ha1.carry"), reg.register(f"{fa_prefix}.ha2.carry")]
+        return []
+
+    # Final add stage: A + mul_result
+    if '.add.fa' in gate:
+        m = re.search(r'\.fa(\d+)\.', gate)
+        if not m:
+            return []
+        bit = int(m.group(1))
+
+        # A input: $a[7-bit] (MSB-first to positional bit)
+        a_input = reg.get_id(f"$a[{7-bit}]")
+
+        # B input: multiplication result = acc.s7.fa[bit] sum output
+        b_input = reg.register(f"{prefix}.mul.acc.s7.fa{bit}.ha2.sum.layer2")
+
+        # Carry input
+        if bit == 0:
+            cin = reg.get_id("#0")
+        else:
+            cin = reg.register(f"{prefix}.add.fa{bit-1}.carry_or")
+
+        fa_prefix = f"{prefix}.add.fa{bit}"
+
+        if '.ha1.sum.layer1' in gate:
+            return [a_input, b_input]
+        if '.ha1.sum.layer2' in gate:
+            return [reg.register(f"{fa_prefix}.ha1.sum.layer1.or"), reg.register(f"{fa_prefix}.ha1.sum.layer1.nand")]
+        if '.ha1.carry' in gate and '.layer' not in gate:
+            return [a_input, b_input]
+        if '.ha2.sum.layer1' in gate:
+            return [reg.register(f"{fa_prefix}.ha1.sum.layer2"), cin]
+        if '.ha2.sum.layer2' in gate:
+            return [reg.register(f"{fa_prefix}.ha2.sum.layer1.or"), reg.register(f"{fa_prefix}.ha2.sum.layer1.nand")]
+        if '.ha2.carry' in gate and '.layer' not in gate:
+            return [reg.register(f"{fa_prefix}.ha1.sum.layer2"), cin]
+        if '.carry_or' in gate:
+            return [reg.register(f"{fa_prefix}.ha1.carry"), reg.register(f"{fa_prefix}.ha2.carry")]
+        return []
+
+    return []
+
+
 def infer_add3_inputs(gate: str, reg: SignalRegistry) -> List[int]:
     """Infer inputs for 3-operand adder: A + B + C."""
     prefix = "arithmetic.add3_8bit"
@@ -1179,6 +1349,8 @@ def infer_inputs_for_gate(gate: str, reg: SignalRegistry, tensors: Dict[str, tor
         return infer_ripplecarry_inputs(gate, "arithmetic.ripplecarry8bit", 8, reg)
     if 'add3_8bit' in gate:
         return infer_add3_inputs(gate, reg)
+    if 'expr_add_mul' in gate:
+        return infer_expr_add_mul_inputs(gate, reg)
     if 'adc8bit' in gate:
         return infer_adcsbc_inputs(gate, "arithmetic.adc8bit", False, reg)
     if 'sbc8bit' in gate:
@@ -1404,7 +1576,7 @@ def cmd_alu(args) -> None:
         "alu.alu8bit.neg.", "alu.alu8bit.rol.", "alu.alu8bit.ror.",
         "arithmetic.greaterthan8bit.", "arithmetic.lessthan8bit.",
         "arithmetic.greaterorequal8bit.", "arithmetic.lessorequal8bit.",
-        "arithmetic.equality8bit.", "arithmetic.add3_8bit.",
+        "arithmetic.equality8bit.", "arithmetic.add3_8bit.", "arithmetic.expr_add_mul.",
         "control.push.", "control.pop.", "control.ret.",
         "combinational.barrelshifter.", "combinational.priorityencoder.",
     ])
@@ -1475,6 +1647,12 @@ def cmd_alu(args) -> None:
        print("  Added ADD3 (16 full adders = 144 gates)")
     except ValueError as e:
        print(f"  ADD3 already exists: {e}")
+    print("\nGenerating expression A + B × C circuit...")
+    try:
+        add_expr_add_mul(tensors)
+        print("  Added EXPR_ADD_MUL (64 AND + 56 + 8 full adders = 640 gates)")
+    except ValueError as e:
+        print(f"  EXPR_ADD_MUL already exists: {e}")
     if args.apply:
         print(f"\nSaving: {args.model}")
         save_file(tensors, str(args.model))
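The bit-ordering and shift wiring conventions used by `infer_expr_add_mul_inputs` are easy to get backwards, so here is a small sketch that models just the index mapping (returning signal-name strings instead of registry ids; the helper names are hypothetical, for illustration only):

```python
def mask_inputs(stage: int, bit: int):
    """Which external signals feed mask.s{stage}.b{bit} under the MSB-first
    convention: $x[0] is the MSB and $x[7] the LSB, so positional bit k
    lives at index 7 - k."""
    return (f"$b[{7 - bit}]", f"$c[{7 - stage}]")

def acc_b_input(stage: int, bit: int):
    """B input of accumulator fa{bit} at a given stage: the stage-shifted
    partial product. Positions below the shift amount are wired to the
    constant-zero signal '#0'."""
    if bit < stage:
        return "#0"
    return f"mask.s{stage}.b{bit - stage}"
```

For example, mask stage 0 / bit 0 reads the LSBs `$b[7]` and `$c[7]`, and at accumulator stage 3, full adders fa0 through fa2 get constant zero while fa5 gets `mask.s3.b2`.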
eval.py CHANGED
@@ -631,6 +631,154 @@ class BatchedFitnessEvaluator:
 
         return correct, num_tests
 
+    # =========================================================================
+    # ORDER OF OPERATIONS (A + B × C)
+    # =========================================================================
+
+    def _test_expr_add_mul(self, pop: Dict, debug: bool) -> Tuple[torch.Tensor, int]:
+        """Test A + B × C expression circuit (order of operations)."""
+        pop_size = next(iter(pop.values())).shape[0]
+
+        if debug:
+            print("\n=== ORDER OF OPERATIONS (A + B × C) ===")
+
+        prefix = 'arithmetic.expr_add_mul'
+        bits = 8
+
+        # Test cases for order of operations
+        test_cases = []
+
+        # Specific examples from roadmap
+        test_cases.extend([
+            (5, 3, 2),     # 5 + 3 × 2 = 5 + 6 = 11
+            (10, 4, 3),    # 10 + 4 × 3 = 10 + 12 = 22
+            (1, 10, 10),   # 1 + 10 × 10 = 1 + 100 = 101
+            (0, 15, 17),   # 0 + 15 × 17 = 255
+            (1, 15, 17),   # 1 + 15 × 17 = 256 -> 0 (overflow)
+            (100, 5, 5),   # 100 + 5 × 5 = 100 + 25 = 125
+        ])
+
+        # Edge cases
+        test_cases.extend([
+            (0, 0, 0),     # 0 + 0 × 0 = 0
+            (255, 0, 0),   # 255 + 0 × 0 = 255
+            (0, 255, 1),   # 0 + 255 × 1 = 255
+            (0, 1, 255),   # 0 + 1 × 255 = 255
+            (1, 1, 1),     # 1 + 1 × 1 = 2
+            (0, 16, 16),   # 0 + 16 × 16 = 256 -> 0 (overflow)
+        ])
+
+        # Systematic small values
+        for a in [0, 1, 5, 10]:
+            for b in [0, 1, 2, 3]:
+                for c in [0, 1, 2, 3]:
+                    test_cases.append((a, b, c))
+
+        # Remove duplicates
+        test_cases = list(set(test_cases))
+
+        a_vals = torch.tensor([t[0] for t in test_cases], device=self.device)
+        b_vals = torch.tensor([t[1] for t in test_cases], device=self.device)
+        c_vals = torch.tensor([t[2] for t in test_cases], device=self.device)
+        num_tests = len(test_cases)
+
+        # Convert to bits [num_tests, bits], MSB-first
+        a_bits = torch.stack([((a_vals >> (bits - 1 - i)) & 1).float() for i in range(bits)], dim=1)
+        b_bits = torch.stack([((b_vals >> (bits - 1 - i)) & 1).float() for i in range(bits)], dim=1)
+        c_bits = torch.stack([((c_vals >> (bits - 1 - i)) & 1).float() for i in range(bits)], dim=1)
+
+        # Evaluate mask stage: mask[stage][bit] = B[bit] AND C[stage].
+        # The circuit's mask.s[stage].b[bit] operates on positional bits:
+        # stage 0 = LSB of C (c_bits[:, 7]), stage 7 = MSB of C (c_bits[:, 0]);
+        # bit 0 = LSB of B (b_bits[:, 7]), bit 7 = MSB of B (b_bits[:, 0]).
+        masks = torch.zeros(8, num_tests, pop_size, 8, device=self.device)  # [stage, tests, pop, bits]
+        for stage in range(8):
+            c_stage_bit = c_bits[:, 7 - stage].unsqueeze(1).expand(-1, pop_size)  # C[stage]
+            for bit in range(8):
+                b_bit_val = b_bits[:, 7 - bit].unsqueeze(1).expand(-1, pop_size)  # B[bit]
+                # AND gate
+                w = pop.get(f'{prefix}.mul.mask.s{stage}.b{bit}.weight')
+                bias = pop.get(f'{prefix}.mul.mask.s{stage}.b{bit}.bias')
+                if w is not None and bias is not None:
+                    w = w.squeeze(-1)            # [pop]
+                    b_tensor = bias.squeeze(-1)  # [pop]
+                    # Broadcast for batched evaluation
+                    inp = torch.stack([b_bit_val, c_stage_bit], dim=-1)  # [tests, pop, 2]
+                    out = heaviside(torch.einsum('tpi,pi->tp', inp, w) + b_tensor)
+                    masks[stage, :, :, bit] = out
+
+        # Accumulator stages:
+        #   acc[0] = mask[0] (no shift)
+        #   acc[1] = acc[0] + (mask[1] << 1)
+        #   ...
+        #   acc[7] = acc[6] + (mask[7] << 7)
+        acc = masks[0].clone()  # [tests, pop, 8] - start with mask[0]
+
+        for stage in range(1, 8):
+            # Shifted mask (mask[stage] << stage): bits 0..stage-1 become 0,
+            # bit k becomes mask[stage][k - stage].
+            shifted_mask = torch.zeros(num_tests, pop_size, 8, device=self.device)
+            for bit in range(8):
+                if bit >= stage:
+                    shifted_mask[:, :, bit] = masks[stage, :, :, bit - stage]
+                # else: remains 0
+
+            # Add acc + shifted_mask using full adders
+            carry = torch.zeros(num_tests, pop_size, device=self.device)
+            new_acc = torch.zeros(num_tests, pop_size, 8, device=self.device)
+            for bit in range(8):
+                s, carry = self._eval_single_fa(
+                    pop, f'{prefix}.mul.acc.s{stage}.fa{bit}',
+                    acc[:, :, bit],
+                    shifted_mask[:, :, bit],
+                    carry
+                )
+                new_acc[:, :, bit] = s
+            acc = new_acc
+
+        # Final add stage: A + acc (multiplication result)
+        carry = torch.zeros(num_tests, pop_size, device=self.device)
+        result_bits = []
+        for bit in range(8):
+            a_bit_val = a_bits[:, 7 - bit].unsqueeze(1).expand(-1, pop_size)
+            s, carry = self._eval_single_fa(
+                pop, f'{prefix}.add.fa{bit}',
+                a_bit_val,
+                acc[:, :, bit],
+                carry
+            )
+            result_bits.append(s)
+
+        # Reconstruct the result value (MSB first)
+        result_bits = torch.stack(result_bits[::-1], dim=-1)
+        result = torch.zeros(num_tests, pop_size, device=self.device)
+        for i in range(bits):
+            result += result_bits[:, :, i] * (1 << (bits - 1 - i))
+
+        # Expected: A + (B × C), with 8-bit wrap
+        expected = ((a_vals + b_vals * c_vals) & 0xFF).unsqueeze(1).expand(-1, pop_size).float()
+        correct = (result == expected).float().sum(0)
+
+        failures = []
+        if pop_size == 1:
+            for i in range(min(num_tests, 100)):
+                if result[i, 0].item() != expected[i, 0].item():
+                    failures.append((
+                        [int(a_vals[i].item()), int(b_vals[i].item()), int(c_vals[i].item())],
+                        int(expected[i, 0].item()),
+                        int(result[i, 0].item())
+                    ))
+
+        self._record(prefix, int(correct[0].item()), num_tests, failures)
+        if debug:
+            r = self.results[-1]
+            print(f"  {r.name}: {r.passed}/{r.total} {'PASS' if r.success else 'FAIL'}")
+            if failures:
+                for inp, exp, got in failures[:5]:
+                    print(f"    FAIL: {inp[0]} + {inp[1]} × {inp[2]} = {exp}, got {got}")
+
+        return correct, num_tests
+
     # =========================================================================
     # COMPARATORS
     # =========================================================================
@@ -2450,6 +2598,12 @@
         total_tests += t
         self.category_scores['add3'] = (s[0].item() if pop_size == 1 else s.mean().item(), t)
 
+        # Order of operations (A + B × C)
+        s, t = self._test_expr_add_mul(population, debug)
+        scores += s
+        total_tests += t
+        self.category_scores['expr_add_mul'] = (s[0].item() if pop_size == 1 else s.mean().item(), t)
+
         # Comparators
         s, t = self._test_comparators(population, debug)
         scores += s
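The evaluator converts operands to MSB-first bit vectors and reconstructs the result with the mirrored weighting. That round trip can be sketched in isolation (illustrative helpers, not repo code):

```python
def to_bits_msb(v: int, bits: int = 8):
    """MSB-first bit list, matching a_bits/b_bits/c_bits in _test_expr_add_mul:
    index i holds bit (bits - 1 - i) of v."""
    return [(v >> (bits - 1 - i)) & 1 for i in range(bits)]

def from_bits_msb(bit_list):
    """Inverse mapping, matching the result reconstruction:
    result += bit[i] * (1 << (bits - 1 - i))."""
    bits = len(bit_list)
    return sum(b << (bits - 1 - i) for i, b in enumerate(bit_list))
```

Note the internal full adders are indexed LSB-first (fa0 = LSB), which is why the evaluator reads `x_bits[:, 7 - bit]` and reverses `result_bits` before stacking.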
neural_computer.safetensors CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:270309b1ac57e808827cee555b6f6f9e3f14c37abe23fa21069db4ff251a0b72
-size 34552948
+oid sha256:eaabeed4fa50c13129fe4f83f6a8f31b6ccd41de12e83c62448460881373fc3e
+size 34838348