| | ---
|
| | license: mit
|
| | tags:
|
| | - pytorch
|
| | - safetensors
|
| | - threshold-logic
|
| | - neuromorphic
|
| | - arithmetic
|
| | - multiplier
|
| | - compressor
|
| | ---
|
| |
|
| | # threshold-4to2-compressor
|
| |
|
| | 4:2 compressor for high-speed multiplier trees. Reduces 4 input bits plus carry-in to 2 output bits plus carry-out while preserving arithmetic value.
|
| |
|
| | ## Circuit
|
| |
|
| | ```
|
| | x y z w cin
|
| | β β β β β
|
| | ββββ¬ββββ΄βββ¬ββββ΄βββ¬ββββ β
|
| | β β β β
|
| | βΌ β β β
|
| | βββββββ β β β
|
| | βXOR β β β β
|
| | β(x,y)β β β β
|
| | ββββ¬βββ β β β
|
| | β β β β
|
| | βΌ βΌ β β
|
| | βββββββββββββββ β β
|
| | β XOR(xy,z) β β β
|
| | ββββββββ¬βββββββ β β
|
| | β β β
|
| | βΌ βΌ β
|
| | ββββββββββββββββ β
|
| | β XOR(xyz,w) β β
|
| | ββββββββ¬ββββββββ β
|
| | β β
|
| | βΌ βΌ
|
| | βββββββββββββββββββββββ
|
| | β XOR(xyzw, cin) βββββΊ Sum
|
| | βββββββββββββββββββββββ
|
| |
|
| | cout = MAJ(x,y,z) (independent of w, cin)
|
| | carry = MAJ(XOR(x,y,z), w, cin)
|
| | ```
|
| |
|
| | ## Function
|
| |
|
| | ```
|
| | compress_4to2(x, y, z, w, cin) -> (sum, carry, cout)
|
| |
|
| | Invariant: x + y + z + w + cin = sum + 2*carry + 2*cout
|
| | ```
|
| |
|
| | ## Truth Table (partial - 32 combinations)
|
| |
|
| | | x | y | z | w | cin | sum | carry | cout | verify |
|
| | |---|---|---|---|-----|-----|-------|------|--------|
|
| | | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0=0 |
|
| | | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1=1 |
|
| | | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 2=2 |
|
| | | 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 3=3 |
|
| | | 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 4=4 |
|
| | | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 5=5 |
|
| |
|
| | Input sum range: 0 to 5
|
| | Output encoding: sum + 2*carry + 2*cout (range 0-5)
|
| |
|
| | ## Mechanism
|
| |
|
| | The 4:2 compressor is built from two cascaded 3:2 compressors with a twist:
|
| |
|
| | **Stage 1: Compress (x, y, z)**
|
| | - sum1 = x XOR y XOR z
|
| | - cout = MAJ(x, y, z) β This goes to next column
|
| |
|
| | **Stage 2: Compress (sum1, w, cin)**
|
| | - sum = sum1 XOR w XOR cin
|
| | - carry = MAJ(sum1, w, cin) β This goes to next column
|
| |
|
| | Key insight: The cout is computed early and can propagate horizontally while sum/carry are still being computed.
|
| |
|
| | ## Architecture
|
| |
|
| | | Component | Function | Neurons | Layers |
|
| | |-----------|----------|---------|--------|
|
| | | XOR(x,y) | First pair | 3 | 2 |
|
| | | XOR(xy,z) | Add third | 3 | 2 |
|
| | | MAJ(x,y,z) | cout | 1 | 1 |
|
| | | XOR(xyz,w) | Add fourth | 3 | 2 |
|
| | | XOR(xyzw,cin) | sum | 3 | 2 |
|
| | | MAJ(xyz,w,cin) | carry | 1 | 1 |
|
| |
|
| | **Total: 14 neurons**
|
| |
|
| | ## Parameters
|
| |
|
| | | | |
|
| | |---|---|
|
| | | Inputs | 5 (x, y, z, w, cin) |
|
| | | Outputs | 3 (sum, carry, cout) |
|
| | | Neurons | 14 |
|
| | | Layers | 8 |
|
| | | Parameters | 44 |
|
| | | Magnitude | 46 |
|
| |
|
| | ## Delay Analysis
|
| |
|
| | Critical path for sum: 4 XOR stages = 8 layers
|
| | Critical path for carry: 4 XOR stages + 1 MAJ = 9 layers
|
| | Critical path for cout: 1 MAJ = 1 layer (very fast!)
|
| |
|
| | The early cout enables fast horizontal carry propagation in multiplier arrays.
|
| |
|
| | ## Usage
|
| |
|
| | ```python
|
| | from safetensors.torch import load_file
|
| | import torch
|
| |
|
| | w = load_file('model.safetensors')
|
| |
|
| | def compress_4to2(x, y, z, w_in, cin):
|
| | # Implementation details in model.py
|
| | pass
|
| |
|
| | # Example: sum of 5 bits
|
| | s, carry, cout = compress_4to2(1, 1, 1, 1, 1)
|
| | print(f"1+1+1+1+1 = {s} + 2*{carry} + 2*{cout} = {s + 2*carry + 2*cout}")
|
| | # Output: 1+1+1+1+1 = 1 + 2*1 + 2*1 = 5
|
| | ```
|
| |
|
| | ## Applications
|
| |
|
| | - Booth multipliers (radix-4)
|
| | - Wallace/Dadda tree reduction
|
| | - FMA (fused multiply-add) units
|
| | - High-performance DSP
|
| |
|
| | ## Comparison with 3:2 Compressor
|
| |
|
| | | Property | 3:2 | 4:2 |
|
| | |----------|-----|-----|
|
| | | Inputs | 3 | 5 (4 + cin) |
|
| | | Outputs | 2 | 3 (2 + cout) |
|
| | | Reduction ratio | 3β2 | 4β2 per column |
|
| | | Neurons | 7 | 14 |
|
| | | Tree depth for n bits | O(logβ.β
n) | O(logβ n) |
|
| |
|
| | 4:2 compressors provide faster reduction in multiplier trees.
|
| |
|
| | ## Files
|
| |
|
| | ```
|
| | threshold-4to2-compressor/
|
| | βββ model.safetensors
|
| | βββ model.py
|
| | βββ create_safetensors.py
|
| | βββ config.json
|
| | βββ README.md
|
| | ```
|
| |
|
| | ## License
|
| |
|
| | MIT
|
| |
|