---

license: mit
tags:
- pytorch
- safetensors
- threshold-logic
- neuromorphic
- arithmetic
- multiplier
- compressor
---


# threshold-4to2-compressor

4:2 compressor for high-speed multiplier trees. Reduces 4 input bits plus carry-in to 2 output bits plus carry-out while preserving arithmetic value.

## Circuit

```
   x      y      z      w      cin
   │      │      │      │       │
   └──┬───┴──┬───┴──┬───┘       │
      │      │      │           │
      ▼      │      │           │
   ┌─────┐   │      │           │
   │XOR  │   │      │           │
   │(x,y)│   │      │           │
   └──┬──┘   │      │           │
      │      │      │           │
      ▼      ▼      │           │
   ┌─────────────┐  │           │
   │  XOR(xy,z)  │  │           │
   └──────┬──────┘  │           │
          │         │           │
          ▼         ▼           │
       ┌──────────────┐         │
       │  XOR(xyz,w)  │         │
       └──────┬───────┘         │
              │                 │
              ▼                 ▼
           ┌─────────────────────┐
           │    XOR(xyzw, cin)   │───► Sum
           └─────────────────────┘

   cout = MAJ(x,y,z)     (independent of w, cin)
   carry = MAJ(XOR(x,y,z), w, cin)
```

## Function

```
compress_4to2(x, y, z, w, cin) -> (sum, carry, cout)

Invariant: x + y + z + w + cin = sum + 2*carry + 2*cout
```
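
The invariant can be checked exhaustively over all 32 input combinations. The sketch below is a plain-Python reference model built from bitwise operators; it does not use the threshold weights in `model.safetensors`, and the helper names `maj` and `compress_4to2_ref` are illustrative, not taken from `model.py`.

```python
from itertools import product

def maj(a, b, c):
    """Majority of three bits."""
    return (a & b) | (b & c) | (a & c)

def compress_4to2_ref(x, y, z, w, cin):
    """Boolean reference model (hypothetical helper, not the threshold network)."""
    sum1 = x ^ y ^ z              # stage 1 sum
    cout = maj(x, y, z)           # stage 1 carry, independent of w and cin
    s = sum1 ^ w ^ cin            # stage 2 sum
    carry = maj(sum1, w, cin)     # stage 2 carry
    return s, carry, cout

# Check x + y + z + w + cin == sum + 2*carry + 2*cout for every input.
failures = []
for bits in product((0, 1), repeat=5):
    s, carry, cout = compress_4to2_ref(*bits)
    if sum(bits) != s + 2 * carry + 2 * cout:
        failures.append(bits)

print("invariant holds" if not failures else f"failures: {failures}")
```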

## Truth Table (partial: 6 of 32 combinations)

| x | y | z | w | cin | sum | carry | cout | verify |
|---|---|---|---|-----|-----|-------|------|--------|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0=0 |
| 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1=1 |
| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 2=2 |
| 1 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 3=3 |
| 1 | 1 | 1 | 1 | 0 | 0 | 1 | 1 | 4=4 |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 5=5 |

- Input sum range: 0 to 5
- Output encoding: sum + 2*carry + 2*cout (range 0 to 5)

## Mechanism

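The 4:2 compressor is built from two cascaded 3:2 compressors (full adders). The twist is that the first stage's carry (cout) is handed to the neighboring column at the same tree level, where it arrives as that column's cin: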

**Stage 1: Compress (x, y, z)**
- sum1 = x XOR y XOR z
- cout = MAJ(x, y, z)  ← This goes to next column

**Stage 2: Compress (sum1, w, cin)**
- sum = sum1 XOR w XOR cin
- carry = MAJ(sum1, w, cin)  ← This goes to next column

Key insight: The cout is computed early and can propagate horizontally while sum/carry are still being computed.
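
Why the two stages preserve the arithmetic value follows from the full-adder identity a + b + c = (a XOR b XOR c) + 2*MAJ(a, b, c), applied once per stage. Adding w + cin to the first line and substituting the second gives the invariant:

$$
\begin{aligned}
x + y + z &= \mathrm{sum1} + 2\,\mathrm{cout} \\
\mathrm{sum1} + w + \mathrm{cin} &= \mathrm{sum} + 2\,\mathrm{carry} \\
x + y + z + w + \mathrm{cin} &= \mathrm{sum} + 2\,\mathrm{carry} + 2\,\mathrm{cout}
\end{aligned}
$$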

## Architecture

| Component | Function | Neurons | Layers |
|-----------|----------|---------|--------|
| XOR(x,y) | First pair | 3 | 2 |
| XOR(xy,z) | Add third | 3 | 2 |
| MAJ(x,y,z) | cout | 1 | 1 |
| XOR(xyz,w) | Add fourth | 3 | 2 |
| XOR(xyzw,cin) | sum | 3 | 2 |
| MAJ(xyz,w,cin) | carry | 1 | 1 |

**Total: 14 neurons**
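
As an illustration of how these neuron counts can arise, here is a minimal sketch of the six gates as Heaviside threshold units. The weights and biases are textbook choices assumed for this example (XOR as OR and NAND feeding an AND); the values stored in `model.safetensors` may differ, and the function names are hypothetical.

```python
def neuron(weights, bias, inputs):
    """Heaviside threshold unit: outputs 1 when w.x + b >= 0, else 0."""
    return int(sum(w * x for w, x in zip(weights, inputs)) + bias >= 0)

def xor_gate(a, b):
    # Layer 1: OR and NAND. Layer 2: AND of both (3 neurons, 2 layers).
    h_or = neuron([1, 1], -1, [a, b])
    h_nand = neuron([-1, -1], 1, [a, b])
    return neuron([1, 1], -2, [h_or, h_nand])

def maj_gate(a, b, c):
    # A single neuron that fires when at least two inputs are 1.
    return neuron([1, 1, 1], -2, [a, b, c])

def compress_4to2_threshold(x, y, z, w, cin):
    xy = xor_gate(x, y)
    xyz = xor_gate(xy, z)
    cout = maj_gate(x, y, z)
    xyzw = xor_gate(xyz, w)
    s = xor_gate(xyzw, cin)
    carry = maj_gate(xyz, w, cin)
    return s, carry, cout

# 4 XOR gates x 3 neurons + 2 MAJ gates x 1 neuron
n_neurons = 4 * 3 + 2 * 1
# each XOR neuron has 2 weights + 1 bias; each MAJ neuron has 3 weights + 1 bias
n_params = 4 * 3 * (2 + 1) + 2 * (3 + 1)
print(n_neurons, n_params)  # 14 44
```

Under this construction the neuron count is 14 and the weight-plus-bias count is 44.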

## Parameters

| | |
|---|---|
| Inputs | 5 (x, y, z, w, cin) |
| Outputs | 3 (sum, carry, cout) |
| Neurons | 14 |
| Layers | 8 |
| Parameters | 44 |
| Magnitude | 46 |

## Delay Analysis

- Critical path for sum: 4 XOR stages = 8 layers
- Critical path for carry: 2 XOR stages (to form XOR(x,y,z)) + 1 MAJ = 5 layers
- Critical path for cout: 1 MAJ = 1 layer

The early cout enables fast horizontal carry propagation in multiplier arrays.
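
Layer depths can be derived mechanically from the gate dependencies. The sketch below assumes each XOR costs 2 layers and each MAJ costs 1 layer, as in the Architecture table; the intermediate signal names (`xy`, `xyz`, `xyzw`) are labels chosen for this example.

```python
XOR_LAYERS = 2  # assumption: each XOR is a 2-layer, 3-neuron gate
MAJ_LAYERS = 1  # assumption: each MAJ is a single threshold neuron

# signal -> (layers added by its gate, input signals); primary inputs have depth 0
gates = {
    "xy":    (XOR_LAYERS, ["x", "y"]),
    "xyz":   (XOR_LAYERS, ["xy", "z"]),
    "cout":  (MAJ_LAYERS, ["x", "y", "z"]),
    "xyzw":  (XOR_LAYERS, ["xyz", "w"]),
    "sum":   (XOR_LAYERS, ["xyzw", "cin"]),
    "carry": (MAJ_LAYERS, ["xyz", "w", "cin"]),
}

def depth(signal):
    """Layers on the longest path from any primary input to `signal`."""
    if signal not in gates:
        return 0
    layers, inputs = gates[signal]
    return layers + max(depth(i) for i in inputs)

for out in ("sum", "carry", "cout"):
    print(out, depth(out))
# sum 8
# carry 5
# cout 1
```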

## Usage

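```python
from safetensors.torch import load_file
import torch

# Threshold-gate weights; model.py shows how they are applied
w = load_file('model.safetensors')

def compress_4to2(x, y, z, w_in, cin):
    # Boolean reference logic only; it does not use the loaded weights.
    # See model.py for the threshold-network implementation.
    sum1 = x ^ y ^ z
    cout = (x & y) | (y & z) | (x & z)
    s = sum1 ^ w_in ^ cin
    carry = (sum1 & w_in) | (w_in & cin) | (sum1 & cin)
    return s, carry, cout

# Example: sum of 5 bits
s, carry, cout = compress_4to2(1, 1, 1, 1, 1)
print(f"1+1+1+1+1 = {s} + 2*{carry} + 2*{cout} = {s + 2*carry + 2*cout}")
# Output: 1+1+1+1+1 = 1 + 2*1 + 2*1 = 5
```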

## Applications

- Booth multipliers (radix-4)
- Wallace/Dadda tree reduction
- FMA (fused multiply-add) units
- High-performance DSP

## Comparison with 3:2 Compressor

| Property | 3:2 | 4:2 |
|----------|-----|-----|
| Inputs | 3 | 5 (4 + cin) |
| Outputs | 2 | 3 (2 + cout) |
| Reduction ratio | 3→2 | 4→2 per column |
| Neurons | 7 | 14 |
| Tree depth for n bits | O(log₁.₅ n) | O(log₂ n) |

A 4:2 level halves the number of partial-product rows, while a 3:2 level only reduces them by a third, so a 4:2 tree needs fewer levels.
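
To make the tree-depth row concrete, the following sketch counts how many compressor levels are needed to reduce a given number of partial-product rows to two. It is an idealized model: the 4:2 count assumes each level halves the rows (rounding up), and the function names are chosen for this example.

```python
def levels_3to2(rows):
    """Levels of 3:2 compressors needed to reduce `rows` operands to 2."""
    levels = 0
    while rows > 2:
        rows = 2 * (rows // 3) + rows % 3  # each full group of 3 rows becomes 2
        levels += 1
    return levels

def levels_4to2(rows):
    """Levels of 4:2 compressors, idealized as halving the row count."""
    levels = 0
    while rows > 2:
        rows = (rows + 1) // 2  # assumption: leftover rows cost no extra level
        levels += 1
    return levels

for n in (8, 16, 32, 64):
    print(n, levels_3to2(n), levels_4to2(n))
```

These are level counts only. A single 4:2 level has a longer critical path than a single 3:2 level, so the level count alone does not give the total delay.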

## Files

```
threshold-4to2-compressor/
├── model.safetensors
├── model.py
├── create_safetensors.py
├── config.json
└── README.md
```

## License

MIT