threshold-sklansky

4-bit Sklansky parallel prefix adder. Achieves minimum possible depth (logβ‚‚n) at the cost of maximum fanout. The fastest parallel prefix adder when fanout is not a constraint.

Function

S[3:0], Cout = A[3:0] + B[3:0] + Cin

Sklansky Structure (4-bit)

       G3,P3    G2,P2    G1,P1    G0,P0    Cin
         β”‚        β”‚        β”‚        β”‚       β”‚
         β”‚        β”‚        β”‚        β”‚       β”‚
Level 1  ●─────────        ●─────────       β”‚
         β”‚        β”‚        β”‚        β”‚       β”‚
         β”‚        β”‚        β”‚        β”‚       β”‚
Level 2  β—β”€β”€β”€β”€β”€β”€β”€β”€β—β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”˜       β”‚   ← HIGH FANOUT: G1:0 feeds both G2:0 and G3:0
         β”‚        β”‚                         β”‚
         β”‚        β”‚                         β”‚
       G3:0     G2:0     G1:0     G0       β”‚
         β”‚        β”‚        β”‚        β”‚       β”‚
         β–Ό        β–Ό        β–Ό        β–Ό       β”‚
        XOR      XOR      XOR      XOR β”€β”€β”€β”€β”€β”˜
         β”‚        β”‚        β”‚        β”‚
         β–Ό        β–Ό        β–Ό        β–Ό
       Cout      S3       S2       S1       S0

The Fanout Problem

Sklansky's key characteristic is maximum fanout at the expense of wiring complexity:

                    Level 2 Fanout
                    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
                    β”‚             β”‚
     G3:2 ───────────             β”‚
                    β”‚   G1:0     β”œβ”€β”€β–Ί feeds G2:0
     G3:0 ◄──────────   (prefix) β”œβ”€β”€β–Ί feeds G3:0
                    β”‚             β”‚
                    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

     At 64 bits: G31:0 would fan out to 32 cells!

For large adders, this fanout becomes a critical path bottleneck due to wire capacitance.

Comparison: Parallel Prefix Adder Zoo

Adder Depth Prefix Cells Max Fanout Best For
Sklansky logβ‚‚(n) nΒ·logβ‚‚(n)/2 n/2 Small adders, FPGAs
Kogge-Stone logβ‚‚(n) nΒ·logβ‚‚(n)-n+1 2 Speed-critical, uniform load
Brent-Kung 2Β·logβ‚‚(n)-2 2n-2-logβ‚‚(n) 2 Area-constrained
Han-Carlson logβ‚‚(n)+1 varies varies Balanced trade-off

Why Sklansky Matters

  1. Optimal depth: Cannot do better than logβ‚‚(n) for parallel prefix
  2. Minimal prefix cells: Uses fewest cells among depth-optimal adders
  3. Simple structure: Regular, easy to generate programmatically
  4. FPGA-friendly: FPGAs handle fanout better than ASICs

Parallel Prefix Operation

(G_high, P_high) β—‹ (G_low, P_low) = (G_high + P_highΒ·G_low, P_highΒ·P_low)

This associative operation enables the parallel prefix tree construction.

Truth Table (Examples)

A B Cin S Cout Decimal
0000 0000 0 0000 0 0+0=0
0001 0001 0 0010 0 1+1=2
0110 0011 0 1001 0 6+3=9
1001 0111 0 0000 1 9+7=16
1111 1111 1 1111 1 15+15+1=31

Threshold Implementation

Generate (AND gate)

G_i: w[A_i]=1.0, w[B_i]=1.0, bias=-2.0
     fires when A_i + B_i >= 2 (both are 1)

Propagate (XOR via OR-NAND-AND)

P_i = (A_i OR B_i) AND (A_i NAND B_i)
    OR:   w=[1,1], b=-1  β†’ fires if either input is 1
    NAND: w=[-1,-1], b=1 β†’ fires unless both inputs are 1
    AND:  w=[1,1], b=-2  β†’ fires if both OR and NAND fire

Parameters

Inputs 9 (A[3:0], B[3:0], Cin)
Outputs 5 (S[3:0], Cout)
Neurons 32
Layers 4
Parameters 132
Magnitude 56

Usage

from safetensors.torch import load_file
import torch

w = load_file('model.safetensors')

def sklansky_add(a3, a2, a1, a0, b3, b2, b1, b0, cin):
    a = [a0, a1, a2, a3]
    b = [b0, b1, b2, b3]

    # Generate and Propagate
    g = [a[i] & b[i] for i in range(4)]
    p = [a[i] ^ b[i] for i in range(4)]

    # Level 1: pairs
    g10 = g[1] | (p[1] & g[0])
    p10 = p[1] & p[0]
    g32 = g[3] | (p[3] & g[2])
    p32 = p[3] & p[2]

    # Level 2: Sklansky's high-fanout merge
    # G1:0 fans out to both G2:0 and G3:0 computation
    g20 = g[2] | (p[2] & g10)
    g30 = g32 | (p32 & g10)
    p30 = p32 & p10

    # Carries
    c0 = g[0] | (p[0] & cin)
    c1 = g10 | (p10 & cin)
    c2 = g20 | (p[2] & p10 & cin)
    c3 = g30 | (p30 & cin)

    # Sums
    return p[3]^c2, p[2]^c1, p[1]^c0, p[0]^cin, c3

# Test: 6 + 3 = 9
s3, s2, s1, s0, cout = sklansky_add(0,1,1,0, 0,0,1,1, 0)
print(f"6 + 3 = {s3*8 + s2*4 + s1*2 + s0}")  # Output: 9

Verification

All 512 input combinations (16 Γ— 16 Γ— 2) exhaustively verified correct.

Files

threshold-sklansky/
β”œβ”€β”€ model.safetensors    # Threshold network weights
β”œβ”€β”€ create_safetensors.py # Weight generation + exhaustive verification
β”œβ”€β”€ config.json          # Circuit metadata
└── README.md

References

  • Sklansky, J. (1960). "Conditional-Sum Addition Logic"
  • IRE Transactions on Electronic Computers, EC-9(2), pp. 226-231
  • The original paper that introduced this construction

License

MIT

Downloads last month
17
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Collection including phanerozoic/threshold-sklansky