---
license: apache-2.0
tags:
  - quantization
  - unary
  - thermometer-encoding
  - inference-engine
  - low-bit
language:
  - en
---

# Unary Quantization Research

True unary (base-1) quantization for neural network weights, not binary.

(c) 2026 OpenTransformers Ltd / Scott Bisset

## Overview

In unary (thermometer) encoding, a magnitude of N is stored as N consecutive 1-bits across N bitplanes. Each set bitplane contributes a value of 1 rather than a binary power of two. This eliminates multiplication from inference: only addition and popcount remain.

7-plane unary gives 8 magnitude levels (15 distinct signed values), achieving 0.97 per-layer cosine similarity against the FP32 originals.
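The encoding above can be sketched in a few lines of NumPy. This is an illustrative round-trip, not the repo's exact conversion code: the per-tensor scaling and function names are assumptions, and the real converters use group quantization.

```python
import numpy as np

def thermometer_encode(weights, planes=7):
    """Quantize FP32 weights to signed unary (thermometer) bitplanes.

    A weight with quantized magnitude m sets the first m of `planes`
    bitplanes; the sign is stored separately. Per-tensor scaling here
    is a simplification (the repo's converters quantize per group).
    """
    scale = np.abs(weights).max() / planes
    mags = np.clip(np.round(np.abs(weights) / scale), 0, planes).astype(np.int32)
    signs = np.sign(weights).astype(np.int8)
    # plane p is set iff magnitude > p -> thermometer code
    bitplanes = (mags[None, :] > np.arange(planes)[:, None]).astype(np.uint8)
    return bitplanes, signs, scale

def thermometer_decode(bitplanes, signs, scale):
    # each set bitplane contributes exactly 1 unit of magnitude
    return bitplanes.sum(axis=0) * signs * scale
```

Decoding is just a sum over bitplanes times the sign, which is what lets the inference engines replace multiplies with popcounts.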

## Contents

### Converters (Python)
- `unary_convert.py` / `unary_convert_v2.py` — Base unary thermometer conversion
- `convert_proper_unary.py` / `convert_proper_unary_v2.py` — Proper unary with group quantization
- `convert_log_unary.py` — Log-spaced unary variant
- `convert_fast.py` — Optimised conversion pipeline
- `packed_convert.py` / `packed_loader.py` — Packed binary format
- `convert_qwen3.py` / `convert_qwen3_v2.py` — Qwen3-4B specific converters

### C Inference Engines (AVX-512 + POPCNT)
- `unary_engine.c` / `unary_engine_v2.c` — Core unary inference
- `pure_unary_engine.c` — Pure unary (no FP in linear layers)
- `log_unary_engine.c` — Log-unary engine
- `proper_unary.c` — Proper unary with group scales
- `true_unary.c` — True base-1 unary engine
- `concat_unary.c` — Concatenated unary engine
- `packed_engine.c` — Packed bitplane engine
- `unary_full.c` — Full forward pass engine

### Converted Models
- `deepseek-r1-1.5b-*` — DeepSeek-R1-1.5B in multiple unary variants (4-plane, 7-plane, 31-plane, grouped, packed, ternary baseline)
- `qwen3-4b-*` — Qwen3-4B-Thinking in unary, log-unary, and proper-unary variants

### Benchmarks and Runners
- `bench_fwd.py` / `bench_gen.py` / `bench_prompt.py` — Performance benchmarks
- `inference.py` / `server.py` — Python inference and API server
- Various `run_*.py` — Model-specific runners

## Key Insight

Unary quantization trades bits-per-weight for computational simplicity: every multiply-accumulate becomes a popcount plus an addition, which makes the scheme particularly suited to edge/CPU inference where SIMD popcount is fast.

## Building

```bash
gcc -O3 -mavx512f -mavx512bw -mpopcnt -o unary_engine unary_engine.c -lm
```

## License

Apache 2.0