---
license: apache-2.0
tags:
- quantization
- unary
- thermometer-encoding
- inference-engine
- low-bit
language:
- en
---

# Unary Quantization Research

True unary (base-1) quantization for neural network weights. NOT binary.

(c) 2026 OpenTransformers Ltd / Scott Bisset

## Overview

In unary encoding, a weight of magnitude N is stored as N consecutive 1-bits, one per bitplane, with the remaining bitplanes set to 0 (thermometer encoding). Each bitplane contributes a value of 1 rather than a binary power of two. This removes multiplication from inference: only addition and popcount remain.

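The encoding described above can be illustrated with a minimal pure-Python sketch. The function names and the choice of packing each bitplane into a Python `int` are assumptions made for illustration; the repository's converters and packed formats may lay the bits out differently.

```python
def thermometer_planes(magnitudes, num_planes):
    """Encode non-negative integer magnitudes as unary bitplanes.

    Each plane is a Python int used as a bitmask. Bit i of plane p is set
    when magnitudes[i] > p, so a magnitude N sets planes 0..N-1.
    (Hypothetical layout, chosen for illustration only.)
    """
    planes = [0] * num_planes
    for i, m in enumerate(magnitudes):
        if not 0 <= m <= num_planes:
            raise ValueError(f"magnitude {m} does not fit in {num_planes} planes")
        for p in range(m):
            planes[p] |= 1 << i
    return planes


def decode_planes(planes, length):
    """Recover magnitudes by adding 1 for every plane whose bit is set."""
    return [sum((plane >> i) & 1 for plane in planes) for i in range(length)]


def popcount(x):
    """Count set bits (portable stand-in for a hardware POPCNT)."""
    return bin(x).count("1")


magnitudes = [3, 0, 7, 1]
planes = thermometer_planes(magnitudes, num_planes=7)
assert decode_planes(planes, len(magnitudes)) == magnitudes
# Every plane has weight 1, so the sum of all magnitudes is the total popcount.
assert sum(popcount(p) for p in planes) == sum(magnitudes)
```

Because no plane carries a power-of-two weight, decoding needs no shifts or multiplies by plane index, only counting.
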
7-plane unary gives 8 magnitude levels (15 distinct values with sign) and achieves a per-layer cosine similarity of 0.97 against the FP32 originals.

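The level count and the similarity metric can be made concrete with a small sketch. It assumes a single per-tensor scale that maps the largest absolute weight to the top level; this is an illustrative choice, not necessarily what the converters in this repository do, and the toy weights below are not meant to reproduce the reported 0.97 figure.

```python
import math


def quantize_signed_unary(weights, num_planes=7):
    """Map floats to signed integer levels in [-num_planes, num_planes].

    Assumption: one scale per tensor, chosen so the largest |weight|
    lands on the top level. Real converters may use group scales instead.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / num_planes if max_abs > 0 else 1.0
    levels = [max(-num_planes, min(num_planes, round(w / scale))) for w in weights]
    return levels, scale


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# 7 planes: magnitudes 0..7 (8 levels), signed values -7..7 (15 values).
assert len(range(0, 7 + 1)) == 8
assert len(range(-7, 7 + 1)) == 15

weights = [0.12, -0.40, 0.03, 0.70, -0.22, 0.51]  # toy data
levels, scale = quantize_signed_unary(weights)
dequantized = [level * scale for level in levels]
print(levels)
print(cosine_similarity(weights, dequantized))
```
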
## Contents

### Converters (Python)
- `unary_convert.py` / `unary_convert_v2.py`: Base unary thermometer conversion
- `convert_proper_unary.py` / `convert_proper_unary_v2.py`: Proper unary with group quantization
- `convert_log_unary.py`: Log-spaced unary variant
- `convert_fast.py`: Optimised conversion pipeline
- `packed_convert.py` / `packed_loader.py`: Packed binary format
- `convert_qwen3.py` / `convert_qwen3_v2.py`: Qwen3-4B specific converters

### C Inference Engines (AVX-512 + POPCNT)
- `unary_engine.c` / `unary_engine_v2.c`: Core unary inference
- `pure_unary_engine.c`: Pure unary (no FP in linear layers)
- `log_unary_engine.c`: Log-unary engine
- `proper_unary.c`: Proper unary with group scales
- `true_unary.c`: True base-1 unary engine
- `concat_unary.c`: Concatenated unary engine
- `packed_engine.c`: Packed bitplane engine
- `unary_full.c`: Full forward pass engine

### Converted Models
- `deepseek-r1-1.5b-*`: DeepSeek-R1-1.5B in multiple unary variants (4-plane, 7-plane, 31-plane, grouped, packed, ternary baseline)
- `qwen3-4b-*`: Qwen3-4B-Thinking in unary, log-unary, and proper-unary variants

### Benchmarks and Runners
- `bench_fwd.py` / `bench_gen.py` / `bench_prompt.py`: Performance benchmarks
- `inference.py` / `server.py`: Python inference and API server
- Various `run_*.py`: Model-specific runners

## Key Insight

Unary quantization spends more bits per weight in exchange for computational simplicity. All multiply-accumulate operations become popcount + addition, which makes it particularly suited to edge/CPU inference where SIMD popcount is fast.

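To show how a multiply-accumulate can reduce to popcount and addition, here is a hedged sketch of a dot product between two signed, unary-encoded integer vectors. It assumes the activations are also held as unary bitplanes with a separate sign mask; the helper names are hypothetical and the C engines in this repository may treat activations differently.

```python
def popcount(x):
    """Count set bits (portable stand-in for a hardware POPCNT)."""
    return bin(x).count("1")


def encode_signed(values, num_planes):
    """Return (sign_mask, planes) for integers with |v| <= num_planes."""
    sign_mask = 0
    planes = [0] * num_planes
    for i, v in enumerate(values):
        if abs(v) > num_planes:
            raise ValueError(f"value {v} does not fit in {num_planes} planes")
        if v < 0:
            sign_mask |= 1 << i
        for p in range(abs(v)):
            planes[p] |= 1 << i
    return sign_mask, planes


def unary_dot(w_sign, w_planes, x_sign, x_planes):
    """Dot product using only AND, XOR, popcount, and addition/subtraction."""
    negative = w_sign ^ x_sign  # positions where the product is negative
    total = 0
    for wp in w_planes:
        for xp in x_planes:
            both = wp & xp  # positions where both planes contribute 1
            total += popcount(both & ~negative) - popcount(both & negative)
    return total


w = [3, -2, 0, 7]
x = [1, 4, -5, -2]
w_sign, w_planes = encode_signed(w, num_planes=7)
x_sign, x_planes = encode_signed(x, num_planes=7)
assert unary_dot(w_sign, w_planes, x_sign, x_planes) == sum(a * b for a, b in zip(w, x))
```

For element i with magnitudes a and b, exactly a × b plane pairs have both bits set, so counting them reproduces the product. The cost is one popcount pair per (weight plane, activation plane) combination, which is where the extra bits per weight are paid for.
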
## Building

```bash
gcc -O3 -mavx512f -mavx512bw -mpopcnt -o unary_engine unary_engine.c -lm
```

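Since the build flags target AVX-512 and POPCNT, it may help to confirm that the host CPU advertises them before running the binary. The following is a small sketch, assuming a Linux system that exposes `/proc/cpuinfo`; the helper name `missing_cpu_flags` is hypothetical and not part of this repository.

```python
def missing_cpu_flags(cpuinfo_text, required=("avx512f", "avx512bw", "popcnt")):
    """Return the required flags absent from a /proc/cpuinfo-style dump."""
    present = set()
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present.update(value.split())
    return [flag for flag in required if flag not in present]


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as handle:
            missing = missing_cpu_flags(handle.read())
    except OSError:
        print("Could not read /proc/cpuinfo (non-Linux system?)")
    else:
        print("missing flags:", missing or "none")
```
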
## License

Apache 2.0