	---
	license: apache-2.0
	tags:
	- quantization
	- unary
	- thermometer-encoding
	- inference-engine
	- low-bit
	language:
	- en
	---

	# Unary Quantization Research

	True unary (base-1) quantization for neural network weights. NOT binary.

	(c) 2026 OpenTransformers Ltd / Scott Bisset

	## Overview

	In unary (thermometer) encoding, a weight of magnitude N is stored as N consecutive 1-bits, one per bitplane. Each bitplane contributes a value of 1 rather than a binary power of two. This removes multiplication from the weighted sums at inference time, leaving only popcount and addition.

	7-plane unary gives 8 magnitude levels (0 to 7), or 15 distinct values with sign (-7 to +7), and achieves 0.97 cosine similarity per layer against the FP32 originals.
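
The encoding described above can be illustrated with a minimal Python sketch. The function names `encode` and `decode` are hypothetical and are not taken from the converters in this repository; the sketch only assumes one sign bit plus 7 magnitude planes per weight.

```python
PLANES = 7  # 7 magnitude planes -> magnitudes 0..7


def encode(q, planes=PLANES):
    """Return (sign_bit, thermometer_bits) for an integer level q in [-planes, planes].

    Hypothetical helper for illustration; the repository's converters may differ.
    """
    if not -planes <= q <= planes:
        raise ValueError(f"level {q} does not fit in {planes} planes")
    sign = 1 if q < 0 else 0
    magnitude = abs(q)
    # Thermometer code: the first `magnitude` planes are 1, the rest are 0.
    bits = [1 if p < magnitude else 0 for p in range(planes)]
    return sign, bits


def decode(sign, bits):
    """Each set plane contributes exactly 1, so the magnitude is a plain bit count."""
    magnitude = sum(bits)
    return -magnitude if sign else magnitude


levels = [decode(*encode(q)) for q in range(-PLANES, PLANES + 1)]
print(encode(5))         # (0, [1, 1, 1, 1, 1, 0, 0])
print(len(set(levels)))  # 15
```

Decoding never uses powers of two, which is what distinguishes this from a binary bitplane layout.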

	## Contents

	### Converters (Python)
	- `unary_convert.py` / `unary_convert_v2.py` — Base unary thermometer conversion
	- `convert_proper_unary.py` / `convert_proper_unary_v2.py` — Proper unary with group quantization
	- `convert_log_unary.py` — Log-spaced unary variant
	- `convert_fast.py` — Optimised conversion pipeline
	- `packed_convert.py` / `packed_loader.py` — Packed binary format
	- `convert_qwen3.py` / `convert_qwen3_v2.py` — Qwen3-4B specific converters
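
As a rough illustration of what group quantization to unary levels might look like, the sketch below gives each group of weights one floating-point scale and rounds every weight to an integer level in [-7, 7]. This is an assumption about the general technique, not a description of `convert_proper_unary.py`; the group size, the max-abs scaling rule, and all function names are hypothetical.

```python
import math
import random


def quantize_group(weights, planes=7):
    """Map one group of floats to (scale, integer levels in [-planes, planes])."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / planes if max_abs > 0 else 1.0  # avoid dividing by zero
    levels = [max(-planes, min(planes, round(w / scale))) for w in weights]
    return scale, levels


def dequantize_group(scale, levels):
    """Reconstruct approximate floats from the levels and the group scale."""
    return [scale * q for q in levels]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


random.seed(0)
GROUP_SIZE = 64  # hypothetical group size
weights = [random.gauss(0.0, 0.02) for _ in range(1024)]  # synthetic stand-in for a layer

restored = []
for start in range(0, len(weights), GROUP_SIZE):
    scale, levels = quantize_group(weights[start:start + GROUP_SIZE])
    restored.extend(dequantize_group(scale, levels))

print(f"cosine similarity vs. original: {cosine_similarity(weights, restored):.4f}")
```

The synthetic Gaussian weights here are only a stand-in, so the printed similarity says nothing about the figures reported for the converted models.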

	### C Inference Engines (AVX-512 + POPCNT)
	- `unary_engine.c` / `unary_engine_v2.c` — Core unary inference
	- `pure_unary_engine.c` — Pure unary (no FP in linear layers)
	- `log_unary_engine.c` — Log-unary engine
	- `proper_unary.c` — Proper unary with group scales
	- `true_unary.c` — True base-1 unary engine
	- `concat_unary.c` — Concatenated unary engine
	- `packed_engine.c` — Packed bitplane engine
	- `unary_full.c` — Full forward pass engine

	### Converted Models
	- `deepseek-r1-1.5b-*` — DeepSeek-R1-1.5B in multiple unary variants (4-plane, 7-plane, 31-plane, grouped, packed, ternary baseline)
	- `qwen3-4b-*` — Qwen3-4B-Thinking in unary, log-unary, and proper-unary variants

	### Benchmarks and Runners
	- `bench_fwd.py` / `bench_gen.py` / `bench_prompt.py` — Performance benchmarks
	- `inference.py` / `server.py` — Python inference and API server
	- Various `run_*.py` — Model-specific runners

	## Key Insight

	Unary quantization trades bits-per-weight for computational simplicity: it spends more bits per weight than a binary code with the same number of levels, but every multiply-accumulate becomes popcount plus addition. This makes it particularly suited to edge and CPU inference, where SIMD popcount is fast.
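
The following Python sketch shows one way a dot product can reduce to popcount and addition. It assumes that both weights and activations are quantized to signed unary levels and packed one bit per element into plane masks; the helper names `pack` and `unary_dot` are hypothetical, and the C engines in this repository may organise the computation differently (for example with AVX-512 registers instead of Python integers).

```python
PLANES = 7


def popcount(x):
    """Count set bits of a non-negative integer."""
    return bin(x).count("1")


def pack(values, planes=PLANES):
    """Pack signed integer levels into (sign_mask, plane_masks).

    Bit i of every mask belongs to element i; plane p is set when |value| > p.
    """
    sign_mask = 0
    plane_masks = [0] * planes
    for i, v in enumerate(values):
        if abs(v) > planes:
            raise ValueError(f"level {v} does not fit in {planes} planes")
        if v < 0:
            sign_mask |= 1 << i
        for p in range(abs(v)):
            plane_masks[p] |= 1 << i
    return sign_mask, plane_masks


def unary_dot(w, a):
    """Dot product of two packed vectors using only AND, XOR, popcount and add."""
    w_sign, w_planes = w
    a_sign, a_planes = a
    negative = w_sign ^ a_sign  # elements whose product is negative
    total = 0
    for wp in w_planes:
        for ap in a_planes:
            both = wp & ap  # each set bit is one unit of |w| * |a|
            total += popcount(both & ~negative) - popcount(both & negative)
    return total


weights = [3, -2, 0, 7]
acts = [1, 4, -5, -2]
print(unary_dot(pack(weights), pack(acts)))       # -19
print(sum(w * a for w, a in zip(weights, acts)))  # -19, reference result
```

Each pair of planes contributes one unit per overlapping bit, so an element with magnitudes 3 and 4 is counted 12 times across the plane pairs without any multiplication by its value.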

	## Building

	```bash
	gcc -O3 -mavx512f -mavx512bw -mpopcnt -o unary_engine unary_engine.c -lm
	```

	## License

	Apache 2.0