Model Comparison
All Variants
| Model | Latent dim | Enc layers | Pred layers | Params | f32 size | Quantized size | Cos vs f32 | ESP32 predict | ESP32 encode | Best for |
|---|---|---|---|---|---|---|---|---|---|---|
| baseline | 192 | 6 | 6 | ~14M | 54.6 MB | 10.9 MB (INT8+Q4) | 0.999 | 828 ms | ~10,000 ms | Quality reference |
| slim_48d | 48 | 2 | 2 | ~3M | ~2 MB | ~1 MB | pending | ~300 ms* | ~3,000 ms* | Tiny edge |
| slim_64d | 64 | 3 | 3 | ~5M | ~3 MB | ~2 MB | pending | ~400 ms* | ~4,000 ms* | Small edge |
| slim_96d | 96 | 2 | 3 | ~8M | ~3.5 MB | ~2 MB | pending | ~400 ms* | ~4,000 ms* | Balanced |
| slim_96d | 96 | 4 | 4 | ~10M | 36.8 MB | 9.8 MB (INT8+Q4) | 0.9982 | 583 ms | 6,416 ms | Production |
| slim_128d | 128 | 4 | 4 | ~12M | ~5 MB | ~3 MB | pending | ~500 ms* | ~5,000 ms* | Quality bias |
| slim_192d | 192 | 4 | 4 | ~13M | ~40 MB | ~12 MB | pending | ~600 ms* | ~6,000 ms* | Layer depth |
| hybrid_ALAL | 64 | 4 | 4 | 3.0M | ~12 MB | 3.9 MB (LQ40) | pending | 152 ms | 922 ms* | Max compression |
| elastic | 64 | 4 | 4 | ~10M | ~4 MB | ~2 MB | pending | ~400 ms* | ~4,000 ms* | Truncatable |
| baseline Q4 pred | 192 | 6 | 6 | ~14M | 54.6 MB | 23.6 MB | 0.998 | 828 ms | ~10,000 ms | Large edge |
| WANDA 20% | 192 | 6 | 6 | ~14M | 54.6 MB | 22.0 MB | ~0.99 | ~660 ms* | ~10,000 ms | Pruning research |
| WANDA 40% | 192 | 6 | 6 | ~14M | 54.6 MB | 25.1 MB | ~0.97 | ~500 ms* | ~10,000 ms | Max sparsity |
\* Projected; not yet benchmarked on hardware.
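The f32 and quantized sizes can be checked against the compression factors listed under Quantization Formats by dividing one by the other. The sketch below does this for the rows with measured (non-projected) sizes only. The names `SIZES_MB`, `compression_ratio`, and `RATIOS` are illustrative and not part of the project.

```python
# Measured sizes in MB as (f32, quantized), copied from the table above.
# Rows with "~" (approximate or projected) sizes are left out on purpose.
SIZES_MB = {
    "baseline (INT8+Q4)": (54.6, 10.9),
    "slim_96d (INT8+Q4)": (36.8, 9.8),
    "baseline Q4 pred": (54.6, 23.6),
}


def compression_ratio(f32_mb: float, quantized_mb: float) -> float:
    """Return how many times smaller the quantized file is than the f32 file."""
    if quantized_mb <= 0:
        raise ValueError("quantized size must be positive")
    return f32_mb / quantized_mb


RATIOS = {
    name: round(compression_ratio(f32_mb, q_mb), 2)
    for name, (f32_mb, q_mb) in SIZES_MB.items()
}

for name, ratio in RATIOS.items():
    print(f"{name}: {ratio}x")
```

For the baseline this gives roughly 5x with INT8+Q4 and a little over 2x with a Q4 predictor only, which agrees with the ~5x and ~2x figures in the formats table.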
Quantization Formats
| Format | Encoder | Predictor | Compression | Quality (cos vs f32) | Target |
|---|---|---|---|---|---|
| f32 | f32 | f32 | 1x | 1.000 | All |
| INT8 | INT8 per-channel | f32 | ~2x | 0.9999 | ESP32, host |
| INT8+Q4 | INT8 per-channel | Q4 block | ~5x | 0.999 | Production |
| Q4 pred only | f32 | Q4 block | ~2x | 0.998 | Large edge |
| Full Q4 | Q4 block | Q4 block | ~6x | 0.93 | Research |
| Ternary | - | {-1,0,+1} | ~8x | ~0.85 | Experimental |
| WANDA pruned | - | Q4 + sparse | ~0.8x | ~0.97 | Research |
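To make the "INT8 per-channel" and "Q4 block" entries concrete, here is a minimal sketch of both schemes on a random weight matrix, with cosine similarity against the f32 original as the quality measure. It rests on several assumptions that this document does not confirm: symmetric quantization (no zero point), one scale per row for INT8, a Q4 block size of 32, and cosine measured on the weights themselves rather than on model outputs. All function names are hypothetical.

```python
import math
import random


def quantize_symmetric(values, bits):
    """Quantize floats to signed integer codes that share a single scale."""
    qmax = 2 ** (bits - 1) - 1  # 127 for INT8, 7 for Q4
    peak = max(abs(v) for v in values)
    scale = peak / qmax if peak > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def int8_per_channel(matrix):
    """Assumed scheme: one scale per row (output channel)."""
    return [dequantize(*quantize_symmetric(row, 8)) for row in matrix]


def q4_block(matrix, block=32):
    """Assumed scheme: one scale per block of consecutive weights in a row."""
    restored_matrix = []
    for row in matrix:
        restored = []
        for start in range(0, len(row), block):
            chunk = row[start:start + block]
            restored.extend(dequantize(*quantize_symmetric(chunk, 4)))
        restored_matrix.append(restored)
    return restored_matrix


def cosine(a, b):
    """Cosine similarity between two matrices, treated as flat vectors."""
    flat_a = [x for row in a for x in row]
    flat_b = [x for row in b for x in row]
    dot = sum(x * y for x, y in zip(flat_a, flat_b))
    norm_a = math.sqrt(sum(x * x for x in flat_a))
    norm_b = math.sqrt(sum(y * y for y in flat_b))
    return dot / (norm_a * norm_b)


random.seed(0)
weights = [[random.gauss(0.0, 1.0) for _ in range(64)] for _ in range(16)]
cos_int8 = cosine(weights, int8_per_channel(weights))
cos_q4 = cosine(weights, q4_block(weights))
print(f"INT8 per-channel cos: {cos_int8:.5f}")
print(f"Q4 block cos:         {cos_q4:.5f}")
```

On Gaussian toy weights the INT8 path stays much closer to the original than the Q4 path, which matches the ordering in the table. The absolute values will differ from the table, since those figures presumably come from the real models.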
Hardware Tiers
| Tier | Model | Format | Size | Use Case |
|---|---|---|---|---|
| ESP32-P4 | hybrid_ALAL, slim_96d | LQ40 | 3.9-9.8 MB | Edge robotics |
| Browser WASM | slim_96d | LQ40 | 9.8 MB | Client-side demos |
| Host CPU | any | safetensors | 2-54 MB | Development |
| FPGA | baseline, slim | Q4 → hardwired | 0 MB (gates) | Custom silicon |
| ASIC | any Q4 | shift-add | 0 MB | Mass production |
Pareto Frontier
[Figure: scatter plot of size in MB (vertical axis) against quality, measured as cosine similarity vs f32 (horizontal axis, 0.85 to 1.00). Labelled points: slim_96d (INT8+Q4) at 9.8 MB, slim_96d full, and hybrid_ALAL at 3.9 MB.]
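A Pareto frontier over size and quality can be computed directly from the All Variants table. The sketch below is an illustration, not the project's tooling: it uses only the rows that report a cosine value, treats the "~" WANDA values as exact, and the names `Variant`, `dominates`, and `pareto_frontier` are hypothetical.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    name: str
    size_mb: float  # quantized size in MB
    cos: float      # cosine similarity vs f32


# Rows from the All Variants table that report a cosine value.
# Variants marked "pending" (including hybrid_ALAL) cannot be placed yet.
VARIANTS = [
    Variant("baseline (INT8+Q4)", 10.9, 0.999),
    Variant("slim_96d (INT8+Q4)", 9.8, 0.9982),
    Variant("baseline Q4 pred", 23.6, 0.998),
    Variant("WANDA 20%", 22.0, 0.99),   # "~0.99" in the source
    Variant("WANDA 40%", 25.1, 0.97),   # "~0.97" in the source
]


def dominates(a: Variant, b: Variant) -> bool:
    """True if a is no larger and no worse than b, and strictly better in one."""
    no_worse = a.size_mb <= b.size_mb and a.cos >= b.cos
    strictly_better = a.size_mb < b.size_mb or a.cos > b.cos
    return no_worse and strictly_better


def pareto_frontier(variants):
    """Keep the variants that no other variant dominates, smallest first."""
    frontier = [v for v in variants if not any(dominates(o, v) for o in variants)]
    return sorted(frontier, key=lambda v: v.size_mb)


FRONTIER = pareto_frontier(VARIANTS)
for v in FRONTIER:
    print(f"{v.name}: {v.size_mb} MB, cos {v.cos}")
```

With these rows, only slim_96d (INT8+Q4) and baseline (INT8+Q4) remain on the frontier. Once its cosine is measured, hybrid_ALAL would join them at 3.9 MB, because the smallest variant cannot be dominated on size.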