# Model Comparison

## All Variants

| Model | Latent | Enc | Pred | Params | f32 Size | Quantized | Cos vs f32 | ESP32 predict | ESP32 encode | Best For |
|-------|--------|-----|------|--------|----------|-----------|------------|---------------|--------------|----------|
| **baseline** | 192 | 6 | 6 | ~14M | 54.6 MB | 10.9 MB (INT8+Q4) | 0.999 | 828 ms | ~10,000 ms | Quality reference |
| **slim_48d** | 48 | 2 | 2 | ~3M | ~2 MB | ~1 MB | pending | ~300 ms* | ~3,000 ms* | Tiny edge |
| **slim_64d** | 64 | 3 | 3 | ~5M | ~3 MB | ~2 MB | pending | ~400 ms* | ~4,000 ms* | Small edge |
| **slim_96d** | 96 | 2 | 3 | ~8M | ~3.5 MB | ~2 MB | pending | ~400 ms* | ~4,000 ms* | Balanced |
| **slim_96d** | 96 | 4 | 4 | ~10M | 36.8 MB | 9.8 MB (INT8+Q4) | 0.9982 | 583 ms | 6,416 ms | **Production** |
| **slim_128d** | 128 | 4 | 4 | ~12M | ~5 MB | ~3 MB | pending | ~500 ms* | ~5,000 ms* | Quality bias |
| **slim_192d** | 192 | 4 | 4 | ~13M | ~40 MB | ~12 MB | pending | ~600 ms* | ~6,000 ms* | Layer depth |
| **hybrid_ALAL** | 64 | 4 | 4 | **3.0M** | ~12 MB | **3.9 MB** (LQ40) | pending | 152 ms | 922 ms* | **Max compression** |
| **elastic** | 64 | 4 | 4 | ~10M | ~4 MB | ~2 MB | pending | ~400 ms* | ~4,000 ms* | Truncatable |
| **baseline Q4 pred** | 192 | 6 | 6 | ~14M | 54.6 MB | 23.6 MB | 0.998 | 828 ms | ~10,000 ms | Large edge |
| **WANDA 20%** | 192 | 6 | 6 | ~14M | 54.6 MB | 22.0 MB | ~0.99 | ~660 ms* | ~10,000 ms | Pruning research |
| **WANDA 40%** | 192 | 6 | 6 | ~14M | 54.6 MB | 25.1 MB | ~0.97 | ~500 ms* | ~10,000 ms | Max sparsity |

\* Projected; not yet benchmarked on hardware.

## Quantization Formats

| Format | Encoder | Predictor | Compression | Quality | Hardware |
|--------|---------|-----------|-------------|---------|----------|
| f32 | f32 | f32 | 1x | 1.000 | All |
| INT8 | INT8 per-channel | f32 | ~2x | 0.9999 | ESP32, host |
| INT8+Q4 | INT8 per-channel | Q4 block | ~5x | 0.999 | **Production** |
| Q4 pred only | f32 | Q4 block | ~2x | 0.998 | Large edge |
| Full Q4 | Q4 block | Q4 block | ~6x | 0.93 | Research |
| Ternary | - | {-1,0,+1} | ~8x | ~0.85 | Experimental |
| WANDA pruned | - | Q4 + sparse | ~0.8x | ~0.97 | Research |

## Hardware Tiers

| Tier | Model | Format | Size | Use Case |
|------|-------|--------|------|----------|
| **ESP32-P4** | hybrid_ALAL, slim_96d | LQ40 | 3.9-9.8 MB | Edge robotics |
| **Browser WASM** | slim_96d | LQ40 | 9.8 MB | Client-side demos |
| **Host CPU** | any | safetensors | 2-54 MB | Development |
| **FPGA** | baseline, slim | Q4 → hardwired | 0 MB (gates) | Custom silicon |
| **ASIC** | any Q4 | shift-add | 0 MB | Mass production |

## Pareto Frontier

```
Size (MB)
  ^
9.8 |                               ● slim_96d (INT8+Q4)
    |                               ● slim_96d full
  8 |
  7 |
  6 |
  5 |
  4 |              ● hybrid_ALAL (3.9 MB)
    |
  3 |
  2 |
  1 |
    +----------------------------------------> Quality (cos vs f32)
     0.85      0.90      0.95      0.99  1.00
```
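
As a rough illustration of how a size/quality Pareto frontier can be derived from the table, here is a minimal Python sketch. It uses only the rows of the All Variants table that report both a quantized size and a cosine value; rows marked "pending" are left out, and the approximate WANDA values (`~0.99`, `~0.97`) are treated as exact. The `Variant` type and `pareto_frontier` function are hypothetical names introduced for this example, not part of the project.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    name: str
    size_mb: float  # quantized size from the All Variants table
    cos: float      # cosine similarity vs f32 from the same table


# Only rows with a reported cosine value; "pending" rows are omitted.
VARIANTS = [
    Variant("baseline (INT8+Q4)", 10.9, 0.999),
    Variant("slim_96d (INT8+Q4)", 9.8, 0.9982),
    Variant("baseline Q4 pred", 23.6, 0.998),
    Variant("WANDA 20%", 22.0, 0.99),  # table lists ~0.99
    Variant("WANDA 40%", 25.1, 0.97),  # table lists ~0.97
]


def pareto_frontier(variants):
    """Keep variants that no smaller-or-equal variant matches or beats on cos."""
    frontier = []
    best_cos = float("-inf")
    # Sweep from smallest to largest; a variant survives only if it
    # improves on the best cosine seen among the smaller variants.
    for v in sorted(variants, key=lambda v: (v.size_mb, -v.cos)):
        if v.cos > best_cos:
            frontier.append(v)
            best_cos = v.cos
    return frontier


if __name__ == "__main__":
    for v in pareto_frontier(VARIANTS):
        print(f"{v.name}: {v.size_mb} MB, cos {v.cos}")
    # prints:
    # slim_96d (INT8+Q4): 9.8 MB, cos 0.9982
    # baseline (INT8+Q4): 10.9 MB, cos 0.999
```

With the measured rows above, only slim_96d and baseline in INT8+Q4 remain on the frontier; the other three are larger without being more accurate. hybrid_ALAL cannot be placed until its cosine value is measured.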