---
license: mit
library_name: pytorch
tags:
- efficient-inference
- quantization
- state-space-model
- sparse-attention
- episodic-memory
- long-context
- edge-deployment
language:
- en
---
# SynapNet-Edge — Checkpoints
Hybrid **SSM + sparse-attention + episodic-memory** architecture with Component-Aware Joint Quantization (CAJQ) and Budget-Aware Episodic Eviction (BAEE), designed for long-context inference on consumer hardware.
📦 **Code:** https://github.com/vineetha00/SynapNet-Edge
🧪 **Base architecture:** https://github.com/vineetha00/SynapNet_Exp · 🤗 https://huggingface.co/Vineetha00/synapnet
📄 **Paper:** arXiv preprint — link coming soon
---
## Checkpoints in this repo
| File | Params | Size | Stage | Eval NIAH-single (ctx=1024) |
|---|---|---|---|---|
| [`synapnet_edge_8m7.pt`](synapnet_edge_8m7.pt) | **8.7M** | 33 MB | Full 2-stage curriculum pretrain (ctx 512 → 1024) | **0.618 ± 0.107** (FP16, 3 seeds) |
| [`synapnet_edge_130m.pt`](synapnet_edge_130m.pt) | **120.9M** | 461 MB | 1,000-step pretrain, under-converged at this compute budget | not converged — released for deployment profiling only |
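The listed file sizes can be sanity-checked against the parameter counts. The sketch below assumes the checkpoints store weights as 32-bit floats (4 bytes per parameter) and that "MB" in the table means MiB (2^20 bytes). Neither is stated in this card, but both assumptions together reproduce the table's numbers.

```python
BYTES_PER_PARAM = 4   # assumption: weights saved as FP32
MIB = 2 ** 20         # assumption: "MB" in the table means MiB

def checkpoint_size_mib(n_params: float) -> float:
    """Estimate on-disk size from the parameter count alone (ignores metadata)."""
    return n_params * BYTES_PER_PARAM / MIB

size_8m7 = checkpoint_size_mib(8.7e6)     # 8.7M-parameter checkpoint
size_130m = checkpoint_size_mib(120.9e6)  # 120.9M-parameter checkpoint
print(f"{size_8m7:.1f} MiB, {size_130m:.1f} MiB")  # prints "33.2 MiB, 461.2 MiB"
```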
### Architecture (8.7M reference)
- `dim=192`, `depth=6`, `heads=6`, `episodic_slots=32`
- `vocab_size=4096`, `num_classes=64`, `max_len=8192`
- `k_frac=0.25` (sparse-attention top-K), `episodic_write_frac=0.05`
- ScaleBridge enabled (FP16 interface between mixed-precision pathways)
### Architecture (130M variant)
- `dim=640`, `depth=10`, `heads=10`, `episodic_slots=32`
- Same vocab, classes, max_len as 8.7M
- **Under-trained**: 1,000 steps × batch 2 was insufficient for convergence at this scale. Use for latency / storage / memory profiling, not accuracy claims.
---
## Loading
```python
import torch
from huggingface_hub import hf_hub_download
from synapnet_edge.models.synapnet_edge_model import SynapNetEdge, SynapNetEdgeConfig

# Download the 8.7M checkpoint from the Hub (cached locally after the first call).
ckpt_path = hf_hub_download(
    repo_id="Vineetha00/synapnet-edge",
    filename="synapnet_edge_8m7.pt",
)

# The checkpoint holds the config kwargs under "model_cfg" and the weights under "model_state".
ckpt = torch.load(ckpt_path, map_location="cpu")
cfg = SynapNetEdgeConfig(**ckpt["model_cfg"])
model = SynapNetEdge(cfg)
model.load_state_dict(ckpt["model_state"])
model.eval()  # switch to inference mode
```
To install the architecture code:
```bash
pip install git+https://github.com/vineetha00/SynapNet-Edge.git
```
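After loading, it can be useful to confirm that the checkpoint matches the advertised parameter count. The helper below is a minimal sketch, not part of the `synapnet_edge` package: `count_parameters` is a hypothetical name, and it only assumes that the values in `ckpt["model_state"]` expose `.numel()`, as PyTorch tensors do.

```python
def count_parameters(state_dict) -> int:
    """Sum element counts over every tensor-like value in a state dict.

    Values without a .numel() method (plain Python metadata) are skipped.
    """
    return sum(t.numel() for t in state_dict.values() if hasattr(t, "numel"))

# Usage with the checkpoint loaded above (requires torch and the downloaded file):
#   n = count_parameters(ckpt["model_state"])
#   print(f"{n / 1e6:.1f}M elements")
```

A state dict can also contain non-trainable buffers, so the total may be slightly higher than the trainable parameter count listed in the table.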
---
## Training data
Synthetic long-context curriculum (no external downloads):
- **NIAH-single** (needle-in-a-haystack)
- **NIAH-multi-key** (4 keys, retrieve value by queried key)
- **Variable tracking** (3-hop chain)
- **Frequency aggregation** (most-common class over 16 marked items)
Two-stage curriculum: ctx=512 (4 epochs equivalent) → ctx=1024 (2 epochs equivalent).
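To make the task format concrete, here is a minimal sketch of what a NIAH-single example could look like. The token layout is an assumption for illustration only (a reserved marker token followed by a value token whose id equals the class label); the actual generator lives in the GitHub repo and may differ. Only `vocab_size=4096` and `num_classes=64` come from this card.

```python
import random

VOCAB_SIZE = 4096         # from the architecture section
NUM_CLASSES = 64          # from the architecture section
MARKER = VOCAB_SIZE - 1   # hypothetical: token id reserved to flag the needle

def make_niah_single(ctx_len: int, rng: random.Random):
    """Return (tokens, label): random filler with one marker+value needle."""
    label = rng.randrange(NUM_CLASSES)
    # Filler avoids the value ids [0, NUM_CLASSES) and the marker id.
    tokens = [rng.randrange(NUM_CLASSES, MARKER) for _ in range(ctx_len)]
    pos = rng.randrange(ctx_len - 1)   # leave room for the value token
    tokens[pos] = MARKER
    tokens[pos + 1] = label            # hypothetical: value token id equals class id
    return tokens, label

rng = random.Random(0)
tokens, label = make_niah_single(1024, rng)  # stage-2 context length
```

Under this layout, the two curriculum stages would differ only in the `ctx_len` argument (512, then 1024).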
Final post-pretrain per-task accuracy (8.7M, ctx=1024):
- NIAH-single: 57%
- NIAH-multi-key: 13%
- Variable tracking: 74%
- Frequency aggregation: 47%
(For comparison, the random-chance floor for 64-way classification is 1/64 ≈ 1.6%.)
---
## Quantization (apply after loading FP16)
The architecture supports **Component-Aware Joint Quantization (CAJQ)** at inference time:
```python
from synapnet_edge.quantization.cajq import apply_cajq, CAJQConfig
from synapnet_edge.training.calibration import build_calib_loader
calib_loader = build_calib_loader(n_samples=128, seq_len=1024)
model = apply_cajq(
    model,
    CAJQConfig(device="mps"),
    calib_loader=calib_loader,
    mode="ptq",  # or "qat" for QAT fine-tune
)
```
With **200 QAT steps per run, over 3 seeds**, CAJQ matches or exceeds FP16 on NIAH-single at every evaluated context length:
| Variant | Eff. bits | ctx 1024 | ctx 2048 | ctx 4096 |
|---|---|---|---|---|
| FP16 | 16.0 | 0.618 ± 0.107 | 0.507 ± 0.115 | 0.438 ± 0.036 |
| **CAJQ-QAT (ours)** | 13.8 | **0.674 ± 0.012** | **0.590 ± 0.043** | **0.521 ± 0.055** |
Compression: 4.4× on targeted SSM + attention parameters (0.60 MB vs 2.66 MB FP16-equivalent); 1.13× whole-model storage reduction at this configuration.
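The two compression figures can be related with a little arithmetic. The sketch below assumes that the whole-model baseline is all 8.7M parameters at 2 bytes each (FP16) in decimal MB, and that parameters outside the targeted SSM + attention set stay at FP16. These assumptions are not stated in this card, but they reproduce the 1.13× figure.

```python
targeted_fp16_mb = 2.66   # from the text: FP16-equivalent size of targeted params
targeted_cajq_mb = 0.60   # from the text: the same params after CAJQ
n_params = 8.7e6          # from the checkpoint table

targeted_ratio = targeted_fp16_mb / targeted_cajq_mb

# Assumption: the whole model at FP16 is 2 bytes per parameter, in decimal MB,
# and only the targeted parameters shrink.
whole_fp16_mb = n_params * 2 / 1e6
whole_cajq_mb = whole_fp16_mb - (targeted_fp16_mb - targeted_cajq_mb)
whole_ratio = whole_fp16_mb / whole_cajq_mb

print(f"targeted: {targeted_ratio:.2f}x, whole model: {whole_ratio:.2f}x")
# prints "targeted: 4.43x, whole model: 1.13x"
```

Under these assumptions the targeted parameters account for only about 15% of the FP16 footprint (2.66 of 17.4 MB), which is why a 4.4× reduction on them yields a modest whole-model gain.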
---
## Streaming inference with BAEE
```python
from synapnet_edge import BAEEMemoryManager
manager = BAEEMemoryManager(dim=192, n_layers=6, budget_mb=256.0)
logits, debug = model.forward_streaming(
input_ids, chunk_size=512, baee_manager=manager,
)
```
Under 90% forced eviction with the target needle in the *early* portion of an 8K stream, BAEE retains the target **71% ± 8%** of the time vs **0%** for FIFO / LRU. Head-to-head comparisons against H2O / Scissorhands / SnapKV / PyramidKV / Locret-style policies are in the GitHub repo.
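The 0% figure for FIFO follows directly from the setup: with 90% of an 8K stream evicted, FIFO keeps only the most recent 10% of positions, so a needle placed early is always dropped. The toy sketch below illustrates this and contrasts it with a generic score-based retention policy. It is not BAEE: `score_keep` and the oracle score given to the needle are hypothetical, chosen only to show that a policy which ranks entries by importance can keep an early needle within the same budget.

```python
import heapq
import random

def fifo_keep(n_items: int, budget: int) -> set:
    """FIFO under a fixed budget retains only the most recent `budget` positions."""
    return set(range(n_items - budget, n_items))

def score_keep(scores: list, budget: int) -> set:
    """Retain the `budget` positions with the highest score (hypothetical policy)."""
    return set(heapq.nlargest(budget, range(len(scores)), key=scores.__getitem__))

rng = random.Random(0)
n_items = 8192
budget = n_items // 10                       # 90% forced eviction
needle_pos = rng.randrange(0, n_items // 4)  # needle in the early part of the stream

scores = [rng.random() for _ in range(n_items)]  # filler scores in [0, 1)
scores[needle_pos] = 2.0                         # assumption: an oracle scorer flags the needle

fifo_kept = fifo_keep(n_items, budget)
score_kept = score_keep(scores, budget)
print(needle_pos in fifo_kept, needle_pos in score_kept)  # prints "False True"
```

A real scorer is imperfect, which is consistent with the reported retention rate being 71% rather than 100%.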
---
## License
MIT — see [LICENSE](https://github.com/vineetha00/SynapNet-Edge/blob/main/LICENSE).
## Citation
```bibtex
@article{synapnet_edge_2026,
title={SynapNet-Edge: Component-Aware Quantization and Budget-Aware Eviction for Hybrid Long-Context Models on Consumer Hardware},
author={Vallish Kumar, Vineetha},
year={2026},
}
```