Initial release: 8.7M reference + 120.9M variant checkpoints + model card
- README.md +150 -0
- synapnet_edge_130m.pt +3 -0
- synapnet_edge_8m7.pt +3 -0
README.md
ADDED
@@ -0,0 +1,150 @@
---
license: mit
library_name: pytorch
tags:
- efficient-inference
- quantization
- state-space-model
- sparse-attention
- episodic-memory
- long-context
- edge-deployment
language:
- en
---

# SynapNet-Edge — Checkpoints

Hybrid **SSM + sparse-attention + episodic-memory** architecture with Component-Aware Joint Quantization (CAJQ) and Budget-Aware Episodic Eviction (BAEE), designed for long-context inference on consumer hardware.

📦 **Code:** https://github.com/vineetha00/SynapNet-Edge
🧪 **Base architecture:** https://github.com/vineetha00/SynapNet_Exp · 🤗 https://huggingface.co/Vineetha00/synapnet
📄 **Paper:** arXiv preprint — link coming soon

---

## Checkpoints in this repo

| File | Params | Size | Training stage | NIAH-single accuracy (ctx=1024) |
|---|---|---|---|---|
| [`synapnet_edge_8m7.pt`](synapnet_edge_8m7.pt) | **8.7M** | 33 MB | Full 2-stage curriculum pretrain (ctx 512 → 1024) | **0.618 ± 0.107** (FP16, 3 seeds) |
| [`synapnet_edge_130m.pt`](synapnet_edge_130m.pt) | **120.9M** | 461 MB | 1,000-step pretrain, under-converged at this compute budget | not converged — released for deployment profiling only |

### Architecture (8.7M reference)

- `dim=192`, `depth=6`, `heads=6`, `episodic_slots=32`
- `vocab_size=4096`, `num_classes=64`, `max_len=8192`
- `k_frac=0.25` (sparse-attention top-K), `episodic_write_frac=0.05`
- ScaleBridge enabled (FP16 interface between mixed-precision pathways)

### Architecture (130M variant, 120.9M parameters)

- `dim=640`, `depth=10`, `heads=10`, `episodic_slots=32`
- Same `vocab_size`, `num_classes`, and `max_len` as the 8.7M reference
- **Under-trained**: 1,000 steps × batch 2 was insufficient for convergence at this scale. Use it for latency, storage, and memory profiling, not for accuracy claims.
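
As a rough consistency check on the checkpoint table, the byte sizes recorded in this commit's Git LFS pointers can be compared with the stated parameter counts. The sketch below is plain arithmetic; the reading that weights are stored as 4-byte FP32 values is an assumption suggested by the numbers, not something the model card states.

```python
# Byte sizes copied from the Git LFS pointers in this commit.
SIZES_BYTES = {
    "synapnet_edge_8m7.pt": 34_951_283,
    "synapnet_edge_130m.pt": 483_881_523,
}
# Parameter counts copied from the checkpoint table.
PARAMS = {
    "synapnet_edge_8m7.pt": 8.7e6,
    "synapnet_edge_130m.pt": 120.9e6,
}


def size_mib(name: str) -> float:
    """File size in MiB (2**20 bytes)."""
    return SIZES_BYTES[name] / 2**20


def bytes_per_param(name: str) -> float:
    """Average number of stored bytes per model parameter."""
    return SIZES_BYTES[name] / PARAMS[name]


for name in SIZES_BYTES:
    print(f"{name}: {size_mib(name):.1f} MiB, {bytes_per_param(name):.2f} bytes/param")
```

Both files come out at roughly 4 bytes per parameter, and the table's 33 MB and 461 MB figures match the pointer sizes when read as MiB. Any remainder above 4 bytes per parameter would then be config and serialization overhead, under the FP32 assumption.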

---

## Loading

```python
import torch
from huggingface_hub import hf_hub_download
from synapnet_edge.models.synapnet_edge_model import SynapNetEdge, SynapNetEdgeConfig

# Download the 8.7M reference checkpoint from the Hub (cached after the first call).
ckpt_path = hf_hub_download(
    repo_id="Vineetha00/synapnet-edge",
    filename="synapnet_edge_8m7.pt",
)
ckpt = torch.load(ckpt_path, map_location="cpu")

# The checkpoint stores the model config alongside the weights.
cfg = SynapNetEdgeConfig(**ckpt["model_cfg"])
model = SynapNetEdge(cfg)
model.load_state_dict(ckpt["model_state"])
model.eval()  # switch to evaluation mode before inference
```

To install the architecture code:

```bash
pip install git+https://github.com/vineetha00/SynapNet-Edge.git
```
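
Before loading, it can be worth checking a downloaded checkpoint against the Git LFS pointer committed alongside this README, since each pointer records the file's SHA-256 digest and byte size. The following is a minimal sketch using only the standard library; `parse_lfs_pointer` and `matches_pointer` are hypothetical helper names, not part of `synapnet_edge` or `huggingface_hub`.

```python
import hashlib
from pathlib import Path

# Pointer text for synapnet_edge_8m7.pt, copied from this commit.
POINTER_8M7 = """version https://git-lfs.github.com/spec/v1
oid sha256:f304b5aba18a930cb4c309ad6beb63a3bca057ebde67dfe1d6bd9818fede974a
size 34951283
"""


def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its hash algorithm, digest, and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {"algo": algo, "digest": digest, "size": int(fields["size"])}


def matches_pointer(path, pointer: dict) -> bool:
    """Return True if the file at `path` has the pointer's size and digest."""
    file_path = Path(path)
    if file_path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with file_path.open("rb") as handle:
        # Hash in 1 MiB blocks so large checkpoints are not read into memory at once.
        for block in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(block)
    return hasher.hexdigest() == pointer["digest"]


pointer = parse_lfs_pointer(POINTER_8M7)
```

With the path returned by `hf_hub_download`, a call such as `matches_pointer(ckpt_path, pointer)` would be expected to return `True` for an intact download of `synapnet_edge_8m7.pt`.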

---

## Training data

Synthetic long-context curriculum (no external downloads):

- **NIAH-single** (needle-in-a-haystack)
- **NIAH-multi-key** (4 keys, retrieve value by queried key)
- **Variable tracking** (3-hop chain)
- **Frequency aggregation** (most-common class over 16 marked items)

Two-stage curriculum: ctx=512 (4 epoch-equivalents) → ctx=1024 (2 epoch-equivalents).

Final post-pretrain per-task accuracy (8.7M, ctx=1024):
- NIAH-single: 57%
- NIAH-multi-key: 13%
- Variable tracking: 74%
- Frequency aggregation: 47%

(For comparison, the random-chance floor for 64-class prediction is 1/64 ≈ 1.6%.)
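
To make the NIAH-single task concrete, here is a minimal sketch of how one synthetic sample could be generated. The token layout (a marker token followed by a value token, surrounded by random filler) and the names `MARKER_ID`, `VALUE_BASE`, and `make_niah_single` are assumptions for illustration; the actual generator in the repository may differ. Only `vocab_size=4096` and `num_classes=64` come from the model card.

```python
import random

VOCAB_SIZE = 4096   # from the architecture list
NUM_CLASSES = 64    # from the architecture list

# Hypothetical token layout (an assumption, not the repository's format):
# the highest id marks the needle, the 64 ids below it encode the answer class,
# and every id below those is filler.
MARKER_ID = VOCAB_SIZE - 1
VALUE_BASE = MARKER_ID - NUM_CLASSES


def make_niah_single(ctx_len: int, rng: random.Random):
    """Return (tokens, label): filler tokens with one marker/value pair hidden inside."""
    label = rng.randrange(NUM_CLASSES)
    tokens = [rng.randrange(VALUE_BASE) for _ in range(ctx_len)]
    pos = rng.randrange(ctx_len - 1)      # leave room for the value token
    tokens[pos] = MARKER_ID
    tokens[pos + 1] = VALUE_BASE + label  # the class the model must retrieve
    return tokens, label


rng = random.Random(0)
tokens, label = make_niah_single(1024, rng)
chance = 1 / NUM_CLASSES  # accuracy of uniform random guessing over 64 classes
print(f"label={label}, chance floor={chance:.2%}")
```

Under this layout the model sees the whole sequence and must predict `label` as one of 64 classes, which is why uniform guessing gives 1/64 accuracy.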

---

## Quantization (apply after loading FP16)

The architecture supports **Component-Aware Joint Quantization (CAJQ)** at inference time:

```python
from synapnet_edge.quantization.cajq import apply_cajq, CAJQConfig
from synapnet_edge.training.calibration import build_calib_loader

calib_loader = build_calib_loader(n_samples=128, seq_len=1024)
model = apply_cajq(
    model,
    CAJQConfig(device="mps"),  # "mps" is PyTorch's Apple-silicon GPU backend
    calib_loader=calib_loader,
    mode="ptq",  # or "qat" for QAT fine-tune
)
```

With 200 QAT steps, averaged over 3 seeds, CAJQ matches or exceeds mean FP16 accuracy on NIAH-single at every evaluated context length:

| Variant | Eff. bits | ctx 1024 | ctx 2048 | ctx 4096 |
|---|---|---|---|---|
| FP16 | 16.0 | 0.618 ± 0.107 | 0.507 ± 0.115 | 0.438 ± 0.036 |
| **CAJQ-QAT (ours)** | 13.8 | **0.674 ± 0.012** | **0.590 ± 0.043** | **0.521 ± 0.055** |

Compression: 4.4× on the targeted SSM + attention parameters (0.60 MB vs 2.66 MB FP16-equivalent); 1.13× whole-model storage reduction at this configuration.
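
The gap between the 4.4× and 1.13× figures follows from how small the targeted parameters are relative to the whole model. The arithmetic below is a sketch under two assumptions that the model card does not state: that the whole-model FP16 footprint is 8.7M parameters × 2 bytes, and that MB means 10^6 bytes.

```python
TARGETED_FP16_MB = 2.66   # from the compression line
TARGETED_CAJQ_MB = 0.60   # from the compression line
TOTAL_PARAMS = 8.7e6      # from the checkpoint table

# Compression on the targeted SSM + attention parameters only.
targeted_ratio = TARGETED_FP16_MB / TARGETED_CAJQ_MB

# Assumed whole-model FP16 footprint: 2 bytes per parameter.
whole_fp16_mb = TOTAL_PARAMS * 2 / 1e6
saved_mb = TARGETED_FP16_MB - TARGETED_CAJQ_MB
whole_ratio = whole_fp16_mb / (whole_fp16_mb - saved_mb)

# Share of the FP16 model that the targeted parameters account for.
targeted_share = TARGETED_FP16_MB / whole_fp16_mb

print(f"targeted: {targeted_ratio:.1f}x, whole model: {whole_ratio:.2f}x, "
      f"targeted share: {targeted_share:.0%}")
```

Under these assumptions the targeted parameters make up only about 15% of the FP16 model, so compressing them 4.4× shrinks total storage by roughly 1.13×, consistent with the reported figure.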

---

## Streaming inference with BAEE

```python
from synapnet_edge import BAEEMemoryManager

# dim and n_layers match the 8.7M reference (dim=192, depth=6).
manager = BAEEMemoryManager(dim=192, n_layers=6, budget_mb=256.0)
logits, debug = model.forward_streaming(
    input_ids, chunk_size=512, baee_manager=manager,
)
```

Under 90% forced eviction with the target needle in the *early* portion of an 8K stream, BAEE retains the target **71% ± 8%** of the time vs **0%** for FIFO / LRU. Head-to-head comparisons against H2O, Scissorhands, SnapKV, PyramidKV, and Locret-style policies are in the GitHub repo.
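
The 0% figure for FIFO has a simple structural explanation: a policy that only keeps the most recent entries can never retain an early needle once most of the stream has been evicted. The toy below illustrates that contrast with a generic importance-ranked policy. It is not BAEE's actual eviction rule, and the `Entry` type, the scores, and both helper functions are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    position: int      # index of the entry in the stream
    importance: float  # hypothetical relevance score


def keep_fifo(entries, keep):
    """FIFO eviction: the oldest entries go first, so only the newest `keep` survive."""
    ordered = sorted(entries, key=lambda e: e.position)
    return ordered[len(ordered) - keep:]


def keep_by_importance(entries, keep):
    """Importance-ranked eviction: the `keep` highest-scoring entries survive."""
    top = sorted(entries, key=lambda e: e.importance, reverse=True)[:keep]
    return sorted(top, key=lambda e: e.position)


NEEDLE_POS = 5  # needle placed early in the stream
entries = [
    Entry(i, 1.0 if i == NEEDLE_POS else 0.1 + (i % 7) / 100)
    for i in range(100)
]
keep = len(entries) // 10  # 90% forced eviction

fifo_kept = {e.position for e in keep_fifo(entries, keep)}
scored_kept = {e.position for e in keep_by_importance(entries, keep)}
print("FIFO keeps needle:", NEEDLE_POS in fifo_kept)
print("Importance-ranked keeps needle:", NEEDLE_POS in scored_kept)
```

In this toy the FIFO survivors are always positions 90 to 99, so the needle at position 5 is lost regardless of its score, while the ranked policy keeps it as long as its score is among the top ten. How well a real policy scores the needle is the hard part, which the toy sidesteps by assigning the score directly.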

---

## License

MIT — see [LICENSE](https://github.com/vineetha00/SynapNet-Edge/blob/main/LICENSE).

## Citation

```bibtex
@article{synapnet_edge_2026,
  title={SynapNet-Edge: Component-Aware Quantization and Budget-Aware Eviction for Hybrid Long-Context Models on Consumer Hardware},
  author={Vallish Kumar, Vineetha},
  year={2026},
}
```
synapnet_edge_130m.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6d542a9dd765cc85326ddea0f09d1ccc9e95f57be1e26605869467919b81bd83
size 483881523
synapnet_edge_8m7.pt
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f304b5aba18a930cb4c309ad6beb63a3bca057ebde67dfe1d6bd9818fede974a
size 34951283