---
license: mit
library_name: pytorch
tags:
  - efficient-inference
  - quantization
  - state-space-model
  - sparse-attention
  - episodic-memory
  - long-context
  - edge-deployment
language:
  - en
---

# SynapNet-Edge — Checkpoints

Hybrid **SSM + sparse-attention + episodic-memory** architecture with Component-Aware Joint Quantization (CAJQ) and Budget-Aware Episodic Eviction (BAEE), designed for long-context inference on consumer hardware.

📦 **Code:** https://github.com/vineetha00/SynapNet-Edge
🧪 **Base architecture:** https://github.com/vineetha00/SynapNet_Exp · 🤗 https://huggingface.co/Vineetha00/synapnet
📄 **Paper:** arXiv preprint — link coming soon

---

## Checkpoints in this repo

| File | Params | Size | Training stage | NIAH-single accuracy (ctx=1024) |
|---|---|---|---|---|
| [`synapnet_edge_8m7.pt`](synapnet_edge_8m7.pt) | **8.7M** | 33 MB | Full 2-stage curriculum pretrain (ctx 512 → 1024) | **0.618 ± 0.107** (FP16, 3 seeds) |
| [`synapnet_edge_130m.pt`](synapnet_edge_130m.pt) | **120.9M** | 461 MB | 1,000-step pretrain, under-converged at this compute budget | not converged — released for deployment profiling only |

### Architecture (8.7M reference)

- `dim=192`, `depth=6`, `heads=6`, `episodic_slots=32`
- `vocab_size=4096`, `num_classes=64`, `max_len=8192`
- `k_frac=0.25` (sparse-attention top-K), `episodic_write_frac=0.05`
- ScaleBridge enabled (FP16 interface between mixed-precision pathways)
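
To make the hyperparameters above concrete, here is a minimal sketch of the quantities they imply at a given context length. It assumes that `k_frac` and `episodic_write_frac` are fractions of the sequence length and that `dim` is split evenly across `heads`; the `RefConfig` class and `derived` helper are hypothetical and not part of the `synapnet_edge` package.

```python
from dataclasses import dataclass


@dataclass
class RefConfig:
    """Hypothetical mirror of the 8.7M reference hyperparameters listed above."""
    dim: int = 192
    depth: int = 6
    heads: int = 6
    k_frac: float = 0.25
    episodic_write_frac: float = 0.05


def derived(cfg: RefConfig, seq_len: int) -> dict:
    """Quantities implied by the config, assuming both fractions scale with seq_len."""
    assert cfg.dim % cfg.heads == 0, "dim must divide evenly across heads"
    return {
        "head_dim": cfg.dim // cfg.heads,
        # Assumed: each query attends to the top k_frac * seq_len keys.
        "top_k": max(1, int(cfg.k_frac * seq_len)),
        # Assumed: this many tokens per sequence are written to episodic memory.
        "episodic_writes": max(1, int(cfg.episodic_write_frac * seq_len)),
    }


print(derived(RefConfig(), seq_len=1024))
```

Under these assumptions, a 1024-token context gives a per-head width of 32, 256 attended keys per query, and about 51 episodic writes.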

### Architecture (130M variant)

- `dim=640`, `depth=10`, `heads=10`, `episodic_slots=32`
- Same vocab, classes, max_len as 8.7M
- **Under-trained**: 1,000 steps × batch 2 was insufficient for convergence at this scale. Use for latency / storage / memory profiling, not accuracy claims.

---

## Loading

```python
import torch
from huggingface_hub import hf_hub_download
from synapnet_edge.models.synapnet_edge_model import SynapNetEdge, SynapNetEdgeConfig

ckpt_path = hf_hub_download(
    repo_id="Vineetha00/synapnet-edge",
    filename="synapnet_edge_8m7.pt",
)
ckpt = torch.load(ckpt_path, map_location="cpu")

cfg = SynapNetEdgeConfig(**ckpt["model_cfg"])
model = SynapNetEdge(cfg)
model.load_state_dict(ckpt["model_state"])
model.eval()
```

To install the architecture code:
```bash
pip install git+https://github.com/vineetha00/SynapNet-Edge.git
```

---

## Training data

Synthetic long-context curriculum (no external downloads):

- **NIAH-single** (needle-in-a-haystack)
- **NIAH-multi-key** (4 keys, retrieve value by queried key)
- **Variable tracking** (3-hop chain)
- **Frequency aggregation** (most-common class over 16 marked items)
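
As an illustration of the task family, here is a minimal sketch of what a NIAH-single sample could look like. The token layout, the reserved marker ids, and the `make_niah_single` helper are assumptions made for this example; the actual generator in the GitHub repo may differ.

```python
import random

VOCAB_SIZE = 4096   # from the architecture section
NUM_CLASSES = 64    # from the architecture section

# Hypothetical reserved token ids; the real generator may use another layout.
NEEDLE_MARKER = VOCAB_SIZE - 1
QUERY_MARKER = VOCAB_SIZE - 2


def make_niah_single(seq_len: int, rng: random.Random):
    """Return (tokens, label): filler tokens with one marked value hidden inside."""
    label = rng.randrange(NUM_CLASSES)
    # Filler ids avoid both the class-value range and the two reserved markers.
    tokens = [rng.randrange(NUM_CLASSES, VOCAB_SIZE - 2) for _ in range(seq_len)]
    # Leave room for the (marker, value) pair and the trailing query token.
    pos = rng.randrange(0, seq_len - 3)
    tokens[pos] = NEEDLE_MARKER
    tokens[pos + 1] = label          # the value to retrieve, encoded as its class id
    tokens[-1] = QUERY_MARKER        # asks the model to predict the hidden value
    return tokens, label


tokens, label = make_niah_single(seq_len=512, rng=random.Random(0))
print(len(tokens), tokens.index(NEEDLE_MARKER), label)
```

The model would then be trained to predict `label` as one of the 64 classes from the full token sequence.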

Two-stage curriculum: ctx=512 (4 epochs equivalent) → ctx=1024 (2 epochs equivalent).

Final post-pretrain per-task accuracy (8.7M, ctx=1024):
- NIAH-single: 57%
- NIAH-multi-key: 13%
- Variable tracking: 74%
- Frequency aggregation: 47%

(For comparison, the random-chance floor for 64-class prediction is 1/64 ≈ 1.6%.)

---

## Quantization (apply after loading FP16)

The architecture supports **Component-Aware Joint Quantization (CAJQ)** at inference time:

```python
from synapnet_edge.quantization.cajq import apply_cajq, CAJQConfig
from synapnet_edge.training.calibration import build_calib_loader

calib_loader = build_calib_loader(n_samples=128, seq_len=1024)
model = apply_cajq(
    model,
    CAJQConfig(device="mps"),  # "mps" is PyTorch's Apple Silicon GPU backend
    calib_loader=calib_loader,
    mode="ptq",   # or "qat" for QAT fine-tune
)
```

With **200 QAT steps per run, averaged over 3 seeds**, CAJQ-QAT matches or exceeds FP16 on NIAH-single at every evaluated context length:

| Variant | Eff. bits | ctx 1024 | ctx 2048 | ctx 4096 |
|---|---|---|---|---|
| FP16 | 16.0 | 0.618 ± 0.107 | 0.507 ± 0.115 | 0.438 ± 0.036 |
| **CAJQ-QAT (ours)** | 13.8 | **0.674 ± 0.012** | **0.590 ± 0.043** | **0.521 ± 0.055** |

Compression: 4.4× on targeted SSM + attention parameters (0.60 MB vs 2.66 MB FP16-equivalent); 1.13× whole-model storage reduction at this configuration.
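
The two ratios can be reproduced with a short back-of-the-envelope calculation. This sketch assumes the whole-model baseline is all 8.7M parameters stored in FP16 (2 bytes each, about 17.4 MB) and that only the targeted SSM + attention parameters change size; the variable names are illustrative.

```python
FP16_BYTES = 2
total_params = 8.7e6                 # 8.7M reference model

# Assumed baseline: every parameter stored in FP16.
whole_fp16_mb = total_params * FP16_BYTES / 1e6

targeted_fp16_mb = 2.66              # targeted SSM + attention params, FP16-equivalent
targeted_cajq_mb = 0.60              # the same params after CAJQ

targeted_ratio = targeted_fp16_mb / targeted_cajq_mb

# Only the targeted parameters shrink; the rest stay at FP16 size.
whole_cajq_mb = whole_fp16_mb - targeted_fp16_mb + targeted_cajq_mb
whole_ratio = whole_fp16_mb / whole_cajq_mb

print(f"targeted: {targeted_ratio:.1f}x, whole-model: {whole_ratio:.2f}x")
```

The whole-model ratio is small because the targeted parameters make up only about 15% of the FP16 footprint under this assumption.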

---

## Streaming inference with BAEE

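```python
import torch
from synapnet_edge import BAEEMemoryManager

# Example input: one 8K sequence of token ids (vocab_size=4096).
# The (batch, seq_len) shape is an assumption; substitute your own tokens.
input_ids = torch.randint(0, 4096, (1, 8192))

# dim and n_layers match the 8.7M reference config (dim=192, depth=6).
manager = BAEEMemoryManager(dim=192, n_layers=6, budget_mb=256.0)
logits, debug = model.forward_streaming(
    input_ids, chunk_size=512, baee_manager=manager,
)
```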

Under 90% forced eviction with the target needle in the *early* portion of an 8K stream, BAEE retains the target **71% ± 8%** of the time vs **0%** for FIFO / LRU. Head-to-head comparisons against H2O, Scissorhands, SnapKV, PyramidKV, and Locret-style policies are in the GitHub repo.
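
The 0% figure for FIFO / LRU follows from how those policies order evictions, which the toy sketch below illustrates. It is not the BAEE algorithm: `scored_survivors` is a generic importance-ranked policy, and the entry count, needle position, and scores are hypothetical values chosen for the example.

```python
import random


def fifo_survivors(n_entries: int, keep: int) -> set:
    """FIFO evicts the oldest entries first, so only the newest `keep` survive."""
    return set(range(n_entries - keep, n_entries))


def scored_survivors(scores: list, keep: int) -> set:
    """Generic importance-based eviction: retain the `keep` highest-scoring entries."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(ranked[:keep])


rng = random.Random(0)
n_entries = 100            # hypothetical number of memory entries in the stream
keep = n_entries // 10     # 90% forced eviction leaves 10% of the entries
needle_idx = 5             # the needle sits in the early portion of the stream

# Hypothetical importance scores: distractors score low, the needle scores high.
scores = [rng.uniform(0.0, 0.5) for _ in range(n_entries)]
scores[needle_idx] = 0.9

print("FIFO keeps needle:  ", needle_idx in fifo_survivors(n_entries, keep))
print("Scored keeps needle:", needle_idx in scored_survivors(scores, keep))
```

An early entry is always among the first evicted under FIFO, and LRU behaves the same way when entries are never re-read. A policy that ranks entries by importance can keep the needle, but only as reliably as its scores identify it, which is why a learned or heuristic policy lands below 100%.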

---

## License

MIT — see [LICENSE](https://github.com/vineetha00/SynapNet-Edge/blob/main/LICENSE).

## Citation

```bibtex
@article{synapnet_edge_2026,
  title={SynapNet-Edge: Component-Aware Quantization and Budget-Aware Eviction for Hybrid Long-Context Models on Consumer Hardware},
  author={Vallish Kumar, Vineetha},
  year={2026},
}
```