| --- |
| license: mit |
| library_name: pytorch |
| tags: |
| - efficient-inference |
| - quantization |
| - state-space-model |
| - sparse-attention |
| - episodic-memory |
| - long-context |
| - edge-deployment |
| language: |
| - en |
| --- |
| |
| # SynapNet-Edge — Checkpoints |
|
|
| Hybrid **SSM + sparse-attention + episodic-memory** architecture with Component-Aware Joint Quantization (CAJQ) and Budget-Aware Episodic Eviction (BAEE), designed for long-context inference on consumer hardware. |
|
|
| 📦 **Code:** https://github.com/vineetha00/SynapNet-Edge |
| 🧪 **Base architecture:** https://github.com/vineetha00/SynapNet_Exp · 🤗 https://huggingface.co/Vineetha00/synapnet |
| 📄 **Paper:** arXiv preprint — link coming soon |
| |
| --- |
| |
| ## Checkpoints in this repo |
| |
| File | Params | Size | Stage | NIAH-single eval accuracy (ctx=1024) | |
| |---|---|---|---|---| |
| | [`synapnet_edge_8m7.pt`](synapnet_edge_8m7.pt) | **8.7M** | 33 MB | Full 2-stage curriculum pretrain (ctx 512 → 1024) | **0.618 ± 0.107** (FP16, 3 seeds) | |
| | [`synapnet_edge_130m.pt`](synapnet_edge_130m.pt) | **120.9M** | 461 MB | 1,000-step pretrain, under-converged at this compute budget | not converged — released for deployment profiling only | |
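
As a rough sanity check on the table above, the listed file sizes line up with about 4 bytes per parameter. This is an inference from the numbers rather than something the card states: the sketch assumes sizes are reported in MiB and that each checkpoint holds little besides 32-bit weights.

```python
# Sanity check: do the listed file sizes match 4 bytes per parameter?
# Assumptions (not stated in the card): sizes are in MiB (2**20 bytes) and
# each checkpoint is dominated by 32-bit weights.
BYTES_PER_PARAM = 4
MIB = 2**20


def estimated_size_mib(n_params: float) -> float:
    """Estimated on-disk size in MiB for n_params 32-bit weights."""
    return n_params * BYTES_PER_PARAM / MIB


checkpoints = {
    "synapnet_edge_8m7.pt": (8.7e6, 33),      # (params, listed size)
    "synapnet_edge_130m.pt": (120.9e6, 461),
}
for name, (n_params, listed) in checkpoints.items():
    est = estimated_size_mib(n_params)
    print(f"{name}: estimated {est:.1f} MiB, listed {listed} MB")
```

If the estimate and the listed size diverge for a future checkpoint, the file probably stores a different dtype or extra state such as optimizer buffers.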
| |
| ### Architecture (8.7M reference) |
| |
| - `dim=192`, `depth=6`, `heads=6`, `episodic_slots=32` |
| - `vocab_size=4096`, `num_classes=64`, `max_len=8192` |
| - `k_frac=0.25` (sparse-attention top-K), `episodic_write_frac=0.05` |
| - ScaleBridge enabled (FP16 interface between mixed-precision pathways) |
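
To make the `k_frac` setting concrete, here is a minimal pure-Python sketch of top-K sparse attention for a single query. It assumes `k = ceil(k_frac * n_keys)` and a softmax restricted to the kept keys; the function name and the selection rule are illustrative assumptions, not the repository's implementation.

```python
import math


def topk_sparse_weights(scores, k_frac=0.25):
    """Softmax over only the top-k scores; every other position gets weight 0.

    Assumption: k = ceil(k_frac * len(scores)); the model's rule may differ.
    """
    n = len(scores)
    k = max(1, math.ceil(k_frac * n))
    # Indices of the k largest scores (ties broken by earlier position).
    keep = sorted(range(n), key=lambda i: (-scores[i], i))[:k]
    m = max(scores[i] for i in keep)  # subtract the max for numerical stability
    exps = {i: math.exp(scores[i] - m) for i in keep}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(n)]


scores = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, 1.0, -0.5]  # one query vs. 8 keys
weights = topk_sparse_weights(scores, k_frac=0.25)    # keeps 2 of the 8 keys
print([round(w, 3) for w in weights])
```

Under this reading, `k_frac=0.25` means each query attends to a quarter of the available keys, so the attention cost per query scales with `0.25 * n_keys` rather than `n_keys`.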
|
|
| ### Architecture (130M variant) |
|
|
| - `dim=640`, `depth=10`, `heads=10`, `episodic_slots=32` |
| - Same vocab, classes, max_len as 8.7M |
| - **Under-trained**: 1,000 steps × batch 2 was insufficient for convergence at this scale. Use for latency / storage / memory profiling, not accuracy claims. |
| |
| --- |
| |
| ## Loading |
| |
| ```python |
| import torch |
| from huggingface_hub import hf_hub_download |
| from synapnet_edge.models.synapnet_edge_model import SynapNetEdge, SynapNetEdgeConfig |
| |
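| # Download the checkpoint from the Hub (cached locally after the first call). |
| ckpt_path = hf_hub_download( |
|     repo_id="Vineetha00/synapnet-edge", |
|     filename="synapnet_edge_8m7.pt", |
| ) |
| ckpt = torch.load(ckpt_path, map_location="cpu") |
| |
| # The checkpoint stores the config kwargs and the weights side by side. |
| cfg = SynapNetEdgeConfig(**ckpt["model_cfg"]) |
| model = SynapNetEdge(cfg) |
| model.load_state_dict(ckpt["model_state"]) |
| model.eval()  # inference mode: disables dropout and other training-only behavior |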
| ``` |
| |
| To install the architecture code: |
| ```bash |
| pip install git+https://github.com/vineetha00/SynapNet-Edge.git |
| ``` |
| |
| --- |
| |
| ## Training data |
| |
| Synthetic long-context curriculum (no external downloads): |
| |
| - **NIAH-single** (needle-in-a-haystack) |
| - **NIAH-multi-key** (4 keys, retrieve value by queried key) |
| - **Variable tracking** (3-hop chain) |
| - **Frequency aggregation** (most-common class over 16 marked items) |
|
|
| Two-stage curriculum: ctx=512 (4 epochs equivalent) → ctx=1024 (2 epochs equivalent). |
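
For readers unfamiliar with the task family, the sketch below generates a toy NIAH-single sample at both curriculum lengths. Only `vocab_size=4096` and `num_classes=64` come from this card; the marker token, the value-token encoding, and the function name are hypothetical and may not match the repository's generator.

```python
import random

VOCAB_SIZE = 4096       # from the card
NUM_CLASSES = 64        # from the card
NEEDLE_MARKER = 1       # hypothetical reserved token id
FIRST_VALUE_ID = 2      # hypothetical: ids 2..65 encode the 64 classes
FIRST_FILLER_ID = FIRST_VALUE_ID + NUM_CLASSES  # filler tokens start at 66


def make_niah_single(ctx_len, rng):
    """Return (tokens, label): random filler with one (marker, value) pair."""
    tokens = [rng.randrange(FIRST_FILLER_ID, VOCAB_SIZE) for _ in range(ctx_len)]
    label = rng.randrange(NUM_CLASSES)
    pos = rng.randrange(ctx_len - 1)   # leave room for the value token
    tokens[pos] = NEEDLE_MARKER
    tokens[pos + 1] = FIRST_VALUE_ID + label
    return tokens, label


rng = random.Random(0)
for ctx_len in (512, 1024):            # the two curriculum stages
    tokens, label = make_niah_single(ctx_len, rng)
    print(ctx_len, len(tokens), label)
```

Because filler ids never overlap the marker or value ids, each sequence contains exactly one needle, and the label is recoverable only by locating it.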
|
|
| Final post-pretrain per-task accuracy (8.7M, ctx=1024): |
| - NIAH-single: 57% |
| - NIAH-multi-key: 13% |
| - Variable tracking: 74% |
| - Frequency aggregation: 47% |
|
|
| (For comparison, the random-chance floor for 64 classes is 1/64 ≈ 1.6%.) |
|
|
| --- |
|
|
| ## Quantization (apply after loading FP16) |
|
|
| The architecture supports **Component-Aware Joint Quantization (CAJQ)** at inference time: |
|
|
| ```python |
| from synapnet_edge.quantization.cajq import apply_cajq, CAJQConfig |
| from synapnet_edge.training.calibration import build_calib_loader |
| |
| # `model` is the model created in the Loading section. |
| calib_loader = build_calib_loader(n_samples=128, seq_len=1024) |
| model = apply_cajq( |
|     model, |
|     CAJQConfig(device="mps"),  # "mps" is PyTorch's Apple-silicon GPU backend |
|     calib_loader=calib_loader, |
|     mode="ptq",  # or "qat" for QAT fine-tune |
| ) |
| ``` |
|
|
| Across **3 seeds with 200 QAT steps each**, CAJQ-QAT matches or exceeds FP16 on NIAH-single at every evaluated context length: |
|
|
| | Variant | Eff. bits | ctx 1024 | ctx 2048 | ctx 4096 | |
| |---|---|---|---|---| |
| | FP16 | 16.0 | 0.618 ± 0.107 | 0.507 ± 0.115 | 0.438 ± 0.036 | |
| | **CAJQ-QAT (ours)** | 13.8 | **0.674 ± 0.012** | **0.590 ± 0.043** | **0.521 ± 0.055** | |
|
|
| Compression: 4.4× on targeted SSM + attention parameters (0.60 MB vs 2.66 MB FP16-equivalent); 1.13× whole-model storage reduction at this configuration. |
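
The two compression figures can be reconciled with a short calculation. The targeted sizes (2.66 MB and 0.60 MB) come from this card; the whole-model FP16 footprint of roughly 17.4 MB is an assumption derived from 8.7M parameters at 2 bytes each, so treat the result as approximate.

```python
# Worked arithmetic for the compression figures.
# From the card: targeted params take 2.66 MB in FP16 and 0.60 MB after CAJQ.
# Assumption: the whole model is 8.7M params at 2 bytes each in FP16.
targeted_fp16_mb = 2.66
targeted_cajq_mb = 0.60
whole_fp16_mb = 8.7e6 * 2 / 1e6   # about 17.4 MB (assumed, not from the card)

# Ratio on the targeted SSM + attention parameters only.
targeted_ratio = targeted_fp16_mb / targeted_cajq_mb

# Whole-model size after CAJQ: everything else stays at its FP16 size.
whole_cajq_mb = whole_fp16_mb - (targeted_fp16_mb - targeted_cajq_mb)
whole_ratio = whole_fp16_mb / whole_cajq_mb

print(f"targeted: {targeted_ratio:.1f}x, whole model: {whole_ratio:.2f}x")
```

Under that assumption the targeted parameters are a small share of the total footprint, which is why a 4.4× reduction on them translates into only about 1.13× for the whole model.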
|
|
| --- |
|
|
| ## Streaming inference with BAEE |
|
|
| ```python |
| from synapnet_edge import BAEEMemoryManager |
| |
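| # `model` comes from the Loading section; `input_ids` is your tokenized input. |
| # dim and n_layers match the 8.7M reference config (dim=192, depth=6). |
| manager = BAEEMemoryManager(dim=192, n_layers=6, budget_mb=256.0) |
| logits, debug = model.forward_streaming( |
|     input_ids, chunk_size=512, baee_manager=manager, |
| ) |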
| ``` |
|
|
| Under 90% forced eviction with the target needle in the *early* portion of an 8K stream, BAEE retains the target **71% ± 8%** of the time vs **0%** for FIFO / LRU. Head-to-head comparisons against H2O, Scissorhands, SnapKV, PyramidKV, and Locret-style policies are available in the GitHub repo. |
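
The 0% figure for FIFO follows from how recency-only eviction works, which the toy sketch below illustrates. It is not BAEE's scoring rule: the importance scores are hypothetical, and the score-based policy stands in for any policy that ranks entries by something other than age.

```python
import heapq


def fifo_keep(n_items, budget):
    """FIFO under a fixed budget: only the most recent `budget` items survive."""
    return set(range(n_items - budget, n_items))


def score_keep(scores, budget):
    """Keep the `budget` highest-scoring items regardless of age."""
    return set(heapq.nlargest(budget, range(len(scores)), key=scores.__getitem__))


n_items = 8192              # one slot per token of an 8K stream
budget = n_items // 10      # 90% forced eviction
needle = 100                # target sits early in the stream

# Hypothetical importance scores: background items score low, the needle high.
scores = [0.1] * n_items
scores[needle] = 1.0

print("FIFO keeps needle: ", needle in fifo_keep(n_items, budget))
print("Score keeps needle:", needle in score_keep(scores, budget))
```

With no re-access to the needle, LRU behaves like FIFO here, so any early target is evicted once the stream exceeds the budget. A score-based policy can retain it only as long as its scoring actually ranks the needle highly.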
|
|
| --- |
|
|
| ## License |
|
|
| MIT — see [LICENSE](https://github.com/vineetha00/SynapNet-Edge/blob/main/LICENSE). |
|
|
| ## Citation |
|
|
| ```bibtex |
| @article{synapnet_edge_2026, |
| title={SynapNet-Edge: Component-Aware Quantization and Budget-Aware Eviction for Hybrid Long-Context Models on Consumer Hardware}, |
| author={Vallish Kumar, Vineetha}, |
| year={2026}, |
| } |
| ``` |
|
|