Initial release: 8.7M reference + 120.9M variant checkpoints + model card


Files changed (3)

README.md +150 -0
synapnet_edge_130m.pt +3 -0
synapnet_edge_8m7.pt +3 -0

README.md ADDED

	@@ -0,0 +1,150 @@

+---
+license: mit
+library_name: pytorch
+tags:
+  - efficient-inference
+  - quantization
+  - state-space-model
+  - sparse-attention
+  - episodic-memory
+  - long-context
+  - edge-deployment
+language:
+  - en
+---
+# SynapNet-Edge — Checkpoints
+Hybrid **SSM + sparse-attention + episodic-memory** architecture with Component-Aware Joint Quantization (CAJQ) and Budget-Aware Episodic Eviction (BAEE), designed for long-context inference on consumer hardware.
+📦 **Code:** https://github.com/vineetha00/SynapNet-Edge
+🧪 **Base architecture:** https://github.com/vineetha00/SynapNet_Exp · 🤗 https://huggingface.co/Vineetha00/synapnet
+📄 **Paper:** arXiv preprint — link coming soon
+---
+## Checkpoints in this repo
+| File | Params | Size | Stage | Eval NIAH-single (ctx=1024) |
+|---|---|---|---|---|
+| [`synapnet_edge_8m7.pt`](synapnet_edge_8m7.pt) | **8.7M** | 33 MiB | Full 2-stage curriculum pretrain (ctx 512 → 1024) | **0.618 ± 0.107** (FP16, 3 seeds) |
+| [`synapnet_edge_130m.pt`](synapnet_edge_130m.pt) | **120.9M** | 461 MiB | 1,000-step pretrain, under-converged at this compute budget | not converged; released for deployment profiling only |
+### Architecture (8.7M reference)
+- `dim=192`, `depth=6`, `heads=6`, `episodic_slots=32`
+- `vocab_size=4096`, `num_classes=64`, `max_len=8192`
+- `k_frac=0.25` (sparse-attention top-K), `episodic_write_frac=0.05`
+- ScaleBridge enabled (FP16 interface between mixed-precision pathways)
+### Architecture (130M variant, 120.9M parameters)
+- `dim=640`, `depth=10`, `heads=10`, `episodic_slots=32`
+- Same vocab, classes, max_len as 8.7M
+- **Under-trained**: 1,000 steps × batch 2 was insufficient for convergence at this scale. Use for latency / storage / memory profiling, not accuracy claims.
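As a quick sanity check on the two configurations, the sketch below derives the per-head width and the number of keys a query would keep under top-K sparse attention. It assumes the common conventions `head_dim = dim // heads` and `K = int(k_frac * ctx)`; the model card confirms neither, and the helper names are hypothetical. `k_frac=0.25` is only listed for the 8.7M reference.

```python
# Hypothetical helpers: derive head width and top-K size from the listed hyperparameters.
# Assumes head_dim = dim // heads and K = int(k_frac * ctx); check the repo for the real rule.
CONFIGS = {
    "8.7M reference": {"dim": 192, "heads": 6},
    "130M variant": {"dim": 640, "heads": 10},
}

def head_dim(dim: int, heads: int) -> int:
    """Width of one attention head, assuming dim is split evenly across heads."""
    if dim % heads != 0:
        raise ValueError("dim must be divisible by heads")
    return dim // heads

def top_k_keys(ctx: int, k_frac: float) -> int:
    """Number of keys each query keeps, assuming k_frac is a fraction of the context."""
    return max(1, int(k_frac * ctx))

for name, cfg in CONFIGS.items():
    print(name, "head_dim =", head_dim(cfg["dim"], cfg["heads"]))
for ctx in (1024, 2048, 4096):
    print("ctx", ctx, "-> keys kept per query:", top_k_keys(ctx, 0.25))
```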
+---
+## Loading
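+```python
+import torch
+from huggingface_hub import hf_hub_download
+from synapnet_edge.models.synapnet_edge_model import SynapNetEdge, SynapNetEdgeConfig
+
+# Download the checkpoint from the Hub (cached locally after the first call).
+ckpt_path = hf_hub_download(
+    repo_id="Vineetha00/synapnet-edge",
+    filename="synapnet_edge_8m7.pt",
+)
+# The checkpoint is a dict holding the config ("model_cfg") and the weights ("model_state").
+ckpt = torch.load(ckpt_path, map_location="cpu")
+cfg = SynapNetEdgeConfig(**ckpt["model_cfg"])
+model = SynapNetEdge(cfg)
+model.load_state_dict(ckpt["model_state"])
+model.eval()  # switch to inference mode (disables dropout and similar training-only behavior)
+```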
+To install the architecture code:
+```bash
+pip install git+https://github.com/vineetha00/SynapNet-Edge.git
+```
+---
+## Training data
+Synthetic long-context curriculum (no external downloads):
+- **NIAH-single** (needle-in-a-haystack)
+- **NIAH-multi-key** (4 keys, retrieve value by queried key)
+- **Variable tracking** (3-hop chain)
+- **Frequency aggregation** (most-common class over 16 marked items)
+Two-stage curriculum: ctx=512 (4 epochs equivalent) → ctx=1024 (2 epochs equivalent).
+Final post-pretrain per-task accuracy (8.7M, ctx=1024):
+- NIAH-single: 57%
+- NIAH-multi-key: 13%
+- Variable tracking: 74%
+- Frequency aggregation: 47%
+(Versus a random-chance floor of 1/64 ≈ 1.6% for 64-class prediction.)
+---
+## Quantization (apply after loading FP16)
+The architecture supports **Component-Aware Joint Quantization (CAJQ)** at inference time:
+```python
+from synapnet_edge.quantization.cajq import apply_cajq, CAJQConfig
+from synapnet_edge.training.calibration import build_calib_loader
+
+# `model` is the model loaded in the Loading section above.
+calib_loader = build_calib_loader(n_samples=128, seq_len=1024)
+model = apply_cajq(
+    model,
+    CAJQConfig(device="mps"),  # "mps" is PyTorch's Apple-silicon (Metal) backend
+    calib_loader=calib_loader,
+    mode="ptq",   # or "qat" for QAT fine-tune
+)
+```
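+The table below reports NIAH-single accuracy over **3 seeds**, with **200 QAT steps** per run (`mode="qat"`). CAJQ-QAT has the higher mean at every evaluated context length, although the ± ranges overlap with FP16 in each column: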
+| Variant | Eff. bits | ctx 1024 | ctx 2048 | ctx 4096 |
+|---|---|---|---|---|
+| FP16 | 16.0 | 0.618 ± 0.107 | 0.507 ± 0.115 | 0.438 ± 0.036 |
+| **CAJQ-QAT (ours)** | 13.8 | **0.674 ± 0.012** | **0.590 ± 0.043** | **0.521 ± 0.055** |
+Compression: 4.4× on targeted SSM + attention parameters (0.60 MB vs 2.66 MB FP16-equivalent); 1.13× whole-model storage reduction at this configuration.
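The two compression figures can be reproduced with a back-of-envelope calculation. The sketch below assumes the whole-model FP16 footprint is the rounded 8.7M parameter count times 2 bytes (decimal MB), and that only the targeted SSM + attention parameters change size; both are assumptions, not statements from the repo.

```python
# Back-of-envelope check of the quoted compression ratios.
# Assumption: whole-model FP16 size = 8.7M params x 2 bytes (rounded count, decimal MB).
targeted_fp16_mb = 2.66   # SSM + attention parameters at FP16 (from the text)
targeted_cajq_mb = 0.60   # the same parameters after CAJQ (from the text)
params = 8.7e6

whole_fp16_mb = params * 2 / 1e6
# Only the targeted parameters shrink; everything else is assumed to stay at FP16.
whole_cajq_mb = whole_fp16_mb - targeted_fp16_mb + targeted_cajq_mb

targeted_ratio = targeted_fp16_mb / targeted_cajq_mb
whole_ratio = whole_fp16_mb / whole_cajq_mb
print(f"targeted: {targeted_ratio:.1f}x, whole model: {whole_ratio:.2f}x")
```

Under these assumptions the targeted parameters make up only about 15% of the FP16 footprint, which is why a 4.4× reduction on them yields roughly 1.13× for the whole model.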
+---
+## Streaming inference with BAEE
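+```python
+from synapnet_edge import BAEEMemoryManager
+
+# dim and n_layers match the 8.7M reference (dim=192, depth=6).
+manager = BAEEMemoryManager(dim=192, n_layers=6, budget_mb=256.0)
+# input_ids: token-ID tensor for the input stream, prepared by the caller.
+logits, debug = model.forward_streaming(
+    input_ids, chunk_size=512, baee_manager=manager,
+)
+```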
+Under 90% forced eviction with the target needle in the *early* portion of an 8K stream, BAEE retains the target **71% ± 8%** of the time vs **0%** for FIFO / LRU. Head-to-head comparisons against H2O / Scissorhands / SnapKV / PyramidKV / Locret-style policies are in the GitHub repo.
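The 0% figure for FIFO follows from how the policy works: once the cache can hold only 10% of the stream, it keeps the most recent entries, so anything placed early is always gone by the end. The sketch below illustrates this with a token-level cache; the granularity, the needle position, and the helper name are illustrative assumptions, and it does not model BAEE itself. Standard LRU behaves the same way when the needle is never re-read before the query.

```python
from collections import deque

def fifo_survivors(stream_len: int, keep_frac: float) -> set[int]:
    """Positions still cached after the stream ends, with FIFO evicting the oldest first."""
    budget = max(1, int(stream_len * keep_frac))
    cache = deque(maxlen=budget)  # appending to a full deque drops the oldest item
    for pos in range(stream_len):
        cache.append(pos)
    return set(cache)

# 8K stream with 90% forced eviction, i.e. room for 10% of the positions.
survivors = fifo_survivors(stream_len=8192, keep_frac=0.10)
needle_pos = 100  # hypothetical position in the early part of the stream

print("needle retained:", needle_pos in survivors)
print("oldest surviving position:", min(survivors))
```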
+---
+## License
+MIT — see [LICENSE](https://github.com/vineetha00/SynapNet-Edge/blob/main/LICENSE).
+## Citation
+```bibtex
+@article{synapnet_edge_2026,
+  title={SynapNet-Edge: Component-Aware Quantization and Budget-Aware Eviction for Hybrid Long-Context Models on Consumer Hardware},
+  author={Vallish Kumar, Vineetha},
+  year={2026},
+}
+```

synapnet_edge_130m.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:6d542a9dd765cc85326ddea0f09d1ccc9e95f57be1e26605869467919b81bd83
+size 483881523

synapnet_edge_8m7.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:f304b5aba18a930cb4c309ad6beb63a3bca057ebde67dfe1d6bd9818fede974a
+size 34951283
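
The Git LFS pointers above record each checkpoint's SHA-256 digest (`oid`) and size in bytes, so a downloaded file can be checked against them. The following is a minimal sketch using only the standard library; `sha256_of` and `matches_pointer` are hypothetical helper names, and the lookup assumes the local file keeps its original filename.

```python
import hashlib
from pathlib import Path

# OIDs and byte sizes copied from the Git LFS pointers above.
EXPECTED = {
    "synapnet_edge_130m.pt": (
        "6d542a9dd765cc85326ddea0f09d1ccc9e95f57be1e26605869467919b81bd83",
        483881523,
    ),
    "synapnet_edge_8m7.pt": (
        "f304b5aba18a930cb4c309ad6beb63a3bca057ebde67dfe1d6bd9818fede974a",
        34951283,
    ),
}

def sha256_of(path, chunk_bytes: int = 1 << 20) -> str:
    """Hash a file in chunks so large checkpoints are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_bytes), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_pointer(path) -> bool:
    """True if the file's size and SHA-256 match the LFS pointer for its filename."""
    p = Path(path)
    oid, size = EXPECTED[p.name]
    return p.stat().st_size == size and sha256_of(p) == oid

# The pointer sizes expressed in MiB (1 MiB = 2**20 bytes).
for name, (_, size) in EXPECTED.items():
    print(f"{name}: {size / 2**20:.1f} MiB")
```

With the path returned by `hf_hub_download` in the Loading section, the check would be `matches_pointer(ckpt_path)`. The pointer sizes come to roughly 33 and 461 when divided by 2**20, which matches the sizes listed in the checkpoint table.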