	---
	license: mit
	library_name: pytorch
	tags:
	- efficient-inference
	- quantization
	- state-space-model
	- sparse-attention
	- episodic-memory
	- long-context
	- edge-deployment
	language:
	- en
	---

	# SynapNet-Edge — Checkpoints

Hybrid state-space model (SSM) + sparse-attention + episodic-memory architecture with Component-Aware Joint Quantization (CAJQ) and Budget-Aware Episodic Eviction (BAEE), designed for long-context inference on consumer hardware.

	📦 Code: https://github.com/vineetha00/SynapNet-Edge
	🧪 Base architecture: https://github.com/vineetha00/SynapNet_Exp · 🤗 https://huggingface.co/Vineetha00/synapnet
	📄 Paper: arXiv preprint — link coming soon

	---

	## Checkpoints in this repo

| File | Params | Size | Stage | NIAH-single accuracy (ctx=1024) |
|---|---|---|---|---|
| [`synapnet_edge_8m7.pt`](synapnet_edge_8m7.pt) | 8.7M | 33 MB | Full two-stage curriculum pretrain (ctx 512 → 1024) | 0.618 ± 0.107 (FP16, 3 seeds) |
| [`synapnet_edge_130m.pt`](synapnet_edge_130m.pt) | 120.9M | 461 MB | 1,000-step pretrain; under-converged at this compute budget | Not converged; released for deployment profiling only |

	### Architecture (8.7M reference)

	- `dim=192`, `depth=6`, `heads=6`, `episodic_slots=32`
	- `vocab_size=4096`, `num_classes=64`, `max_len=8192`
	- `k_frac=0.25` (sparse-attention top-K), `episodic_write_frac=0.05`
	- ScaleBridge enabled (FP16 interface between mixed-precision pathways)
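
To make `k_frac` concrete: in generic top-K sparse attention, each query scores every key but normalizes and aggregates over only its K highest-scoring keys. The sketch below is a single-query, pure-Python illustration of that generic technique, assuming `K = max(1, int(k_frac * n_keys))`. `topk_attention` is a hypothetical helper, not the repository's implementation.

```python
import math

def topk_attention(q, keys, values, k_frac=0.25):
    """Single-query top-K sparse attention (illustrative sketch).

    q: list of d floats; keys, values: lists of equal-length float vectors.
    Assumes K = max(1, int(k_frac * n_keys)); ties keep original key order.
    """
    d = len(q)
    n = len(keys)
    k = max(1, int(k_frac * n))
    # Scaled dot-product score for every key
    scores = [sum(qi * ki for qi, ki in zip(q, key)) / math.sqrt(d) for key in keys]
    # Keep only the indices of the K highest-scoring keys
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the retained scores only (max-subtracted for numerical stability)
    m = max(scores[i] for i in top)
    weights = {i: math.exp(scores[i] - m) for i in top}
    total = sum(weights.values())
    # Weighted sum of the selected value vectors; all other keys contribute nothing
    out = [0.0] * len(values[0])
    for i, w in weights.items():
        for j, v in enumerate(values[i]):
            out[j] += (w / total) * v
    return out, sorted(top)

keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [2.0, 0.0],
        [0.0, -1.0], [0.5, 0.0], [0.0, 0.5], [-2.0, 0.0]]
values = [[float(i), 0.0] for i in range(8)]
out, kept = topk_attention([1.0, 0.0], keys, values, k_frac=0.25)
print(kept)  # [0, 3]
```

With 8 keys and `k_frac=0.25`, only 2 keys enter the softmax, so the per-query cost of the weighted sum scales with K rather than with the full context.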

### Architecture (130M variant, 120.9M parameters)

- `dim=640`, `depth=10`, `heads=10`, `episodic_slots=32`
- Same `vocab_size`, `num_classes`, and `max_len` as the 8.7M reference
- Under-trained: 1,000 steps at batch size 2 were not enough to converge at this scale. Use this checkpoint for latency, storage, and memory profiling, not for accuracy claims.

	---

	## Loading

```python
import torch
from huggingface_hub import hf_hub_download
from synapnet_edge.models.synapnet_edge_model import SynapNetEdge, SynapNetEdgeConfig

# Download the 8.7M reference checkpoint from the Hub (cached locally)
ckpt_path = hf_hub_download(
    repo_id="Vineetha00/synapnet-edge",
    filename="synapnet_edge_8m7.pt",
)
ckpt = torch.load(ckpt_path, map_location="cpu")

# The checkpoint stores the config kwargs alongside the weights
cfg = SynapNetEdgeConfig(**ckpt["model_cfg"])
model = SynapNetEdge(cfg)
model.load_state_dict(ckpt["model_state"])
model.eval()  # inference mode (disables dropout and other training-only behavior)
```

	To install the architecture code:
	```bash
	pip install git+https://github.com/vineetha00/SynapNet-Edge.git
	```

	---

	## Training data

	Synthetic long-context curriculum (no external downloads):

	- NIAH-single (needle-in-a-haystack)
	- NIAH-multi-key (4 keys, retrieve value by queried key)
	- Variable tracking (3-hop chain)
	- Frequency aggregation (most-common class over 16 marked items)

Two-stage curriculum: ctx=512 (4-epoch equivalent) → ctx=1024 (2-epoch equivalent).
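
For intuition about what a NIAH-single example looks like at each curriculum stage, here is a hypothetical generator. The token layout (reserved marker ids, filler range, value ids encoding one of `num_classes=64` labels) is an assumption for illustration only, not the repository's data format; `make_niah_single` and the marker ids are made up.

```python
import random

VOCAB_SIZE = 4096            # from the 8.7M config
NUM_CLASSES = 64             # the answer is one of 64 classes
NEEDLE_MARK = 1              # hypothetical reserved token ids
QUERY_MARK = 2
VALUE_BASE = 3               # hypothetical: ids 3..66 encode the 64 class values
FILLER_LO = VALUE_BASE + NUM_CLASSES  # filler never collides with markers or values

def make_niah_single(seq_len, rng):
    """Return (token_ids, label): one needle hidden in random filler (sketch)."""
    label = rng.randrange(NUM_CLASSES)
    # Random filler, leaving room for the 2-token needle and the trailing query marker
    tokens = [rng.randrange(FILLER_LO, VOCAB_SIZE) for _ in range(seq_len - 3)]
    pos = rng.randrange(len(tokens) + 1)
    tokens[pos:pos] = [NEEDLE_MARK, VALUE_BASE + label]  # insert the needle
    tokens.append(QUERY_MARK)  # the model should predict `label` at this position
    return tokens, label

rng = random.Random(0)
# Two-stage curriculum: same task, longer haystack in stage 2
for ctx in (512, 1024):
    ids, y = make_niah_single(ctx, rng)
    print(ctx, len(ids), y)
```

Only the haystack length changes between stages, so the model first learns the retrieval pattern over 512 tokens before being asked to do it over 1,024.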

	Final post-pretrain per-task accuracy (8.7M, ctx=1024):
	- NIAH-single: 57%
	- NIAH-multi-key: 13%
	- Variable tracking: 74%
	- Frequency aggregation: 47%

(For reference, random chance on a 64-class task is 1/64 ≈ 1.6%.)

	---

	## Quantization (apply after loading FP16)

Component-Aware Joint Quantization (CAJQ) is applied to the loaded model, either as post-training quantization (PTQ) or with a short quantization-aware training (QAT) fine-tune:

```python
from synapnet_edge.quantization.cajq import apply_cajq, CAJQConfig
from synapnet_edge.training.calibration import build_calib_loader

# 128 calibration sequences of 1,024 tokens each
calib_loader = build_calib_loader(n_samples=128, seq_len=1024)
model = apply_cajq(
    model,
    CAJQConfig(device="mps"),  # "mps" = PyTorch's Apple-silicon GPU backend
    calib_loader=calib_loader,
    mode="ptq",  # or "qat" for a QAT fine-tune
)
```
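
PTQ and QAT both map weights onto a small integer grid; QAT additionally trains through that rounding. As a generic illustration of the basic operation (symmetric uniform quantize-dequantize), and not of CAJQ's actual per-component scheme, here is a hypothetical `fake_quantize` helper:

```python
def fake_quantize(weights, bits):
    """Symmetric uniform quantize-dequantize of a list of floats (sketch).

    Rounds w / scale to the nearest integer in [-qmax, qmax], then rescales.
    A QAT forward pass typically runs weights through an operation like this.
    """
    qmax = 2 ** (bits - 1) - 1                  # e.g. 7 for signed 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Integer codes; Python's round() uses round-half-to-even
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    dequant = [c * scale for c in codes]        # back to float for the forward pass
    return dequant, codes, scale

w = [0.31, -0.12, 0.02, -0.70, 0.44]
deq, codes, scale = fake_quantize(w, bits=4)
print(codes)  # [3, -1, 0, -7, 4]
```

Because the scale is set from the largest magnitude, every weight falls inside the grid and its rounding error is at most half a step (`scale / 2`); fewer bits mean a coarser grid and larger error.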

After 200 QAT steps (3 seeds), CAJQ-QAT matches or exceeds FP16 NIAH-single accuracy at every evaluated context length:

| Variant | Effective bits | ctx 1024 | ctx 2048 | ctx 4096 |
|---|---|---|---|---|
| FP16 | 16.0 | 0.618 ± 0.107 | 0.507 ± 0.115 | 0.438 ± 0.036 |
| CAJQ-QAT (ours) | 13.8 | 0.674 ± 0.012 | 0.590 ± 0.043 | 0.521 ± 0.055 |

Compression: 4.4× on the targeted SSM and attention parameters (0.60 MB vs. 2.66 MB FP16-equivalent), which amounts to a 1.13× whole-model storage reduction at this configuration.
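
The gap between 4.4× and 1.13× comes from quantizing only part of the model. A back-of-the-envelope check, assuming all ~8.7M parameters are stored at 2 bytes (FP16) and that MB means 10^6 bytes (both assumptions; the repo's exact accounting may differ):

```python
PARAMS = 8.7e6           # total parameters (8.7M reference, rounded)
BYTES_PER_PARAM = 2      # FP16 storage (assumption)
TARGET_FP16_MB = 2.66    # targeted SSM + attention params, FP16-equivalent
TARGET_CAJQ_MB = 0.60    # the same params after CAJQ

total_fp16_mb = PARAMS * BYTES_PER_PARAM / 1e6                   # whole model at FP16
total_cajq_mb = total_fp16_mb - TARGET_FP16_MB + TARGET_CAJQ_MB  # only the targeted part shrinks

targeted_ratio = TARGET_FP16_MB / TARGET_CAJQ_MB
whole_model_ratio = total_fp16_mb / total_cajq_mb
print(f"targeted: {targeted_ratio:.1f}x, whole model: {whole_model_ratio:.2f}x")
# prints: targeted: 4.4x, whole model: 1.13x
```

Under these assumptions the targeted parameters are only about 15% of the 17.4 MB total, so the untouched remainder dominates whole-model storage.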

	---

	## Streaming inference with BAEE

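```python
import torch
from synapnet_edge import BAEEMemoryManager

# dim / n_layers match the 8.7M reference config (dim=192, depth=6)
manager = BAEEMemoryManager(dim=192, n_layers=6, budget_mb=256.0)

# Illustrative 8K-token stream; assumes (batch, seq_len) token ids below vocab_size
input_ids = torch.randint(0, 4096, (1, 8192))

# Process the stream in 512-token chunks, with BAEE managing episodic memory under the budget
logits, debug = model.forward_streaming(
    input_ids, chunk_size=512, baee_manager=manager,
)
```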

When 90% of entries are forcibly evicted and the target needle sits in the early portion of an 8K-token stream, BAEE retains the needle 71% ± 8% of the time, versus 0% for FIFO and LRU. Head-to-head comparisons against H2O, Scissorhands, SnapKV, PyramidKV, and Locret-style policies are in the GitHub repo.
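
The 0% baseline figure is what recency-based policies predict: if 90% of entries must go and the needle arrives early and is not read again before the query, both FIFO and LRU evict it. The toy simulation below models only those two baselines, not BAEE; `needle_survives`, the one-entry-per-token stream, and the local access window are hypothetical simplifications.

```python
from collections import OrderedDict

def needle_survives(policy, n_entries=8192, keep_frac=0.10, needle_pos=100, window=64):
    """Toy bounded cache under forced eviction; True if the needle is still cached."""
    capacity = int(n_entries * keep_frac)  # keep 10% => 90% forced eviction
    cache = OrderedDict()                   # oldest (or least recently used) first
    for t in range(n_entries):
        cache[t] = True                     # write an entry for token t
        # LRU only: reading a recent entry refreshes its recency (toy local access)
        if policy == "lru" and (t - window) in cache:
            cache.move_to_end(t - window)
        if len(cache) > capacity:
            cache.popitem(last=False)       # FIFO: oldest write; LRU: least recently used
    return needle_pos in cache

for policy in ("fifo", "lru"):
    print(policy, needle_survives(policy))  # both print False
```

Neither policy looks at content, so an early needle that is never re-read loses to the most recent 10% of the stream; a content-aware policy is needed to keep it.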

	---

	## License

	MIT — see [LICENSE](https://github.com/vineetha00/SynapNet-Edge/blob/main/LICENSE).

	## Citation

```bibtex
@article{synapnet_edge_2026,
  title={SynapNet-Edge: Component-Aware Quantization and Budget-Aware Eviction for Hybrid Long-Context Models on Consumer Hardware},
  author={Vallish Kumar, Vineetha},
  year={2026},
}
```