Update model card with full architecture details
README.md
---
language:
- en
license: apache-2.0
tags:
- titan-synapse
- specialist-swarm
- continuous-learning
- merged-model
- mamba
- xlstm
- mixture-of-experts
- fast-weights
- brain-inspired
- rust
- local-inference
base_model: Qwen/Qwen2.5-3B-Instruct
model_type: qwen2
pipeline_tag: text-generation
datasets:
- gsm8k
- openwebmath
- microsoft/orca-math-word-problems-200k
- sahil2801/CodeAlpaca-20k
- nickrosh/Evol-Instruct-Code-80k-v1
- iamtarun/python_code_instructions_18k_alpaca
- Open-Orca/SlimOrca
- yahma/alpaca-cleaned
---
# Synapse-3B

**Small models that think together. And learn.**

Synapse-3B is a merged specialist model created with [TITAN Synapse](https://github.com/Djtony707/titan-synapse), an open-source Rust inference engine that runs a swarm of tiny specialist models that collaborate and learn continuously on your GPU.

This model combines **4 specialist LoRA adapters** (math, code, general, coordinator), each trained on curated datasets and then merged into a single model with **TIES merging** (Trim, Elect Sign, Merge) to minimize interference between the specializations.
## Key Features

- **4 specialist domains** merged into one model without catastrophic forgetting
- **TIES merging** - trims small deltas, elects signs by majority vote, merges only agreeing directions
- **Based on Qwen2.5-3B-Instruct** - strong Apache 2.0 base with multilingual support
- **Part of the Synapse ecosystem** - designed for the brain-inspired Synapse Architecture (Mamba + xLSTM + Sparse MoE + Fast Weights)

## How This Model Was Made

```
Base Model: Qwen/Qwen2.5-3B-Instruct (Apache 2.0)
  |
  +---> QLoRA (rank 64) ---> Math Specialist (GSM8K + OpenWebMath + Orca-Math, 50k samples)
  +---> QLoRA (rank 64) ---> Code Specialist (CodeAlpaca + Evol-Instruct + Python-18k, 50k samples)
  +---> QLoRA (rank 64) ---> General Specialist (SlimOrca + Alpaca-Cleaned, 50k samples)
  +---> QLoRA (rank 32) ---> Coordinator (Synthetic routing, 5k samples)
  |
  +---> TIES Merge (trim 80%, sign election, agreement merge)
  |
  = Synapse-3B
```
### Specialist Details

| Specialist | Datasets | Samples | LoRA Rank | Focus |
|:---|:---|:---:|:---:|:---|
| **Math** | GSM8K, OpenWebMath, Orca-Math | 50,000 | 64 | Mathematical reasoning, step-by-step problem solving |
| **Code** | CodeAlpaca-20k, Evol-Instruct-Code-80k, Python-18k | 50,000 | 64 | Code generation, debugging, Python expertise |
| **General** | SlimOrca, Alpaca-Cleaned | 50,000 | 64 | General knowledge, instruction following, reasoning |
| **Coordinator** | Synthetic routing examples | 5,000 | 32 | Task analysis, specialist routing, swarm coordination |

### Merge Method: TIES

[TIES (Trim, Elect Sign, Merge)](https://arxiv.org/abs/2306.01708) is used to combine the adapters with minimal interference:

1. **Trim** - Remove small-magnitude deltas (keep the top 20% per parameter)
2. **Elect Sign** - For each parameter, take a majority vote on the sign direction across all specialists
3. **Merge** - Only average the deltas that agree with the elected sign

This produces cleaner merges than simple averaging, preserving each specialist's strengths.
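To make the three steps concrete, here is a minimal, self-contained sketch of a TIES-style merge over per-parameter deltas (specialist weights minus base weights). It illustrates the procedure described above and is not the actual merge script used for this model; the function name and the `density` default (0.2, i.e. "keep top 20%") are assumptions based on the numbers in this card.

```python
import torch

def ties_merge(deltas: list[torch.Tensor], density: float = 0.2) -> torch.Tensor:
    """Merge per-parameter deltas (specialist minus base) via Trim, Elect Sign, Merge."""
    stacked = torch.stack(deltas)  # shape: (num_specialists, *param_shape)

    # 1) Trim: per specialist, keep only the top-`density` fraction of entries by magnitude
    #    (density=0.2 corresponds to "trim 80% / keep top 20%" above).
    trimmed = torch.zeros_like(stacked)
    for i, delta in enumerate(stacked):
        k = max(1, int(density * delta.numel()))
        threshold = delta.abs().flatten().kthvalue(delta.numel() - k + 1).values
        trimmed[i] = torch.where(delta.abs() >= threshold, delta, torch.zeros_like(delta))

    # 2) Elect sign: per entry, the sign with the larger total magnitude wins.
    elected_sign = torch.sign(trimmed.sum(dim=0))

    # 3) Merge: average only the trimmed deltas whose sign agrees with the elected sign.
    agrees = (torch.sign(trimmed) == elected_sign) & (trimmed != 0)
    return (trimmed * agrees).sum(dim=0) / agrees.sum(dim=0).clamp(min=1)

# The merged delta is then added back onto the corresponding base parameter:
# base_param.data += ties_merge([math_delta, code_delta, general_delta, coord_delta])
```

In practice this runs tensor by tensor over every weight touched by the adapters.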
## Usage

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("djtony707/synapse-3b", torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("djtony707/synapse-3b")

messages = [{"role": "user", "content": "Solve: If a train travels 120 km in 2 hours, what is its speed in m/s?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
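If VRAM is tight, the same checkpoint can also be loaded in 4-bit with bitsandbytes. This is a sketch assuming a standard `BitsAndBytesConfig` setup (and that `bitsandbytes` and `accelerate` are installed); it has not been separately validated for this model, and quantized outputs may differ slightly from full precision.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# NF4 mirrors the quantization used during QLoRA training (see Training Details).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "djtony707/synapse-3b",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("djtony707/synapse-3b")
# Generation then works exactly as in the example above.
```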
### With TITAN Synapse Engine (Rust, local inference)

```bash
# Install
curl -sSL https://raw.githubusercontent.com/Djtony707/titan-synapse/main/install.sh | bash

# Pull and run
synapse pull synapse-3b
synapse up

# OpenAI-compatible API on localhost:6900
curl http://localhost:6900/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"synapse-3b","messages":[{"role":"user","content":"Hello!"}]}'
```
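Because the local endpoint speaks the OpenAI chat-completions protocol, any OpenAI-style client can talk to it. Here is a minimal sketch with the `openai` Python package; the base URL comes from the default port above, and the placeholder API key is an assumption (the local server may not check it at all).

```python
from openai import OpenAI

# Point the client at the local TITAN Synapse server instead of api.openai.com.
client = OpenAI(base_url="http://localhost:6900/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="synapse-3b",
    messages=[{"role": "user", "content": "Write a Python function that reverses a string."}],
)
print(response.choices[0].message.content)
```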
## The Synapse Architecture (v1.0 Target)

Synapse-3B is the foundation for the **Synapse Architecture** - a brain-inspired modular model that replaces monolithic transformers:

```
           THALAMUS (Mamba Router, O(n))
                        |
         +--------------+--------------+
         |              |              |
    xLSTM Lang      Sparse MoE     Fast-Weight
      Module       Expert Pool       Memory
       O(n)        top-k of 8+     Learn during
     syntax,       specialists      inference,
     grammar        activate       no backprop
```

- **No O(n^2) attention** - Mamba (state-space) + xLSTM (recurrent)
- **Sparse activation** - only 2-3 of 8+ modules fire per token
- **Fast-weight memory** - learn new facts in ONE forward pass (see the sketch below)
- **Full observability** - every routing decision is transparent, no black box
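To make the fast-weight idea concrete, here is a toy sketch of a Hebbian outer-product memory: an association is written during a single forward pass and read back with a matrix-vector product, with no gradient update anywhere. It illustrates the concept only; it is not code from the TITAN Synapse engine, and all names and dimensions are invented for the example.

```python
import torch

class FastWeightMemory:
    """Toy associative memory: facts are written at inference time, without backprop."""

    def __init__(self, dim: int):
        self.W = torch.zeros(dim, dim)  # fast weights, updated in place during inference

    @torch.no_grad()
    def write(self, key: torch.Tensor, value: torch.Tensor, lr: float = 1.0) -> None:
        # Hebbian outer-product update: one pass is enough to store the association.
        self.W += lr * torch.outer(value, key)

    @torch.no_grad()
    def read(self, query: torch.Tensor) -> torch.Tensor:
        # Retrieval is a single matrix-vector product.
        return self.W @ query

# Store one fact and recall it immediately, with no training step.
dim = 64
memory = FastWeightMemory(dim)
key = torch.nn.functional.normalize(torch.randn(dim), dim=0)
value = torch.randn(dim)
memory.write(key, value)
recalled = memory.read(key)  # approximately `value`, up to interference from other writes
```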
## Training Details

- **Hardware**: NVIDIA RTX 5090 (32GB VRAM)
- **Training framework**: QLoRA via TRL SFTTrainer (sketched below)
- **Quantization**: 4-bit NF4 (for training efficiency)
- **Learning rate**: 2e-4 with cosine scheduler
- **Epochs**: 3 per specialist
- **Batch size**: 2 (gradient accumulation 8, effective batch 16)
- **Max sequence length**: 2048 tokens
- **Training time**: ~2 hours per specialist on RTX 5090
- **Merge method**: TIES (trim ratio 0.8)
- **Created**: March 21, 2026
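For orientation, a QLoRA run matching the settings above might look roughly like this with TRL and PEFT. Treat it as a sketch, not the exact training script: the target modules, LoRA alpha/dropout, dataset formatting, and output path are assumptions, and TRL argument names shift between versions. Only the values listed in Training Details come from the actual runs.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

base = "Qwen/Qwen2.5-3B-Instruct"

# Frozen 4-bit NF4 base model (QLoRA).
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                         bnb_4bit_compute_dtype=torch.bfloat16)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")

# Rank-64 adapter (the coordinator used rank 32); target modules are an assumption.
lora = LoraConfig(r=64, lora_alpha=128, lora_dropout=0.05, task_type="CAUSAL_LM",
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])

# One of the math sources listed above, flattened to a plain "text" column.
dataset = load_dataset("gsm8k", "main", split="train")
dataset = dataset.map(lambda ex: {"text": f"Question: {ex['question']}\nAnswer: {ex['answer']}"})

# Mirrors the card: lr 2e-4, cosine schedule, 3 epochs, batch 2 x grad-accum 8, 2048-token sequences.
args = SFTConfig(output_dir="out/math-specialist", num_train_epochs=3,
                 per_device_train_batch_size=2, gradient_accumulation_steps=8,
                 learning_rate=2e-4, lr_scheduler_type="cosine", max_seq_length=2048)

trainer = SFTTrainer(model=model, args=args, train_dataset=dataset, peft_config=lora)
trainer.train()
```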
## Limitations

- This is a 3B parameter model - it won't match 70B+ models on complex reasoning
- Trained on English-focused datasets; multilingual performance is inherited from the Qwen base
- The coordinator specialist is trained on synthetic routing data; real-world routing improves with use
- Best used as part of the TITAN Synapse swarm (multiple specialists collaborating)
## Citation

```bibtex
@misc{synapse3b2026,
  title={Synapse-3B: A Merged Specialist Model for the TITAN Synapse Engine},
  author={Tony Elliott},
  year={2026},
  url={https://huggingface.co/djtony707/synapse-3b},
  note={Created with TITAN Synapse, https://github.com/Djtony707/titan-synapse}
}
```

## License

Apache 2.0 - use it for anything.

Built by [Tony Elliott](https://github.com/Djtony707) with [TITAN Synapse](https://github.com/Djtony707/titan-synapse).