djtony707 committed · Commit 1436c21 · verified · Parent: d5c6599

Update model card with full architecture details

Files changed (1): README.md added (+169, -0)
---
language:
- en
license: apache-2.0
tags:
- titan-synapse
- specialist-swarm
- continuous-learning
- merged-model
- mamba
- xlstm
- mixture-of-experts
- fast-weights
- brain-inspired
- rust
- local-inference
base_model: Qwen/Qwen2.5-3B-Instruct
model_type: qwen2
pipeline_tag: text-generation
datasets:
- gsm8k
- openwebmath
- microsoft/orca-math-word-problems-200k
- sahil2801/CodeAlpaca-20k
- nickrosh/Evol-Instruct-Code-80k-v1
- iamtarun/python_code_instructions_18k_alpaca
- Open-Orca/SlimOrca
- yahma/alpaca-cleaned
---

# Synapse-3B

**Small models that think together. And learn.**

Synapse-3B is a merged specialist model created by [TITAN Synapse](https://github.com/Djtony707/titan-synapse), an open-source Rust inference engine that runs a swarm of tiny specialist models that collaborate and learn continuously on your GPU.

This model combines **4 specialist LoRA adapters** (math, code, general, coordinator), each trained on curated datasets and then merged into a single model with **TIES merging** (Trim, Elect Sign, Merge) to minimize interference between specializations.

## Key Features

- **4 specialist domains** merged into one model without catastrophic forgetting
- **TIES merging**: trims small deltas, elects signs by majority vote, merges only agreeing directions
- **Based on Qwen2.5-3B-Instruct**: strong Apache 2.0 base with multilingual support
- **Part of the Synapse ecosystem**: designed for the brain-inspired Synapse Architecture (Mamba + xLSTM + Sparse MoE + Fast Weights)

## How This Model Was Made

```
Base Model: Qwen/Qwen2.5-3B-Instruct (Apache 2.0)
   |
   +---> QLoRA (rank 64) ---> Math Specialist     (GSM8K + OpenWebMath + Orca-Math, 50k samples)
   +---> QLoRA (rank 64) ---> Code Specialist     (CodeAlpaca + Evol-Instruct + Python-18k, 50k samples)
   +---> QLoRA (rank 64) ---> General Specialist  (SlimOrca + Alpaca-Cleaned, 50k samples)
   +---> QLoRA (rank 32) ---> Coordinator         (Synthetic routing, 5k samples)
             |
             +---> TIES Merge (trim 80%, sign election, agreement merge)
                     |
                     = Synapse-3B
```

### Specialist Details

| Specialist | Datasets | Samples | LoRA Rank | Focus |
|:---|:---|:---:|:---:|:---|
| **Math** | GSM8K, OpenWebMath, Orca-Math | 50,000 | 64 | Mathematical reasoning, step-by-step problem solving |
| **Code** | CodeAlpaca-20k, Evol-Instruct-Code-80k, Python-18k | 50,000 | 64 | Code generation, debugging, Python expertise |
| **General** | SlimOrca, Alpaca-Cleaned | 50,000 | 64 | General knowledge, instruction following, reasoning |
| **Coordinator** | Synthetic routing examples | 5,000 | 32 | Task analysis, specialist routing, swarm coordination |

### Merge Method: TIES

[TIES (Trim, Elect Sign, Merge)](https://arxiv.org/abs/2306.01708) is used to combine adapters with minimal interference:

1. **Trim**: remove small-magnitude deltas (keep the top 20% per parameter)
2. **Elect Sign**: for each parameter, take a majority vote on the sign direction across all specialists
3. **Merge**: only average deltas that agree with the elected sign

This produces cleaner merges than simple averaging, preserving each specialist's strengths.

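For intuition, here is a minimal NumPy sketch of those three steps applied to a stack of per-specialist weight deltas. It is illustrative only: the array shapes and the `ties_merge`/`keep_ratio` names are assumptions, and this is not the exact merge code used for this model.

```python
import numpy as np

def ties_merge(deltas: np.ndarray, keep_ratio: float = 0.2) -> np.ndarray:
    """Toy TIES merge over a (num_specialists, num_params) array of weight deltas."""
    trimmed = np.zeros_like(deltas)
    for i, d in enumerate(deltas):
        # 1. Trim: keep only the top `keep_ratio` fraction of entries by magnitude
        k = max(1, int(keep_ratio * d.size))
        threshold = np.sort(np.abs(d))[-k]
        trimmed[i] = np.where(np.abs(d) >= threshold, d, 0.0)

    # 2. Elect sign: the sign with the larger total magnitude wins each parameter
    elected_sign = np.sign(trimmed.sum(axis=0))

    # 3. Merge: average only the deltas that agree with the elected sign
    agree = (np.sign(trimmed) == elected_sign) & (trimmed != 0)
    counts = np.maximum(agree.sum(axis=0), 1)
    return (trimmed * agree).sum(axis=0) / counts

# Example: three toy specialists, ten parameters each
merged_delta = ties_merge(np.random.randn(3, 10), keep_ratio=0.2)
# The merged deltas are then added back onto the base model's weights.
```
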
## Usage

### With Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("djtony707/synapse-3b")
tokenizer = AutoTokenizer.from_pretrained("djtony707/synapse-3b")

# Build a prompt with the chat template, then sample a response
messages = [{"role": "user", "content": "Solve: If a train travels 120km in 2 hours, what is its speed in m/s?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

### With TITAN Synapse Engine (Rust, local inference)

```bash
# Install
curl -sSL https://raw.githubusercontent.com/Djtony707/titan-synapse/main/install.sh | bash

# Pull and run
synapse pull synapse-3b
synapse up

# OpenAI-compatible API on localhost:6900
curl http://localhost:6900/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"synapse-3b","messages":[{"role":"user","content":"Hello!"}]}'
```
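
Because the API is OpenAI-compatible, the standard `openai` Python client should also work against the local server. A minimal sketch, assuming the server is running on `localhost:6900` as above; the placeholder API key is an assumption (adjust if your setup requires a real one):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local TITAN Synapse endpoint
client = OpenAI(base_url="http://localhost:6900/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="synapse-3b",
    messages=[{"role": "user", "content": "Write a Python one-liner that reverses a string."}],
)
print(response.choices[0].message.content)
```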

## The Synapse Architecture (v1.0 Target)

Synapse-3B is the foundation for the **Synapse Architecture**, a brain-inspired modular model that replaces monolithic transformers:

```
                THALAMUS (Mamba Router, O(n))
                           |
            +--------------+--------------+
            |              |              |
      xLSTM Lang       Sparse MoE     Fast-Weight
        Module        Expert Pool       Memory
         O(n)         top-k of 8+    Learn during
        syntax,       specialists     inference,
        grammar        activate      no backprop
```

- **No O(n^2) attention**: Mamba (state-space) + xLSTM (recurrent)
- **Sparse activation**: only 2-3 of 8+ modules fire per token
- **Fast-weight memory**: learn new facts in ONE forward pass (see the sketch after this list)
- **Full observability**: every routing decision is transparent, no black box

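To make the fast-weight idea concrete, here is a minimal Hebbian-style sketch: an association is written into a weight matrix with a single outer-product update at inference time and read back with a matrix-vector product, with no gradients involved. The class name, shapes, and decay rule are illustrative assumptions, not the engine's actual implementation.

```python
import numpy as np

class FastWeightMemory:
    """Toy associative memory updated during inference with rank-1 writes."""

    def __init__(self, dim: int, decay: float = 0.95):
        self.W = np.zeros((dim, dim))
        self.decay = decay  # slowly forget old associations

    def write(self, key: np.ndarray, value: np.ndarray) -> None:
        # One "forward pass" of learning: no backprop, just an outer-product update
        self.W = self.decay * self.W + np.outer(value, key)

    def read(self, key: np.ndarray) -> np.ndarray:
        return self.W @ key

# Store a new fact and recall it in a single step
mem = FastWeightMemory(dim=8)
key, value = np.random.randn(8), np.random.randn(8)
mem.write(key, value)
recalled = mem.read(key)  # roughly `value` scaled by ||key||^2
```
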
## Training Details

- **Hardware**: NVIDIA RTX 5090 (32GB VRAM)
- **Training framework**: QLoRA via TRL SFTTrainer (sketched below)
- **Quantization**: 4-bit NF4 (for training efficiency)
- **Learning rate**: 2e-4 with cosine scheduler
- **Epochs**: 3 per specialist
- **Batch size**: 2 (gradient accumulation 8, effective batch 16)
- **Max sequence length**: 2048 tokens
- **Training time**: ~2 hours per specialist on RTX 5090
- **Merge method**: TIES (trim ratio 0.8)
- **Created**: March 21, 2026

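For reference, a hedged sketch of what one specialist's QLoRA run could look like with TRL's `SFTTrainer`, wired up with the hyperparameters listed above. The tiny inline dataset, the `target_modules` list, and `lora_alpha` are placeholder assumptions standing in for the real 50k-sample mixes and training script.

```python
import torch
from datasets import Dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

base = "Qwen/Qwen2.5-3B-Instruct"

# 4-bit NF4 quantization of the frozen base weights (QLoRA)
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(base, quantization_config=bnb, device_map="auto")

# Rank-64 LoRA adapter (the coordinator used rank 32)
lora = LoraConfig(r=64, lora_alpha=128, lora_dropout=0.05, task_type="CAUSAL_LM",
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"])

# Stand-in for the curated instruction data; real runs capped sequences at 2048 tokens
train = Dataset.from_dict({"text": ["Question: 2 + 2?\nAnswer: 4"]})

args = SFTConfig(
    output_dir="math-specialist",
    num_train_epochs=3,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # effective batch size 16
)

trainer = SFTTrainer(model=model, args=args, train_dataset=train, peft_config=lora)
trainer.train()
```
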
## Limitations

- This is a 3B parameter model; it won't match 70B+ models on complex reasoning
- Trained on English-focused datasets; multilingual performance inherited from the Qwen base
- The coordinator specialist is trained on synthetic routing data; real-world routing improves with use
- Best used as part of the TITAN Synapse swarm (multiple specialists collaborating)

## Citation

```bibtex
@misc{synapse3b2026,
  title={Synapse-3B: A Merged Specialist Model for the TITAN Synapse Engine},
  author={Tony Elliott},
  year={2026},
  url={https://huggingface.co/djtony707/synapse-3b},
  note={Created with TITAN Synapse, https://github.com/Djtony707/titan-synapse}
}
```

## License

Apache 2.0. Use it for anything.

Built by [Tony Elliott](https://github.com/Djtony707) with [TITAN Synapse](https://github.com/Djtony707/titan-synapse).