Pacific-Prime commited on
Commit
970f6b8
Β·
verified Β·
1 Parent(s): 82677d6

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +174 -5
README.md CHANGED
@@ -1,10 +1,179 @@
1
  ---
2
- title: README
3
  emoji: 🐒
4
- colorFrom: blue
5
- colorTo: gray
6
  sdk: static
7
- pinned: false
 
 
8
  ---
9
 
10
- Edit this `README.md` markdown file to author your organization card.
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
+ title: Complexity Deep
3
  emoji: 🐒
4
+ colorFrom: purple
5
+ colorTo: blue
6
  sdk: static
7
+ pinned: true
8
+ thumbnail: >-
9
+ https://cdn-uploads.huggingface.co/production/uploads/643222d9f76c34519e96a299/8j1GHX24MV3-sv-4zl7ZB.png
10
  ---
11
 
12
+ # Complexity Deep
13
+
14
+ **Next-generation LLM architecture with INL Dynamics and Token-Routed MLP**
15
+
16
+ ## What is Complexity Deep?
17
+
18
+ Complexity Deep is a novel transformer architecture designed for **stability** and **efficiency**. It combines:
19
+
20
+ - **INL Dynamics** - Robotics-grade control system for training stability
21
+ - **Token-Routed MLP** - Deterministic MoE without routing overhead
22
+ - **GQA (Grouped Query Attention)** - 4x faster inference, 4x smaller KV cache
23
+ - **QK Norm** - Attention stability for deep models
24
+
25
+ ## Key Innovation: INL Dynamics
26
+
27
+ INL (Inertial Navigation Layer) Dynamics brings robotics control theory to LLM training:
28
+
29
+ ```
30
+ Standard Transformer: hidden β†’ LayerNorm β†’ Attention β†’ MLP β†’ output
31
+ (can diverge on bad data)
32
+
33
+ Complexity Deep: hidden β†’ INL Controller β†’ Attention β†’ MLP β†’ output
34
+ (self-stabilizing, recovers from spikes)
35
+ ```
36
+
37
+ **Real-world proof**: Our 150M model survived a loss spike of **4000x** and auto-recovered in 45 minutes without any intervention.
38
+
39
+ ## Token-Routed MLP
40
+
41
+ Unlike learned MoE (Mixtral, etc.), Token-Routed MLP routes by token ID:
42
+
43
+ | Aspect | Learned MoE | Token-Routed (Ours) |
44
+ |--------|-------------|---------------------|
45
+ | Routing | Neural network | `token_id % num_experts` |
46
+ | Latency | 5-10ms | **<0.1ms** |
47
+ | Deterministic | No | **Yes** |
48
+ | Load balancing needed | Yes | **No** |
49
+
50
+ **Why it works**: BPE tokenizers sort by frequency. Token ID = frequency category = natural expert specialization.
51
+
52
+ ## Models
53
+
54
+ | Model | Params | Status | Link |
55
+ |-------|--------|--------|------|
56
+ | pacific-prime | 150M | Training (100K+ steps) | [HuggingFace](https://huggingface.co/Pacific-Prime/pacific-prime) |
57
+ | complexity-tiny | 15M | Available | [HuggingFace](https://huggingface.co/Pacific-Prime/complexity-tiny) |
58
+
59
+ ## Installation
60
+
61
+ ```bash
62
+ pip install complexity-deep
63
+ ```
64
+
65
+ ## Quick Start
66
+
67
+ ```python
68
+ from complexity_deep import DeepConfig, DeepForCausalLM, create_deep_model
69
+
70
+ # Create a model
71
+ model = create_deep_model(size="tiny", vocab_size=100000)
72
+
73
+ # Or use presets
74
+ config = DeepConfig.complexity_150m() # 150M params
75
+ config = DeepConfig.complexity_3_8b() # 3.8B params
76
+ config = DeepConfig.complexity_7b() # 7B params
77
+ ```
78
+
79
+ ## Architecture Comparison
80
+
81
+ | Feature | LLaMA | Mistral | Complexity Deep |
82
+ |---------|-------|---------|-----------------|
83
+ | Attention | GQA | GQA + Sliding | GQA + QK Norm |
84
+ | MLP | Dense | MoE (learned) | Token-Routed MoE |
85
+ | Stability | Gradient clip | Gradient clip | **INL Dynamics** |
86
+ | Recovery from spike | Manual rollback | Manual rollback | **Auto-recovery** |
87
+
88
+ ## Training Stability Demo
89
+
90
+ **Real training run - Loss spike of 4000x with auto-recovery:**
91
+
92
+ ![INL Dynamics Recovery](https://cdn-uploads.huggingface.co/production/uploads/643222d9f76c34519e96a299/8j1GHX24MV3-sv-4zl7ZB.png)
93
+
94
+ ```
95
+ Loss during training with bad batch:
96
+
97
+ Standard: 5.6 β†’ 4000 β†’ NaN β†’ DEAD
98
+ Complexity: 5.6 β†’ 4000 β†’ 46 β†’ 16 β†’ 8 β†’ 5.6 (auto-recovered!)
99
+ ```
100
+
101
+ The spike visible in the graph shows INL Dynamics absorbing a corrupted batch from FineWeb-Edu and automatically recovering without any manual intervention.
102
+
103
+ ## Available Configurations
104
+
105
+ ```python
106
+ # Small models (for testing)
107
+ DeepConfig.complexity_tiny() # ~15M
108
+ DeepConfig.complexity_20m() # ~20M
109
+ DeepConfig.complexity_small() # ~50M
110
+
111
+ # Medium models
112
+ DeepConfig.complexity_150m() # ~150M (default)
113
+ DeepConfig.complexity_base() # ~125M
114
+ DeepConfig.complexity_medium() # ~350M
115
+
116
+ # Large models
117
+ DeepConfig.complexity_1b() # ~1B
118
+ DeepConfig.complexity_3b() # ~3B
119
+ DeepConfig.complexity_3_8b() # ~3.8B
120
+ DeepConfig.complexity_7b() # ~7B
121
+ ```
122
+
123
+ ## INL Dynamics Parameters
124
+
125
+ ```python
126
+ config = DeepConfig(
127
+ dynamics_alpha=0.9, # Inertia (momentum)
128
+ dynamics_beta=0.1, # Correction strength
129
+ dynamics_gate=0.5, # Amplitude control
130
+ dynamics_dt=0.1, # Integration timestep
131
+ )
132
+ ```
133
+
134
+ ## Use Cases
135
+
136
+ ### 1. Training on Noisy Data
137
+ INL Dynamics absorbs bad batches without killing your training run.
138
+
139
+ ### 2. Budget-Constrained Training
140
+ No need for expensive rollbacks - the model self-heals.
141
+
142
+ ### 3. Robotics Applications
143
+ Deterministic Token-Routed MLP = predictable, certifiable behavior.
144
+
145
+ ### 4. Edge Deployment
146
+ GQA + Token-Routed = fast inference with small KV cache.
147
+
148
+ ## Research
149
+
150
+ Complexity Deep introduces two novel concepts:
151
+
152
+ 1. **INL Dynamics**: First application of robotics control theory (PID-like) to transformer hidden states for training stability.
153
+
154
+ 2. **Deterministic Token-Routed MoE**: First MoE that routes by token ID instead of learned routing, leveraging BPE frequency ordering.
155
+
156
+ ## Links
157
+
158
+ - [PyPI Package](https://pypi.org/project/complexity-deep/)
159
+ - [GitHub](https://github.com/Web3-League/complexity-deep)
160
+ - [Pacific-Prime Organization](https://huggingface.co/Pacific-Prime)
161
+
162
+ ## License
163
+
164
+ CC-BY-4.0
165
+
166
+ ## Citation
167
+
168
+ ```bibtex
169
+ @software{complexity_deep_2024,
170
+ title={Complexity Deep: INL Dynamics and Token-Routed MLP for Stable LLM Training},
171
+ author={Pacific Prime},
172
+ year={2024},
173
+ url={https://huggingface.co/Pacific-Prime}
174
+ }
175
+ ```
176
+
177
+ ---
178
+
179
+ **Built with stability in mind. Train with confidence.**