---
language:
  - en
license: mit
library_name: lux
tags:
  - julia
  - lux
  - slm
  - philosophy
  - symbiogenesis
  - monarch-mixer
  - long-convolution
  - causal-conv
  - rmsnorm
  - swiglu
  - bpe
  - text-generation
  - attention-free
pipeline_tag: text-generation
datasets:
  - LisaMegaWatts/philosophy-corpus
model-index:
  - name: SymbioSLM
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          type: LisaMegaWatts/philosophy-corpus
          name: philosophy-corpus
        metrics:
          - type: perplexity
            value: 37.3
            name: Val PPL
            verified: false
          - type: loss
            value: 3.62
            name: Val Loss
            verified: false
---

# SymbioSLM

A **5.05M-parameter**, attention-free language model built on the **Symbiogenesis** architecture: multi-organelle sequence mixing with learned per-channel gating. Trained on a philosophy corpus of 981 classical texts (~795M tokens).

## Architecture

Symbiogenesis replaces self-attention with three complementary "organelles" for sequence mixing, inspired by the biological theory of symbiogenesis (Margulis, 1967), which holds that complex organelles such as mitochondria were once independent organisms that fused into eukaryotic cells.

Each of the 8 SymbioBlocks contains:

| Organelle | Function | Scale | Complexity |
|-----------|----------|-------|------------|
| **CausalDepthwiseConv1d** | Local n-gram pattern detection | Local (kernel=4) | O(n) |
| **Monarch Matrix** | Sub-quadratic global sequence mixing | Global | O(n√n) |
| **LongConv** | Dense causal convolution filtering | Global | O(n log n) |
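The Monarch organelle's O(n√n) cost comes from factoring a dense mix into two block-diagonal matrices joined by a transpose permutation. The sketch below is an illustrative NumPy version for a single vector (the real layer acts over batched sequences and may differ in block layout); `monarch_mix`, `left_blocks`, and `right_blocks` are names invented here, not the package API.

```python
import numpy as np

def monarch_mix(x, left_blocks, right_blocks):
    """Monarch matrix-vector product (illustrative sketch):
    reshape the length-n vector into sqrt(n) x sqrt(n), apply one
    block-diagonal mix, transpose (the Monarch permutation), apply the
    second block-diagonal mix, and undo the transpose.
    Each block-diagonal step costs O(n * sqrt(n)) vs O(n^2) dense."""
    b = right_blocks.shape[0]   # number of blocks = block size = sqrt(n)
    X = x.reshape(b, b)
    X = np.einsum('ijk,ik->ij', right_blocks, X)  # block-diagonal right factor
    X = X.T                                       # Monarch permutation
    X = np.einsum('ijk,ik->ij', left_blocks, X)   # block-diagonal left factor
    return X.T.reshape(-1)

# Sanity check: with identity blocks the product is the identity map.
b = 4
eye_blocks = np.broadcast_to(np.eye(b), (b, b, b)).copy()
x = np.arange(float(b * b))
y = monarch_mix(x, eye_blocks, eye_blocks)
```

Because every token's chunk is mixed with every other chunk through the transpose, two cheap block-diagonal passes give full global reach.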

An **OrganelleGate** (per-channel softmax) learns which organelle each embedding channel relies on, creating specialized "fused organisms" per block.
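The gating step can be sketched in NumPy: a softmax over the three organelle outputs, taken independently per embedding channel. Function and variable names are illustrative; the real `OrganelleGate` may differ in detail.

```python
import numpy as np

def organelle_gate(conv_out, monarch_out, longconv_out, gate_logits):
    """Per-channel softmax gate over three organelle outputs.

    conv_out, monarch_out, longconv_out: (seq_len, d) mixer outputs.
    gate_logits: (3, d) learned logits, one triple per embedding channel.
    """
    # Softmax over the organelle axis, independently for each channel.
    w = np.exp(gate_logits - gate_logits.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)           # (3, d), columns sum to 1
    stacked = np.stack([conv_out, monarch_out, longconv_out])  # (3, seq, d)
    return (w[:, None, :] * stacked).sum(axis=0)   # (seq, d)

# Toy case: channel 0 fully trusts the local conv; channel 1 stays uniform.
n, d = 4, 2
logits = np.array([[10.0, 0.0], [-10.0, 0.0], [-10.0, 0.0]])
a, b, c = np.ones((n, d)), 2 * np.ones((n, d)), 3 * np.ones((n, d))
out = organelle_gate(a, b, c, logits)
```

In the toy case, channel 0's output tracks the conv branch (≈1.0), while channel 1 averages all three branches (2.0).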

### No Positional Encoding

SymbioSLM requires **no explicit positional encoding** (no RoPE, no sinusoidal embeddings). The Monarch matrices and LongConv kernels implicitly learn position-dependent mixing patterns, while CausalConv captures local ordering through its convolutional structure.
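The position-dependence of the LongConv branch is easy to see in isolation: a learned dense kernel `k` weights each past token by its relative offset, and the FFT trick keeps this causal convolution at O(n log n). A minimal NumPy sketch (assuming a real-valued kernel as long as the sequence; names are illustrative):

```python
import numpy as np

def causal_longconv(x, k):
    """Causal convolution of signal x with a dense kernel k (both length n)
    in O(n log n) via FFT: y[t] = sum_{s<=t} k[s] * x[t-s].
    Zero-padding to 2n prevents circular wrap-around, which is what keeps
    the convolution causal after truncating back to length n."""
    n = len(x)
    m = 2 * n
    y = np.fft.irfft(np.fft.rfft(x, m) * np.fft.rfft(k, m), m)
    return y[:n]

x = np.array([1.0, 2.0, 3.0, 4.0])
k = np.array([1.0, 0.5, 0.0, 0.0])   # weight current token + half the previous
y = causal_longconv(x, k)
# Direct form for this kernel: y[t] = x[t] + 0.5 * x[t-1]
```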

### Model Specifications

| Parameter | Value |
|-----------|-------|
| Architecture | Symbiogenesis |
| Parameters | 5,052,672 (5.05M) |
| Embedding dim | 256 |
| Layers | 8 |
| Monarch heads | 1 per block |
| Conv kernel | 4 |
| FFN | SwiGLU (4x, 2/3 adjusted) |
| Normalization | RMSNorm (pre-norm) |
| Context length | 256 tokens |
| Vocab size | 2,000 (BPE) |
| Weight tying | Yes |
| Free energy reg | 0.001 |
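The "SwiGLU (4x, 2/3 adjusted)" row means the hidden width is scaled by 2/3 so that the three-matrix SwiGLU roughly matches the parameter count of a plain two-matrix 4x FFN. A NumPy sketch under that assumption (the exact rounding of the hidden width is a guess, not taken from the source):

```python
import numpy as np

def silu(x):
    # SiLU / swish activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: down( silu(x @ w_gate) * (x @ w_up) ).
    Three d*h matrices with h = 2/3 * 4 * d cost about the same as
    two d*(4d) matrices in a standard FFN."""
    return (silu(x @ w_gate) * (x @ w_up)) @ w_down

d = 256
hidden = int(2 / 3 * 4 * d)   # 682 here; actual rounding is an assumption
rng = np.random.default_rng(0)
x = rng.standard_normal((4, d))
out = swiglu_ffn(x,
                 rng.standard_normal((d, hidden)) / np.sqrt(d),
                 rng.standard_normal((d, hidden)) / np.sqrt(d),
                 rng.standard_normal((hidden, d)) / np.sqrt(hidden))
```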

### Parameter Breakdown

| Component | Params | % |
|-----------|--------|---|
| Token embedding | 512,000 | 10.1% |
| SymbioBlocks (8x) | 4,540,672 | 89.9% |
| &nbsp;&nbsp;CausalConv | ~8K/block | |
| &nbsp;&nbsp;Monarch | ~131K/block | |
| &nbsp;&nbsp;LongConv | ~65K/block | |
| &nbsp;&nbsp;OrganelleGate | ~769/block | |
| &nbsp;&nbsp;SwiGLU FFN | ~350K/block | |
| &nbsp;&nbsp;RMSNorm (2x) | ~512/block | |
| Final RMSNorm | 256 | <0.1% |

## Results

Trained for 12,305 steps on an NVIDIA RTX 3060 (12GB).

| Metric | Value |
|--------|-------|
| **Val Loss** | **3.62** |
| **Val PPL** | **37.3** |
| Training steps | 12,305 |
| Batch size | 32 |
| Precision | Float16 (AMP) |
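The two validation numbers are consistent: perplexity is the exponential of the mean cross-entropy loss in nats per token.

```python
import math

# PPL = exp(mean cross-entropy loss in nats/token)
val_loss = 3.62
val_ppl = math.exp(val_loss)   # ~37.34, matching the table's 37.3
```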

### Comparison with Other 5M Julia SLMs

All models trained on the same philosophy corpus with identical tokenizer and training budget (12,305 steps):

| Model | Architecture | Params | Val Loss | Val PPL |
|-------|-------------|--------|----------|---------|
| [JuliaSLM](https://huggingface.co/LisaMegaWatts/JuliaSLM) | Transformer (RoPE) | 5.04M | **3.54** | **34.5** |
| **SymbioSLM** | **Symbiogenesis** | **5.05M** | **3.62** | **37.3** |
| [MonarchSLM](https://huggingface.co/LisaMegaWatts/MonarchSLM) | Monarch Mixer | 5.04M | 3.65 | 38.4 |

SymbioSLM outperforms the Monarch-only baseline (3.62 vs. 3.65 val loss) without any attention mechanism, suggesting the multi-organelle fusion provides complementary mixing at scales a single mixer cannot cover alone.

## Training Configuration

```toml
[model]
arch = "symbiogenesis"
embed_dim = 256
n_layers = 8
n_monarch_heads = 1
conv_kernel_size = 4
ffn_mult = 4
context_length = 256
weight_tying = true
free_energy_beta = 0.001

[training]
optimizer = "adamw"
lr = 6e-4
min_lr = 6e-5
warmup_steps = 500
max_steps = 12305
batch_size = 32
grad_clip = 1.0
precision = "f16"
```
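The config names the endpoints of the schedule (`lr`, `min_lr`, `warmup_steps`, `max_steps`) but not its shape; a linear-warmup-plus-cosine-decay curve is a common pairing and is assumed here, not confirmed by the source.

```python
import math

def lr_at(step, lr=6e-4, min_lr=6e-5, warmup=500, max_steps=12305):
    """Linear warmup to `lr`, then (assumed) cosine decay to `min_lr`."""
    if step < warmup:
        return lr * (step + 1) / warmup
    t = (step - warmup) / (max_steps - warmup)   # 0 -> 1 over the decay phase
    return min_lr + 0.5 * (lr - min_lr) * (1 + math.cos(math.pi * t))
```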

## Gelation Monitoring

Training includes gelation monitoring via CUSUM change-point detection on gate entropy. This tracks when the organelle gates transition from uniform mixing to specialized configurations — a phase transition analogous to gel formation in polymer physics.
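The idea can be sketched as follows: while the gates stay near-uniform, each channel's gate entropy sits at log 3; as channels specialize, entropy drops, and a one-sided CUSUM statistic flags the step where the drop becomes persistent. This is an illustrative sketch with invented names and thresholds; the training monitor's exact statistic may differ.

```python
import math

def gate_entropy(weights):
    """Shannon entropy (nats) of one channel's gate distribution.
    Uniform over 3 organelles gives log(3); a collapsed gate gives ~0."""
    return -sum(w * math.log(w) for w in weights if w > 0)

def cusum_drop(series, target, slack=0.05, threshold=0.5):
    """One-sided CUSUM: return the first step where `series` has drifted
    persistently below `target` (entropy falling = gates specializing)."""
    s = 0.0
    for t, x in enumerate(series):
        s = max(0.0, s + (target - x) - slack)   # accumulate downward drift
        if s > threshold:
            return t
    return None

uniform = gate_entropy([1 / 3, 1 / 3, 1 / 3])     # log(3) ~ 1.0986 nats
entropies = [uniform] * 20 + [0.4] * 20           # gates collapse at step 20
change = cusum_drop(entropies, target=uniform)
```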

## Usage

### Julia (Lux.jl)

```julia
using JuliaGPT

# Load model
config = load_config("config.toml")
model = create_model(config.model)
ps, st, _, _, _ = load_checkpoint("final.jld2")

# Load tokenizer
tokenizer = BPETokenizer("vocab.json", "merges.txt")

# Generate text
prompt = "The nature of reality"
output = generate(model, ps, st, tokenizer, prompt;
                  max_new_tokens=200, temperature=0.8, top_k=40)
println(output)
```
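For readers outside the Julia ecosystem, the `temperature` / `top_k` knobs passed to `generate` above correspond to standard top-k sampling. A language-agnostic Python sketch (the function name and signature are illustrative, not the package API):

```python
import numpy as np

def sample_top_k(logits, temperature=0.8, top_k=40, rng=None):
    """Temperature + top-k sampling over next-token logits."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    # Keep only the top_k logits; mask the rest to -inf before softmax.
    if top_k < len(logits):
        cutoff = np.sort(logits)[-top_k]
        logits = np.where(logits >= cutoff, logits, -np.inf)
    p = np.exp(logits - logits.max())
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

logits = [2.0, 1.0, 0.1, -5.0]
tok = sample_top_k(logits, top_k=2, rng=np.random.default_rng(0))
```

Lower temperature sharpens the distribution; `top_k=1` reduces to greedy decoding.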

## References

- **Symbiogenesis framework**: [DavinciDreams/symbiogenesis](https://github.com/DavinciDreams/symbiogenesis) — Evolutionary NAS via organism fusion
- **Monarch Mixer**: Dao et al., 2023 — Sub-quadratic GEMM-based sequence mixing
- **Hyena**: Poli et al., 2023 — Long convolutions for sequence modeling
- **Endosymbiotic theory**: Margulis, 1967 — Origin of eukaryotic organelles

## Citation

```bibtex
@misc{symbio-slm-2026,
  title={SymbioSLM: Multi-Organelle Sequence Mixing for Attention-Free Language Modeling},
  author={LisaMegaWatts},
  year={2026},
  url={https://huggingface.co/LisaMegaWatts/SymbioSLM}
}
```

## License

MIT