---
language:
- en
license: mit
library_name: lux
tags:
- julia
- lux
- slm
- philosophy
- symbiogenesis
- monarch-mixer
- long-convolution
- causal-conv
- rmsnorm
- swiglu
- bpe
- text-generation
- attention-free
pipeline_tag: text-generation
datasets:
- LisaMegaWatts/philosophy-corpus
model-index:
- name: SymbioSLM
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: LisaMegaWatts/philosophy-corpus
      name: philosophy-corpus
    metrics:
    - type: perplexity
      value: 37.3
      name: Val PPL
      verified: false
    - type: loss
      value: 3.62
      name: Val Loss
      verified: false
---
# SymbioSLM
A **5.05M parameter** attention-free language model using the **Symbiogenesis** architecture — multi-organelle sequence mixing with learned per-channel gating. Trained on a philosophy corpus of 981 classical texts (~795M tokens).
## Architecture
Symbiogenesis replaces self-attention with three complementary "organelles" for sequence mixing, inspired by the biological theory of symbiogenesis (Margulis, 1967) — where complex organelles like mitochondria were once independent organisms that fused into eukaryotic cells.
Each of the 8 SymbioBlocks contains:
| Organelle | Function | Scale | Complexity |
|-----------|----------|-------|------------|
| **CausalDepthwiseConv1d** | Local n-gram pattern detection | Local (kernel=4) | O(n) |
| **Monarch Matrix** | Sub-quadratic global sequence mixing | Global | O(n√n) |
| **LongConv** | Dense causal convolution filtering | Global | O(n log n) |
An **OrganelleGate** (per-channel softmax) learns which organelle each embedding channel relies on, creating specialized "fused organisms" per block.
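The gating mechanism can be illustrated outside the model's Julia source. Below is a minimal NumPy sketch of a per-channel softmax gate combining three organelle outputs; the function and argument names are illustrative, not taken from the SymbioSLM codebase.

```python
import numpy as np

def organelle_gate(conv_out, monarch_out, longconv_out, gate_logits):
    """Per-channel softmax gate over three organelle outputs.

    Each *_out has shape (seq_len, d_model); gate_logits has shape
    (d_model, 3) -- one learned logit triple per embedding channel.
    """
    # Softmax over the organelle axis, computed independently per channel.
    w = np.exp(gate_logits - gate_logits.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)               # (d_model, 3)
    stacked = np.stack([conv_out, monarch_out, longconv_out], axis=-1)
    return (stacked * w).sum(axis=-1)                   # (seq_len, d_model)

# Toy check: with all-zero logits the gate is uniform, so the output
# is a plain average of the three organelle outputs.
T, D = 4, 8
outs = [np.full((T, D), v) for v in (1.0, 2.0, 3.0)]
mixed = organelle_gate(*outs, gate_logits=np.zeros((D, 3)))
print(np.allclose(mixed, 2.0))  # True
```

Because the softmax is taken per embedding channel, each of the 256 channels can commit to a different organelle, which is what lets a block behave as a "fused organism".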
### No Positional Encoding
SymbioSLM requires **no explicit positional encoding** (no RoPE, no sinusoidal embeddings). The Monarch matrices and LongConv kernels implicitly learn position-dependent mixing patterns, while CausalConv captures local ordering through its convolutional structure.
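The card does not spell out how LongConv reaches O(n log n); the standard route is FFT-based convolution, sketched below in NumPy under that assumption. Zero-padding before the FFT makes the circular convolution linear, and a linear convolution is automatically causal: output position t depends only on inputs 0..t, which is also where the implicit position dependence comes from.

```python
import numpy as np

def causal_longconv(x, k):
    """Causal convolution of signal x with a kernel k the same length as x.

    Padding to length 2n and truncating back to n turns the circular FFT
    convolution into a linear (hence causal) one, in O(n log n).
    """
    n = len(x)
    fft_len = 2 * n  # pad to avoid circular wrap-around
    y = np.fft.irfft(np.fft.rfft(x, fft_len) * np.fft.rfft(k, fft_len),
                     fft_len)[:n]
    return y

# Causality check: perturbing a late input never changes earlier outputs.
rng = np.random.default_rng(0)
x = rng.standard_normal(16)
k = rng.standard_normal(16)
y1 = causal_longconv(x, k)
x2 = x.copy()
x2[10] += 5.0                        # perturb position 10 only
y2 = causal_longconv(x2, k)
print(np.allclose(y1[:10], y2[:10]))  # True: outputs before t=10 unchanged
```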
### Model Specifications
| Parameter | Value |
|-----------|-------|
| Architecture | Symbiogenesis |
| Parameters | 5,052,672 (5.05M) |
| Embedding dim | 256 |
| Layers | 8 |
| Monarch heads | 1 per block |
| Conv kernel | 4 |
| FFN | SwiGLU (4x, 2/3 adjusted) |
| Normalization | RMSNorm (pre-norm) |
| Context length | 256 tokens |
| Vocab size | 2,000 (BPE) |
| Weight tying | Yes |
| Free energy reg | 0.001 |
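The spec table lists a SwiGLU FFN with "4x, 2/3 adjusted" width. A minimal NumPy sketch of the SwiGLU computation is below; the exact hidden width and any rounding SymbioSLM applies are assumptions, following the common LLaMA-style convention of scaling the 4x expansion by 2/3.

```python
import numpy as np

def swiglu_ffn(x, W_gate, W_up, W_down):
    """SwiGLU feed-forward: down( silu(x @ W_gate) * (x @ W_up) )."""
    silu = lambda z: z / (1.0 + np.exp(-z))  # SiLU / swish activation
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down

d_model = 256
# "4x, 2/3 adjusted": hidden = 2/3 * 4 * d_model (rounding is an assumption).
d_hidden = int(2 / 3 * 4 * d_model)          # 682
rng = np.random.default_rng(0)
W_gate = rng.standard_normal((d_model, d_hidden)) * 0.02
W_up   = rng.standard_normal((d_model, d_hidden)) * 0.02
W_down = rng.standard_normal((d_hidden, d_model)) * 0.02
x = rng.standard_normal((4, d_model))
print(swiglu_ffn(x, W_gate, W_up, W_down).shape)  # (4, 256)
```

The 2/3 adjustment keeps the three-matrix SwiGLU roughly parameter-matched to a plain two-matrix FFN with the same nominal 4x expansion.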
### Parameter Breakdown
| Component | Params | % |
|-----------|--------|---|
| Token embedding | 512,000 | 10.1% |
| SymbioBlocks (8x) | 4,540,672 | 89.9% |
| CausalConv | ~8K/block | |
| Monarch | ~131K/block | |
| LongConv | ~65K/block | |
| OrganelleGate | ~769/block | |
| SwiGLU FFN | ~350K/block | |
| RMSNorm (2x) | ~512/block | |
| Final RMSNorm | 256 | <0.1% |
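The exact figures in the breakdown can be reproduced with a little arithmetic (the per-block component counts are marked approximate in the table, so only the exact rows are checked here):

```python
vocab, d_model = 2000, 256
total = 5_052_672

embed = vocab * d_model
print(embed)                             # 512000 token-embedding parameters
print(round(100 * embed / total, 1))     # 10.1 (% of total)

blocks = 4_540_672
print(blocks // 8)                       # 567584 parameters per SymbioBlock
print(round(100 * blocks / total, 1))    # 89.9 (% of total)
```

With weight tying enabled, the 512K embedding matrix is reused as the output head, so it is counted once.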
## Results
Trained for 12,305 steps on an NVIDIA RTX 3060 (12GB).
| Metric | Value |
|--------|-------|
| **Val Loss** | **3.62** |
| **Val PPL** | **37.3** |
| Training steps | 12,305 |
| Batch size | 32 |
| Precision | Float16 (AMP) |
### Comparison with Other 5M Julia SLMs
All models were trained on the same philosophy corpus with an identical tokenizer and training budget (12,305 steps):
| Model | Architecture | Params | Val Loss | Val PPL |
|-------|-------------|--------|----------|---------|
| [JuliaSLM](https://huggingface.co/LisaMegaWatts/JuliaSLM) | Transformer (RoPE) | 5.04M | **3.54** | **34.5** |
| **SymbioSLM** | **Symbiogenesis** | **5.05M** | **3.62** | **37.3** |
| [MonarchSLM](https://huggingface.co/LisaMegaWatts/MonarchSLM) | Monarch Mixer | 5.04M | 3.65 | 38.4 |
SymbioSLM outperforms the Monarch-only baseline while using no attention mechanism: the multi-organelle fusion provides complementary mixing at local and global scales that a single mixer cannot cover on its own.
## Training Configuration
```toml
[model]
arch = "symbiogenesis"
embed_dim = 256
n_layers = 8
n_monarch_heads = 1
conv_kernel_size = 4
ffn_mult = 4
context_length = 256
weight_tying = true
free_energy_beta = 0.001
[training]
optimizer = "adamw"
lr = 6e-4
min_lr = 6e-5
warmup_steps = 500
max_steps = 12305
batch_size = 32
grad_clip = 1.0
precision = "f16"
```
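The config fixes the endpoints (`lr`, `min_lr`) and warmup length but not the decay shape. Assuming the common linear-warmup plus cosine-decay schedule, the learning rate over training would look like this sketch:

```python
import math

def lr_at(step, lr=6e-4, min_lr=6e-5, warmup=500, max_steps=12305):
    """Linear warmup to `lr`, then cosine decay to `min_lr`.

    The decay shape is an assumption; the config only fixes the
    endpoints and the warmup length.
    """
    if step < warmup:
        return lr * (step + 1) / warmup
    progress = (step - warmup) / (max_steps - warmup)
    return min_lr + 0.5 * (lr - min_lr) * (1 + math.cos(math.pi * progress))

print(f"{lr_at(0):.2e}")      # 1.20e-06, start of warmup
print(f"{lr_at(499):.2e}")    # 6.00e-04, peak at end of warmup
print(f"{lr_at(12304):.2e}")  # 6.00e-05, decayed to min_lr
```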
## Gelation Monitoring
Training includes gelation monitoring via CUSUM change-point detection on gate entropy. This tracks when the organelle gates transition from uniform mixing to specialized configurations — a phase transition analogous to gel formation in polymer physics.
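CUSUM change-point detection itself is standard; the sketch below shows a one-sided CUSUM over a stream of gate-entropy readings, flagging the step where entropy drops persistently below its early-training baseline (gates specializing away from uniform mixing). The drift and threshold values, and the exact statistic used in training, are assumptions.

```python
import math

def cusum_drop_detector(series, baseline, drift=0.01, threshold=0.5):
    """One-sided CUSUM: return the first step where `series` has drifted
    persistently below `baseline`, or None if no change is detected."""
    s = 0.0
    for t, x in enumerate(series):
        s = max(0.0, s + (baseline - x) - drift)  # accumulate downward drift
        if s > threshold:
            return t
    return None

# Synthetic gate entropy: a uniform gate over 3 organelles has entropy
# log(3); after step 100 the gates specialize and entropy falls.
uniform_h = math.log(3)
series = [uniform_h] * 100 + [uniform_h - 0.02 * i for i in range(100)]
print(cusum_drop_detector(series, baseline=uniform_h))  # 108
```

The drift term makes the detector ignore noise around the baseline, so it fires only on a sustained entropy drop rather than a single fluctuation.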
## Usage
### Julia (Lux.jl)
```julia
using JuliaGPT
# Load model
config = load_config("config.toml")
model = create_model(config.model)
ps, st, _, _, _ = load_checkpoint("final.jld2")
# Load tokenizer
tokenizer = BPETokenizer("vocab.json", "merges.txt")
# Generate text
prompt = "The nature of reality"
output = generate(model, ps, st, tokenizer, prompt;
                  max_new_tokens=200, temperature=0.8, top_k=40)
println(output)
```
## References
- **Symbiogenesis framework**: [DavinciDreams/symbiogenesis](https://github.com/DavinciDreams/symbiogenesis) — Evolutionary NAS via organism fusion
- **Monarch Mixer**: Dao et al., 2023 — Sub-quadratic GEMM-based sequence mixing
- **Hyena**: Poli et al., 2023 — Long convolutions for sequence modeling
- **Endosymbiotic theory**: Margulis, 1967 — Origin of eukaryotic organelles
## Citation
```bibtex
@misc{symbio-slm-2026,
  title={SymbioSLM: Multi-Organelle Sequence Mixing for Attention-Free Language Modeling},
  author={LisaMegaWatts},
  year={2026},
  url={https://huggingface.co/LisaMegaWatts/SymbioSLM}
}
```
## License
MIT