SymbioSLM
A 5.05M-parameter attention-free language model using the Symbiogenesis architecture: multi-organelle sequence mixing with learned per-channel gating. Trained on a philosophy corpus of 981 classical texts (~795M tokens).
Architecture
Symbiogenesis replaces self-attention with three complementary "organelles" for sequence mixing, inspired by the biological theory of symbiogenesis (Margulis, 1967), in which complex organelles such as mitochondria were once independent organisms that fused into eukaryotic cells.
Each of the 8 SymbioBlocks contains:
| Organelle | Function | Scale | Complexity |
|---|---|---|---|
| CausalDepthwiseConv1d | Local n-gram pattern detection | Local (kernel=4) | O(n) |
| Monarch Matrix | Sub-quadratic global sequence mixing | Global | O(n√n) |
| LongConv | Dense causal convolution filtering | Global | O(n log n) |
An OrganelleGate (per-channel softmax) learns which organelle each embedding channel relies on, creating specialized "fused organisms" per block.
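The per-channel gating described above can be sketched in a few lines. The snippet below is an illustrative Python implementation (not the repo's Julia code): each embedding channel carries one learned logit per organelle, a softmax over those logits produces per-channel mixing weights, and the three organelle outputs are blended channel by channel.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def organelle_gate(outs, logits):
    """Mix K organelle outputs (each seq_len x dim) with a per-channel
    softmax over K learned logits (logits has shape dim x K)."""
    K = len(outs)
    seq_len, dim = len(outs[0]), len(outs[0][0])
    mixed = [[0.0] * dim for _ in range(seq_len)]
    for c in range(dim):
        # one weight vector per embedding channel, shared across positions
        w = softmax([logits[c][k] for k in range(K)])
        for t in range(seq_len):
            mixed[t][c] = sum(w[k] * outs[k][t][c] for k in range(K))
    return mixed
```

A channel whose logits strongly favor one organelle effectively "specializes" in that mixer, which is what the gate-entropy monitoring later in this card tracks.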
No Positional Encoding
SymbioSLM requires no explicit positional encoding (no RoPE, no sinusoidal embeddings). The Monarch matrices and LongConv kernels implicitly learn position-dependent mixing patterns, while CausalConv captures local ordering through its convolutional structure.
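To make the position-dependence concrete, here is a minimal Python sketch of a Monarch-style mix along the sequence axis: a block-diagonal multiply, a stride permutation, and a second block-diagonal multiply, costing O(n·b) = O(n√n) for n = b². Because every sequence position gets its own row in the learned blocks, the mixing is inherently position-dependent, with no separate positional encoding needed. Causal masking, batching, and the channel dimension are omitted; names are illustrative.

```python
def monarch_mix(x, B1, B2):
    """Apply y = P B2 P B1 x along a length-n sequence, where B1 and B2
    are block-diagonal (b blocks of size b x b) and P is the stride-b
    (transpose) permutation, which is its own inverse."""
    b = len(B1)
    n = b * b
    assert len(x) == n
    # block-diagonal multiply by B1: block i acts on x[i*b : (i+1)*b]
    y = [sum(B1[i][r][c] * x[i * b + c] for c in range(b))
         for i in range(b) for r in range(b)]
    # stride permutation P (transpose of the b x b reshape)
    y = [y[r * b + i] for i in range(b) for r in range(b)]
    # block-diagonal multiply by B2 on the permuted sequence
    y = [sum(B2[i][r][c] * y[i * b + c] for c in range(b))
         for i in range(b) for r in range(b)]
    # undo the permutation (P is self-inverse)
    return [y[r * b + i] for i in range(b) for r in range(b)]
```

With identity blocks the mix is the identity map; training instead learns dense blocks, so each output position is a learned combination of all input positions at O(n√n) cost.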
Model Specifications
| Parameter | Value |
|---|---|
| Architecture | Symbiogenesis |
| Parameters | 5,052,672 (5.05M) |
| Embedding dim | 256 |
| Layers | 8 |
| Monarch heads | 1 per block |
| Conv kernel | 4 |
| FFN | SwiGLU (4x, 2/3 adjusted) |
| Normalization | RMSNorm (pre-norm) |
| Context length | 256 tokens |
| Vocab size | 2,000 (BPE) |
| Weight tying | Yes |
| Free energy reg | 0.001 |
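The "SwiGLU (4x, 2/3 adjusted)" row refers to the standard gated feed-forward, in which the hidden width of a 4× MLP is commonly scaled by 2/3 so that the three projection matrices cost roughly the same as a plain two-matrix 4× MLP. A minimal Python sketch (illustrative, not the repo's Julia code; matrix names are assumptions):

```python
import math

def silu(z):
    """SiLU (swish) activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def swiglu_ffn(x, W_gate, W_up, W_down):
    """SwiGLU feed-forward: down-project silu(W_gate x) * (W_up x)."""
    gate = [silu(g) for g in matvec(W_gate, x)]
    up = matvec(W_up, x)
    return matvec(W_down, [g * u for g, u in zip(gate, up)])
```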
Parameter Breakdown
| Component | Params | % |
|---|---|---|
| Token embedding | 512,000 | 10.1% |
| SymbioBlocks (8x) | 4,540,672 | 89.9% |
| CausalConv | ~8K/block | |
| Monarch | ~131K/block | |
| LongConv | ~65K/block | |
| OrganelleGate | ~769/block | |
| SwiGLU FFN | ~350K/block | |
| RMSNorm (2x) | ~512/block | |
| Final RMSNorm | 256 | <0.1% |
Results
Trained for 12,305 steps on an NVIDIA RTX 3060 (12GB).
| Metric | Value |
|---|---|
| Val Loss | 3.62 |
| Val PPL | 37.3 |
| Training steps | 12,305 |
| Batch size | 32 |
| Precision | Float16 (AMP) |
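Validation perplexity is simply the exponential of the validation cross-entropy loss, which the table's numbers confirm:

```python
import math

val_loss = 3.62
ppl = math.exp(val_loss)
print(round(ppl, 1))  # 37.3
```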
Comparison with Other 5M Julia SLMs
All models trained on the same philosophy corpus with identical tokenizer and training budget (12,305 steps):
| Model | Architecture | Params | Val Loss | Val PPL |
|---|---|---|---|---|
| JuliaSLM | Transformer (RoPE) | 5.04M | 3.54 | 34.5 |
| SymbioSLM | Symbiogenesis | 5.05M | 3.62 | 37.3 |
| MonarchSLM | Monarch Mixer | 5.04M | 3.65 | 38.4 |
SymbioSLM outperforms the Monarch-only baseline (37.3 vs. 38.4 val PPL) without any attention mechanism, though it still trails the RoPE transformer baseline. The multi-organelle fusion provides complementary mixing at different scales that a single mixer cannot achieve alone.
Training Configuration
```toml
[model]
arch = "symbiogenesis"
embed_dim = 256
n_layers = 8
n_monarch_heads = 1
conv_kernel_size = 4
ffn_mult = 4
context_length = 256
weight_tying = true
free_energy_beta = 0.001

[training]
optimizer = "adamw"
lr = 6e-4
min_lr = 6e-5
warmup_steps = 500
max_steps = 12305
batch_size = 32
grad_clip = 1.0
precision = "f16"
```
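The `lr`, `min_lr`, and `warmup_steps` values suggest linear warmup followed by a decay to `min_lr`. The card does not state the decay shape, so the cosine schedule below is an assumption, sketched in Python:

```python
import math

def lr_at(step, lr=6e-4, min_lr=6e-5, warmup=500, max_steps=12305):
    """Linear warmup to `lr`, then cosine decay to `min_lr` (assumed shape)."""
    if step < warmup:
        return lr * step / warmup
    t = (step - warmup) / (max_steps - warmup)  # progress in [0, 1]
    return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * t))
```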
Gelation Monitoring
Training includes gelation monitoring via CUSUM change-point detection on gate entropy. This tracks when the organelle gates transition from uniform mixing to specialized configurations, a phase transition analogous to gel formation in polymer physics.
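A one-sided CUSUM detector of the kind described can be sketched as follows: it accumulates how far the gate-entropy series has drifted below a reference level and flags the first step at which the cumulative drift crosses a threshold. The slack `k` and threshold `h` follow standard CUSUM usage; the repo's actual monitor and parameter values may differ.

```python
def cusum_drift(xs, target, k=0.01, h=0.2):
    """One-sided CUSUM: return the first index where the cumulative
    downward drift of xs below `target` (minus slack k) exceeds h,
    or None if no change point is detected."""
    s = 0.0
    for t, x in enumerate(xs):
        s = max(0.0, s + (target - x) - k)
        if s > h:
            return t
    return None
```

Fed the per-step gate entropy, the detector fires when the gates stop mixing uniformly and start specializing.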
Usage
Julia (Lux.jl)
```julia
using JuliaGPT

# Load model
config = load_config("config.toml")
model = create_model(config.model)
ps, st, _, _, _ = load_checkpoint("final.jld2")

# Load tokenizer
tokenizer = BPETokenizer("vocab.json", "merges.txt")

# Generate text
prompt = "The nature of reality"
output = generate(model, ps, st, tokenizer, prompt;
                  max_new_tokens=200, temperature=0.8, top_k=40)
println(output)
```
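The `temperature` and `top_k` keywords in `generate` correspond to standard sampling controls: logits are divided by the temperature and sampling is restricted to the `top_k` highest-scoring tokens. A Python sketch of the idea (illustrative; the repo's Julia sampler may differ):

```python
import math
import random

def sample_top_k(logits, temperature=0.8, top_k=40):
    """Sample a token index from the top_k temperature-scaled logits."""
    # keep the indices of the top_k largest logits
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # temperature-scaled, numerically stable softmax over the kept logits
    scaled = [logits[i] / temperature for i in idx]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    z = sum(weights)
    probs = [w / z for w in weights]
    # draw from the renormalized distribution
    r = random.random()
    acc = 0.0
    for i, p in zip(idx, probs):
        acc += p
        if r <= acc:
            return i
    return idx[-1]
```

Lower temperatures sharpen the distribution; `top_k=1` reduces to greedy decoding.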
References
- Symbiogenesis framework: DavinciDreams/symbiogenesis (evolutionary NAS via organism fusion)
- Monarch Mixer: Fu et al., 2023 (sub-quadratic GEMM-based sequence mixing)
- Hyena: Poli et al., 2023 (long convolutions for sequence modeling)
- Endosymbiotic theory: Margulis, 1967 (origin of eukaryotic organelles)
Citation
```bibtex
@misc{symbio-slm-2026,
  title={SymbioSLM: Multi-Organelle Sequence Mixing for Attention-Free Language Modeling},
  author={LisaMegaWatts},
  year={2026},
  url={https://huggingface.co/LisaMegaWatts/SymbioSLM}
}
```
License
MIT