---
language:
- en
license: mit
library_name: lux
tags:
- julia
- lux
- slm
- philosophy
- symbiogenesis
- monarch-mixer
- long-convolution
- causal-conv
- rmsnorm
- swiglu
- bpe
- text-generation
- attention-free
pipeline_tag: text-generation
datasets:
- LisaMegaWatts/philosophy-corpus
model-index:
- name: SymbioSLM
results:
- task:
type: text-generation
name: Text Generation
dataset:
type: LisaMegaWatts/philosophy-corpus
name: philosophy-corpus
metrics:
- type: perplexity
value: 37.3
name: Val PPL
verified: false
- type: loss
value: 3.62
name: Val Loss
verified: false
---
# SymbioSLM
A **5.05M parameter** attention-free language model using the **Symbiogenesis** architecture — multi-organelle sequence mixing with learned per-channel gating. Trained on a philosophy corpus of 981 classical texts (~795M tokens).
## Architecture
Symbiogenesis replaces self-attention with three complementary "organelles" for sequence mixing, inspired by the biological theory of symbiogenesis (Margulis, 1967), which holds that complex organelles such as mitochondria began as independent organisms that merged into eukaryotic cells.
Each of the 8 SymbioBlocks contains:
| Organelle | Function | Scale | Complexity |
|-----------|----------|-------|------------|
| **CausalDepthwiseConv1d** | Local n-gram pattern detection | Local (kernel=4) | O(n) |
| **Monarch Matrix** | Sub-quadratic global sequence mixing | Global | O(n√n) |
| **LongConv** | Dense causal convolution filtering | Global | O(n log n) |
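The O(n√n) cost of Monarch-style mixing comes from reshaping the length-n sequence into a √n × √n grid and mixing along each axis with a small dense matrix. The sketch below is a minimal NumPy illustration of that complexity argument only — it ignores the causal masking the actual model needs, and is not the Lux implementation:

```python
import numpy as np

def monarch_mix(x, w_rows, w_cols):
    """Toy Monarch-style mixing: O(n * sqrt(n) * d) instead of O(n^2 * d).

    x: (n, d) sequence, with n a perfect square.
    w_rows, w_cols: (m, m) mixing matrices, m = sqrt(n).
    """
    n, d = x.shape
    m = int(np.sqrt(n))
    assert m * m == n, "sequence length must be a perfect square"
    g = x.reshape(m, m, d)                   # view the sequence as an m x m grid
    g = np.einsum("ij,jkd->ikd", w_rows, g)  # mix along the first grid axis
    g = np.einsum("jk,ijd->ikd", w_cols, g)  # mix along the second grid axis
    return g.reshape(n, d)

rng = np.random.default_rng(0)
n, d = 256, 256  # matches the model's context length and embedding dim
y = monarch_mix(rng.standard_normal((n, d)),
                rng.standard_normal((16, 16)),
                rng.standard_normal((16, 16)))
print(y.shape)  # (256, 256)
```

After the two axis mixes, every output position depends on every input position, at a cost of roughly 2·n·√n·d multiply-adds rather than n²·d.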
An **OrganelleGate** (per-channel softmax) learns which organelle each embedding channel relies on, creating specialized "fused organisms" per block.
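Per-channel softmax gating can be sketched as follows (shapes and names are illustrative; the real `OrganelleGate` lives in the Julia/Lux code):

```python
import numpy as np

def softmax(z, axis):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def organelle_gate(y_conv, y_monarch, y_longconv, gate_logits):
    """Blend three organelle outputs with per-channel softmax weights.

    y_*: (n, d) outputs of the three mixers.
    gate_logits: (3, d) learned logits, one weight triple per channel.
    """
    w = softmax(gate_logits, axis=0)                # (3, d), columns sum to 1
    ys = np.stack([y_conv, y_monarch, y_longconv])  # (3, n, d)
    return np.einsum("kd,knd->nd", w, ys)           # per-channel convex blend

rng = np.random.default_rng(0)
n, d = 256, 256
outs = [rng.standard_normal((n, d)) for _ in range(3)]
logits = rng.standard_normal((3, d))
y = organelle_gate(*outs, logits)
print(y.shape)  # (256, 256)
```

Note that 3 × 256 = 768 logits, consistent with the ~769 gate parameters per block listed below (presumably one extra scalar, e.g. a temperature or bias).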
### No Positional Encoding
SymbioSLM requires **no explicit positional encoding** (no RoPE, no sinusoidal embeddings). The Monarch matrices and LongConv kernels implicitly learn position-dependent mixing patterns, while CausalConv captures local ordering through its convolutional structure.
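The local-ordering claim is easy to verify in a sketch: with left padding of kernel_size − 1, output position t sees only inputs t−3…t (kernel = 4), so no future token can influence it. A NumPy illustration, not the model's Lux layer:

```python
import numpy as np

def causal_depthwise_conv(x, w):
    """Depthwise causal conv: out[t, c] = sum_i w[i, c] * x[t - (k-1) + i, c].

    x: (n, d) sequence, w: (k, d) per-channel kernel (k = 4 in SymbioSLM).
    """
    k = w.shape[0]
    xp = np.pad(x, ((k - 1, 0), (0, 0)))  # left-pad so no future positions leak in
    return np.stack([(xp[t:t + k] * w).sum(axis=0) for t in range(x.shape[0])])

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 4))
w = rng.standard_normal((4, 4))
y1 = causal_depthwise_conv(x, w)

x2 = x.copy()
x2[10] += 1.0                         # perturb a "future" token...
y2 = causal_depthwise_conv(x2, w)
print(np.allclose(y1[:10], y2[:10]))  # True: positions before t=10 are unchanged
```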
### Model Specifications
| Parameter | Value |
|-----------|-------|
| Architecture | Symbiogenesis |
| Parameters | 5,052,672 (5.05M) |
| Embedding dim | 256 |
| Layers | 8 |
| Monarch heads | 1 per block |
| Conv kernel | 4 |
| FFN | SwiGLU (4x, 2/3 adjusted) |
| Normalization | RMSNorm (pre-norm) |
| Context length | 256 tokens |
| Vocab size | 2,000 (BPE) |
| Weight tying | Yes |
| Free energy reg | 0.001 |
### Parameter Breakdown
| Component | Params | % |
|-----------|--------|---|
| Token embedding | 512,000 | 10.1% |
| SymbioBlocks (8x) | 4,540,672 | 89.9% |
|    CausalConv | ~8K/block | |
|    Monarch | ~131K/block | |
|    LongConv | ~65K/block | |
|    OrganelleGate | ~769/block | |
|    SwiGLU FFN | ~350K/block | |
|    RMSNorm (2x) | ~512/block | |
| Final RMSNorm | 256 | <0.1% |
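The headline shares in the table can be re-derived from the specs (a quick sanity check of the embedding and block figures only; the approximate per-organelle counts are taken as given):

```python
# Sanity-check the parameter shares from the spec tables above.
vocab, d_model, total = 2000, 256, 5_052_672

embed = vocab * d_model   # weight tying: the output head reuses this matrix
blocks = 4_540_672        # SymbioBlocks total, as listed above

print(embed)                           # 512000
print(round(embed / total * 100, 1))   # 10.1 (%)
print(round(blocks / total * 100, 1))  # 89.9 (%)
```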
## Results
Trained for 12,305 steps on an NVIDIA RTX 3060 (12GB).
| Metric | Value |
|--------|-------|
| **Val Loss** | **3.62** |
| **Val PPL** | **37.3** |
| Training steps | 12,305 |
| Batch size | 32 |
| Precision | Float16 (AMP) |
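Validation perplexity is simply the exponential of the validation cross-entropy loss (in nats), so the two headline numbers are consistent:

```python
import math

def ppl(val_loss):
    """Perplexity from cross-entropy loss in nats."""
    return math.exp(val_loss)

print(round(ppl(3.62), 1))  # 37.3 -> SymbioSLM's reported Val PPL
```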
### Comparison with Other 5M Julia SLMs
All models trained on the same philosophy corpus with identical tokenizer and training budget (12,305 steps):
| Model | Architecture | Params | Val Loss | Val PPL |
|-------|-------------|--------|----------|---------|
| [JuliaSLM](https://huggingface.co/LisaMegaWatts/JuliaSLM) | Transformer (RoPE) | 5.04M | **3.54** | **34.5** |
| **SymbioSLM** | **Symbiogenesis** | **5.05M** | **3.62** | **37.3** |
| [MonarchSLM](https://huggingface.co/LisaMegaWatts/MonarchSLM) | Monarch Mixer | 5.04M | 3.65 | 38.4 |
SymbioSLM outperforms the Monarch-only baseline without using any attention mechanism, though it still trails the RoPE transformer by 0.08 val loss. The multi-organelle fusion provides complementary mixing at scales that no single mixer covers on its own.
## Training Configuration
```toml
[model]
arch = "symbiogenesis"
embed_dim = 256
n_layers = 8
n_monarch_heads = 1
conv_kernel_size = 4
ffn_mult = 4
context_length = 256
weight_tying = true
free_energy_beta = 0.001
[training]
optimizer = "adamw"
lr = 6e-4
min_lr = 6e-5
warmup_steps = 500
max_steps = 12305
batch_size = 32
grad_clip = 1.0
precision = "f16"
```
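The config names `lr`, `min_lr`, `warmup_steps`, and `max_steps` but not the decay shape. Assuming the common linear-warmup plus cosine-decay pattern (an assumption, not confirmed by the config), the schedule would look like:

```python
import math

# From the config above; the cosine decay itself is an assumption.
LR, MIN_LR, WARMUP, MAX_STEPS = 6e-4, 6e-5, 500, 12305

def lr_at(step):
    """Hypothetical linear-warmup + cosine-decay learning-rate schedule."""
    if step < WARMUP:
        return LR * step / WARMUP  # linear ramp from 0 to peak lr
    progress = (step - WARMUP) / (MAX_STEPS - WARMUP)
    return MIN_LR + 0.5 * (LR - MIN_LR) * (1 + math.cos(math.pi * progress))

print(lr_at(500))    # peak 6e-4 at the end of warmup
print(lr_at(12305))  # decays to min_lr 6e-5 at the final step
```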
## Gelation Monitoring
Training includes gelation monitoring via CUSUM change-point detection on gate entropy. This tracks when the organelle gates transition from uniform mixing to specialized configurations — a phase transition analogous to gel formation in polymer physics.
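A one-sided CUSUM on gate entropy can be sketched as below. Early in training the gates are near-uniform (entropy ≈ ln 3 ≈ 1.10 over three organelles), and the detector accumulates evidence whenever entropy drops below that reference by more than a drift allowance. The constants and the synthetic trace are illustrative, not the training script's actual settings:

```python
import math

def cusum_drop_detector(series, mu0, drift=0.1, threshold=3.0):
    """One-sided CUSUM: return the first step where the cumulative downward
    deviation of `series` from the reference mean `mu0` exceeds `threshold`."""
    s = 0.0
    for t, x in enumerate(series):
        s = max(0.0, s + (mu0 - x - drift))  # accumulates only when x << mu0
        if s > threshold:
            return t
    return None

# Synthetic gate-entropy trace: uniform gates (~ln 3), then a specialization drop.
entropy = [1.08] * 50 + [0.30] * 50
change_point = cusum_drop_detector(entropy, mu0=math.log(3))
print(change_point)  # fires a few steps after the transition at t = 50
```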
## Usage
### Julia (Lux.jl)
```julia
using JuliaGPT
# Load model
config = load_config("config.toml")
model = create_model(config.model)
ps, st, _, _, _ = load_checkpoint("final.jld2")
# Load tokenizer
tokenizer = BPETokenizer("vocab.json", "merges.txt")
# Generate text
prompt = "The nature of reality"
output = generate(model, ps, st, tokenizer, prompt;
                  max_new_tokens=200, temperature=0.8, top_k=40)
println(output)
```
## References
- **Symbiogenesis framework**: [DavinciDreams/symbiogenesis](https://github.com/DavinciDreams/symbiogenesis) — Evolutionary NAS via organism fusion
- **Monarch Mixer**: Dao et al., 2023 — Sub-quadratic GEMM-based sequence mixing
- **Hyena**: Poli et al., 2023 — Long convolutions for sequence modeling
- **Endosymbiotic theory**: Margulis, 1967 — Origin of eukaryotic organelles
## Citation
```bibtex
@misc{symbio-slm-2026,
  title={SymbioSLM: Multi-Organelle Sequence Mixing for Attention-Free Language Modeling},
  author={LisaMegaWatts},
  year={2026},
  url={https://huggingface.co/LisaMegaWatts/SymbioSLM}
}
```
## License
MIT