---
language:
- en
license: mit
library_name: lux
tags:
- julia
- lux
- slm
- philosophy
- symbiogenesis
- monarch-mixer
- long-convolution
- causal-conv
- rmsnorm
- swiglu
- bpe
- text-generation
- attention-free
pipeline_tag: text-generation
datasets:
- LisaMegaWatts/philosophy-corpus
model-index:
- name: SymbioSLM
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: LisaMegaWatts/philosophy-corpus
      name: philosophy-corpus
    metrics:
    - type: perplexity
      value: 37.3
      name: Val PPL
      verified: false
    - type: loss
      value: 3.62
      name: Val Loss
      verified: false
---

# SymbioSLM

A **5.05M parameter** attention-free language model using the **Symbiogenesis** architecture: multi-organelle sequence mixing with learned per-channel gating. Trained on a philosophy corpus of 981 classical texts (~795M tokens).
|
## Architecture

Symbiogenesis replaces self-attention with three complementary "organelles" for sequence mixing, inspired by the biological theory of symbiogenesis (Margulis, 1967), under which complex organelles such as mitochondria began as independent organisms that later fused into eukaryotic cells.

Each of the 8 SymbioBlocks contains:
|
| Organelle | Function | Scale | Complexity |
|-----------|----------|-------|------------|
| **CausalDepthwiseConv1d** | Local n-gram pattern detection | Local (kernel=4) | O(n) |
| **Monarch Matrix** | Sub-quadratic global sequence mixing | Global | O(n√n) |
| **LongConv** | Dense causal convolution filtering | Global | O(n log n) |
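
The Monarch organelle's O(n√n) cost comes from factoring a dense mixing matrix into two block-diagonal factors interleaved with a fixed permutation. Below is a minimal NumPy sketch of that factorization; it is illustrative only — the layout and variable names are assumptions, not the model's Lux.jl implementation, which applies the mixer over the sequence dimension.

```python
import numpy as np

def monarch_matvec(x, B1, B2):
    """Monarch matrix-vector product: M = P^T B2 P B1, where B1 and B2 are
    block-diagonal with sqrt(n) blocks of size sqrt(n), and P is the
    reshape-transpose permutation. Cost O(n*sqrt(n)) vs O(n^2) dense."""
    b = B1.shape[0]                                      # b = sqrt(n) blocks
    z = np.einsum("kij,kj->ki", B1, x.reshape(b, b))     # block-diagonal B1
    z = z.T.copy()                                       # permutation P
    z = np.einsum("kij,kj->ki", B2, z)                   # block-diagonal B2
    return z.T.reshape(-1)                               # P^T

n, b = 16, 4
B1 = np.random.randn(b, b, b)   # sqrt(n) blocks, each b x b
B2 = np.random.randn(b, b, b)
x = np.random.randn(n)
y = monarch_matvec(x, B1, B2)
```

Storing two sets of √n blocks of size √n×√n gives 2·n^1.5 parameters instead of n² for a dense mixer, which is where the sub-quadratic scaling comes from.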
|
An **OrganelleGate** (per-channel softmax) learns which organelle each embedding channel relies on, creating specialized "fused organisms" per block.
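
A minimal NumPy sketch of what such a per-channel softmax gate computes — shapes and names here are illustrative assumptions, not the Lux.jl implementation:

```python
import numpy as np

def organelle_gate(branches, logits):
    """Per-channel softmax gate over organelle outputs.

    branches: (n_org, seq, dim) stacked outputs of the three mixers
    logits:   (n_org, dim) learned gate logits, one weight per channel
    """
    w = np.exp(logits - logits.max(axis=0, keepdims=True))
    w = w / w.sum(axis=0, keepdims=True)     # softmax across organelles, per channel
    return np.einsum("osd,od->sd", branches, w)

branches = np.random.randn(3, 8, 4)          # conv / monarch / longconv outputs
logits = np.random.randn(3, 4)
y = organelle_gate(branches, logits)         # (8, 4) gated mixture
```

Because the softmax runs per channel rather than per token, each of the 256 embedding channels settles on its own fixed blend of the three organelles within a block.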
|
### No Positional Encoding

SymbioSLM requires **no explicit positional encoding** (no RoPE, no sinusoidal embeddings). The Monarch matrices and LongConv kernels implicitly learn position-dependent mixing patterns, while the CausalConv captures local ordering through its convolutional structure.
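
For the LongConv organelle, relative-position information is carried by the learned filter itself: the weight applied to input position s at output position t depends only on the offset t − s. A NumPy sketch of an FFT-evaluated causal long convolution (illustrative; the per-channel kernel parameterization is an assumption):

```python
import numpy as np

def long_conv_causal(x, kernel):
    """Causal long convolution via FFT: output[t] = sum_{s<=t} kernel[t-s]*x[s],
    evaluated in O(n log n) instead of O(n^2)."""
    n, dim = x.shape
    fft_len = 2 * n                           # zero-pad: circular conv -> linear conv
    Kf = np.fft.rfft(kernel, fft_len, axis=0)
    Xf = np.fft.rfft(x, fft_len, axis=0)
    return np.fft.irfft(Kf * Xf, fft_len, axis=0)[:n]

n, dim = 16, 4
x = np.random.randn(n, dim)
kernel = np.random.randn(n, dim) * 0.1        # one length-n filter per channel
y = long_conv_causal(x, kernel)
```

Truncating the linear convolution to the first n outputs keeps it causal: position t never sees inputs beyond t.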
|
### Model Specifications

| Parameter | Value |
|-----------|-------|
| Architecture | Symbiogenesis |
| Parameters | 5,052,672 (5.05M) |
| Embedding dim | 256 |
| Layers | 8 |
| Monarch heads | 1 per block |
| Conv kernel | 4 |
| FFN | SwiGLU (4x, 2/3 adjusted) |
| Normalization | RMSNorm (pre-norm) |
| Context length | 256 tokens |
| Vocab size | 2,000 (BPE) |
| Weight tying | Yes |
| Free energy reg | 0.001 |
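
The "SwiGLU (4x, 2/3 adjusted)" row can be read as: expand by 4x, then shrink the hidden width by 2/3 so the three weight matrices cost roughly the same as a standard two-matrix 4x MLP. A NumPy sketch — the exact rounding of the hidden width is an assumption:

```python
import numpy as np

def swiglu_ffn(x, w_gate, w_up, w_down):
    """SwiGLU feed-forward: silu(x @ w_gate) * (x @ w_up), projected back down."""
    gate = x @ w_gate
    silu = gate / (1.0 + np.exp(-gate))        # SiLU / swish activation
    return (silu * (x @ w_up)) @ w_down

dim = 256
hidden = int(2 * (4 * dim) / 3)                # 2/3-adjusted 4x expansion
x = np.random.randn(8, dim)
w_gate = np.random.randn(dim, hidden) * 0.02
w_up = np.random.randn(dim, hidden) * 0.02
w_down = np.random.randn(hidden, dim) * 0.02
y = swiglu_ffn(x, w_gate, w_up, w_down)        # (8, 256)
```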
|
### Parameter Breakdown

| Component | Params | % |
|-----------|--------|---|
| Token embedding | 512,000 | 10.1% |
| SymbioBlocks (8x) | 4,540,672 | 89.9% |
| CausalConv | ~8K/block | |
| Monarch | ~131K/block | |
| LongConv | ~65K/block | |
| OrganelleGate | ~769/block | |
| SwiGLU FFN | ~350K/block | |
| RMSNorm (2x) | ~512/block | |
| Final RMSNorm | 256 | <0.1% |
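
The embedding figure follows directly from the vocabulary and embedding sizes; because of weight tying, the output head reuses the same table and is not counted again:

```python
# Token embedding = vocab_size x embed_dim; with weight tying the LM head
# shares this table, so it appears once in the parameter count.
vocab_size, embed_dim = 2000, 256
embedding_params = vocab_size * embed_dim
total_params = 5_052_672
embedding_share = 100 * embedding_params / total_params
```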
|
## Results

Trained for 12,305 steps on an NVIDIA RTX 3060 (12GB).
|
| Metric | Value |
|--------|-------|
| **Val Loss** | **3.62** |
| **Val PPL** | **37.3** |
| Training steps | 12,305 |
| Batch size | 32 |
| Precision | Float16 (AMP) |
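
The two headline numbers are consistent with each other, since perplexity is the exponential of the mean cross-entropy loss (in nats):

```python
import math

val_loss = 3.62
val_ppl = math.exp(val_loss)   # perplexity = exp(mean cross-entropy in nats)
```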
|
### Comparison with Other 5M Julia SLMs

All models trained on the same philosophy corpus with identical tokenizer and training budget (12,305 steps):
|
| Model | Architecture | Params | Val Loss | Val PPL |
|-------|-------------|--------|----------|---------|
| [JuliaSLM](https://huggingface.co/LisaMegaWatts/JuliaSLM) | Transformer (RoPE) | 5.04M | **3.54** | **34.5** |
| **SymbioSLM** | **Symbiogenesis** | **5.05M** | **3.62** | **37.3** |
| [MonarchSLM](https://huggingface.co/LisaMegaWatts/MonarchSLM) | Monarch Mixer | 5.04M | 3.65 | 38.4 |
|
SymbioSLM outperforms the Monarch-only baseline while using no attention mechanism, though it still trails the RoPE Transformer baseline. The multi-organelle fusion provides complementary mixing at different scales that a single mixer cannot achieve alone.
|
## Training Configuration
|
```toml
[model]
arch = "symbiogenesis"
embed_dim = 256
n_layers = 8
n_monarch_heads = 1
conv_kernel_size = 4
ffn_mult = 4
context_length = 256
weight_tying = true
free_energy_beta = 0.001

[training]
optimizer = "adamw"
lr = 6e-4
min_lr = 6e-5
warmup_steps = 500
max_steps = 12305
batch_size = 32
grad_clip = 1.0
precision = "f16"
```
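
The config gives `lr`, `min_lr`, and `warmup_steps` but does not name the decay schedule; assuming the common linear-warmup-plus-cosine-decay shape (an assumption, not confirmed by the card), the learning rate would evolve as:

```python
import math

def lr_at(step, lr=6e-4, min_lr=6e-5, warmup=500, max_steps=12305):
    """Linear warmup to `lr`, then cosine decay to `min_lr` (assumed schedule)."""
    if step < warmup:
        return lr * (step + 1) / warmup
    progress = (step - warmup) / (max_steps - warmup)
    return min_lr + 0.5 * (lr - min_lr) * (1 + math.cos(math.pi * progress))
```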
|
## Gelation Monitoring

Training includes gelation monitoring via CUSUM change-point detection on gate entropy. This tracks when the organelle gates transition from uniform mixing to specialized configurations — a phase transition analogous to gel formation in polymer physics.
|
## Usage

### Julia (Lux.jl)
|
```julia
using JuliaGPT

# Load model
config = load_config("config.toml")
model = create_model(config.model)
ps, st, _, _, _ = load_checkpoint("final.jld2")

# Load tokenizer
tokenizer = BPETokenizer("vocab.json", "merges.txt")

# Generate text
prompt = "The nature of reality"
output = generate(model, ps, st, tokenizer, prompt;
                  max_new_tokens=200, temperature=0.8, top_k=40)
println(output)
```
|
## References

- **Symbiogenesis framework**: [DavinciDreams/symbiogenesis](https://github.com/DavinciDreams/symbiogenesis) — Evolutionary NAS via organism fusion
- **Monarch Mixer**: Fu et al., 2023 — Sub-quadratic GEMM-based sequence mixing
- **Hyena**: Poli et al., 2023 — Long convolutions for sequence modeling
- **Endosymbiotic theory**: Margulis, 1967 — Origin of eukaryotic organelles
|
## Citation

```bibtex
@misc{symbio-slm-2026,
  title={SymbioSLM: Multi-Organelle Sequence Mixing for Attention-Free Language Modeling},
  author={LisaMegaWatts},
  year={2026},
  url={https://huggingface.co/LisaMegaWatts/SymbioSLM}
}
```

## License

MIT
|