---
language:
- en
license: mit
library_name: lux
tags:
- julia
- lux
- slm
- philosophy
- symbiogenesis
- monarch-mixer
- long-convolution
- causal-conv
- rmsnorm
- swiglu
- bpe
- text-generation
- attention-free
pipeline_tag: text-generation
datasets:
- LisaMegaWatts/philosophy-corpus
model-index:
- name: SymbioSLM
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: LisaMegaWatts/philosophy-corpus
      name: philosophy-corpus
    metrics:
    - type: perplexity
      value: 37.3
      name: Val PPL
      verified: false
    - type: loss
      value: 3.62
      name: Val Loss
      verified: false
---

# SymbioSLM

A **5.05M-parameter** attention-free language model built on the **Symbiogenesis** architecture: multi-organelle sequence mixing with learned per-channel gating. Trained on a philosophy corpus of 981 classical texts (~795M tokens).

## Architecture

Symbiogenesis replaces self-attention with three complementary "organelles" for sequence mixing, inspired by the biological theory of symbiogenesis (Margulis, 1967), which holds that complex organelles such as mitochondria were once independent organisms that fused into eukaryotic cells.

Each of the 8 SymbioBlocks contains:

| Organelle | Function | Scale | Complexity |
|-----------|----------|-------|------------|
| **CausalDepthwiseConv1d** | Local n-gram pattern detection | Local (kernel=4) | O(n) |
| **Monarch Matrix** | Sub-quadratic global sequence mixing | Global | O(n√n) |
| **LongConv** | Dense causal convolution filtering | Global | O(n log n) |

An **OrganelleGate** (per-channel softmax) learns which organelle each embedding channel relies on, creating specialized "fused organisms" per block.

### No Positional Encoding

SymbioSLM requires **no explicit positional encoding** (no RoPE, no sinusoidal embeddings). The Monarch matrices and LongConv kernels implicitly learn position-dependent mixing patterns, while the causal convolution captures local ordering through its convolutional structure.
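The OrganelleGate's per-channel mixing can be illustrated with a minimal pure-Python sketch. This is not the model's Lux.jl implementation; the organelle outputs and gate logits here are hypothetical stand-ins. The idea: each embedding channel holds one learned logit per organelle, a softmax over those logits gives that channel's mixing weights, and the three organelle outputs are blended channel-wise.

```python
import math

def organelle_gate(outputs, logits):
    """Blend organelle outputs with a per-channel softmax gate.

    outputs: list of n_org matrices, each seq x dim (one per organelle)
    logits:  n_org x dim learned gate scores (one per organelle per channel)
    Returns the gated mixture, seq x dim.
    """
    n_org = len(outputs)
    seq, dim = len(outputs[0]), len(outputs[0][0])
    mixed = [[0.0] * dim for _ in range(seq)]
    for d in range(dim):
        # Softmax over the organelle logits for channel d.
        exps = [math.exp(logits[k][d]) for k in range(n_org)]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Same channel-wise weights at every sequence position.
        for t in range(seq):
            mixed[t][d] = sum(weights[k] * outputs[k][t][d] for k in range(n_org))
    return mixed
```

With equal logits a channel mixes its organelles uniformly (weight 1/3 each); as training specializes a channel, its softmax sharpens toward a single organelle, which is the "fused organism" behavior described above.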
### Model Specifications

| Parameter | Value |
|-----------|-------|
| Architecture | Symbiogenesis |
| Parameters | 5,052,672 (5.05M) |
| Embedding dim | 256 |
| Layers | 8 |
| Monarch heads | 1 per block |
| Conv kernel | 4 |
| FFN | SwiGLU (4x, 2/3-adjusted) |
| Normalization | RMSNorm (pre-norm) |
| Context length | 256 tokens |
| Vocab size | 2,000 (BPE) |
| Weight tying | Yes |
| Free energy reg | 0.001 |

### Parameter Breakdown

| Component | Params | % |
|-----------|--------|---|
| Token embedding | 512,000 | 10.1% |
| SymbioBlocks (8x) | 4,540,672 | 89.9% |
| &nbsp;&nbsp;CausalConv | ~8K/block | |
| &nbsp;&nbsp;Monarch | ~131K/block | |
| &nbsp;&nbsp;LongConv | ~65K/block | |
| &nbsp;&nbsp;OrganelleGate | ~769/block | |
| &nbsp;&nbsp;SwiGLU FFN | ~350K/block | |
| &nbsp;&nbsp;RMSNorm (2x) | ~512/block | |
| Final RMSNorm | 256 | <0.1% |

## Results

Trained for 12,305 steps on an NVIDIA RTX 3060 (12 GB).

| Metric | Value |
|--------|-------|
| **Val Loss** | **3.62** |
| **Val PPL** | **37.3** |
| Training steps | 12,305 |
| Batch size | 32 |
| Precision | Float16 (AMP) |

### Comparison with Other 5M Julia SLMs

All models were trained on the same philosophy corpus with an identical tokenizer and training budget (12,305 steps):

| Model | Architecture | Params | Val Loss | Val PPL |
|-------|-------------|--------|----------|---------|
| [JuliaSLM](https://huggingface.co/LisaMegaWatts/JuliaSLM) | Transformer (RoPE) | 5.04M | **3.54** | **34.5** |
| **SymbioSLM** | **Symbiogenesis** | **5.05M** | **3.62** | **37.3** |
| [MonarchSLM](https://huggingface.co/LisaMegaWatts/MonarchSLM) | Monarch Mixer | 5.04M | 3.65 | 38.4 |

SymbioSLM outperforms the Monarch-only baseline (also attention-free), trailing only the RoPE Transformer. The multi-organelle fusion provides complementary mixing at different scales that a single mixer cannot achieve alone.
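Two of the headline numbers above can be sanity-checked with quick arithmetic: the token-embedding count is simply `vocab_size * embed_dim`, and validation perplexity is `exp` of the validation cross-entropy loss (assuming loss in nats, which the reported 3.62 / 37.3 pair matches).

```python
import math

# Token embedding parameters: vocab_size x embed_dim.
# With weight tying, this matrix also serves as the output head,
# so it is counted once.
vocab_size, embed_dim = 2000, 256
embed_params = vocab_size * embed_dim          # 512,000
embed_share = 100 * embed_params / 5_052_672   # ~10.1% of the stated total

# Perplexity is exp(cross-entropy loss in nats).
val_ppl = math.exp(3.62)                       # ~37.3
print(embed_params, round(embed_share, 1), round(val_ppl, 1))
```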
## Training Configuration

```toml
[model]
arch = "symbiogenesis"
embed_dim = 256
n_layers = 8
n_monarch_heads = 1
conv_kernel_size = 4
ffn_mult = 4
context_length = 256
weight_tying = true
free_energy_beta = 0.001

[training]
optimizer = "adamw"
lr = 6e-4
min_lr = 6e-5
warmup_steps = 500
max_steps = 12305
batch_size = 32
grad_clip = 1.0
precision = "f16"
```

## Gelation Monitoring

Training includes gelation monitoring via CUSUM change-point detection on gate entropy. This tracks when the organelle gates transition from uniform mixing to specialized configurations, a phase transition analogous to gel formation in polymer physics.

## Usage

### Julia (Lux.jl)

```julia
using JuliaGPT

# Load model
config = load_config("config.toml")
model = create_model(config.model)
ps, st, _, _, _ = load_checkpoint("final.jld2")

# Load tokenizer
tokenizer = BPETokenizer("vocab.json", "merges.txt")

# Generate text
prompt = "The nature of reality"
output = generate(model, ps, st, tokenizer, prompt;
                  max_new_tokens=200, temperature=0.8, top_k=40)
println(output)
```

## References

- **Symbiogenesis framework**: [DavinciDreams/symbiogenesis](https://github.com/DavinciDreams/symbiogenesis). Evolutionary NAS via organism fusion.
- **Monarch Mixer**: Fu et al., 2023. Sub-quadratic GEMM-based sequence mixing.
- **Hyena**: Poli et al., 2023. Long convolutions for sequence modeling.
- **Endosymbiotic theory**: Margulis, 1967. Origin of eukaryotic organelles.

## Citation

```bibtex
@misc{symbio-slm-2026,
  title={SymbioSLM: Multi-Organelle Sequence Mixing for Attention-Free Language Modeling},
  author={LisaMegaWatts},
  year={2026},
  url={https://huggingface.co/LisaMegaWatts/SymbioSLM}
}
```

## License

MIT
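## Appendix: CUSUM Gate-Entropy Sketch

The gate-entropy monitor described under Gelation Monitoring can be sketched generically. This is a textbook one-sided CUSUM on gate entropy, not the repository's actual implementation; the drift and threshold values are illustrative. Uniform mixing over 3 organelles has entropy log(3) ≈ 1.099 nats; a sustained drop below that level signals that the gates have specialized.

```python
import math

def gate_entropy(weights):
    """Shannon entropy (nats) of one channel's gate distribution.
    Uniform weights over 3 organelles give log(3) ~ 1.099;
    specialization drives the entropy toward 0."""
    return -sum(w * math.log(w) for w in weights if w > 0)

def cusum_downshift(xs, target, drift=0.05, threshold=1.0):
    """One-sided CUSUM for a downward mean shift: accumulate how far
    each sample falls below `target` (minus a drift allowance) and
    flag the first step where the cumulative sum crosses `threshold`."""
    s = 0.0
    for t, x in enumerate(xs):
        s = max(0.0, s + (target - x) - drift)
        if s > threshold:
            return t  # change point detected at step t
    return None       # no gelation detected
```

Fed a per-step entropy trace that starts near log(3) and then drops, the detector fires a few steps after the drop, which is the "gel point" the training loop logs.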