SymbioSLM

A 5.05M-parameter, attention-free language model using the Symbiogenesis architecture: multi-organelle sequence mixing with learned per-channel gating. Trained on a philosophy corpus of 981 classical texts (~795M tokens).

Architecture

Symbiogenesis replaces self-attention with three complementary "organelles" for sequence mixing, inspired by the biological theory of symbiogenesis (Margulis, 1967), in which complex organelles such as mitochondria began as independent organisms that fused into eukaryotic cells.

Each of the 8 SymbioBlocks contains:

| Organelle | Function | Scale | Complexity |
|---|---|---|---|
| CausalDepthwiseConv1d | Local n-gram pattern detection | Local (kernel=4) | O(n) |
| Monarch Matrix | Sub-quadratic global sequence mixing | Global | O(n√n) |
| LongConv | Dense causal convolution filtering | Global | O(n log n) |
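The LongConv organelle's O(n log n) cost comes from evaluating a dense causal convolution in the frequency domain. A minimal NumPy sketch of that idea (illustrative only; function names and shapes here are assumptions, not the model's actual code):

```python
import numpy as np

def causal_longconv(x, k):
    """Causal convolution of a length-n signal x with a length-n kernel k.

    Computed via FFT in O(n log n): zero-pad both to 2n so the circular
    convolution equals the linear one, multiply spectra, and keep the
    first n outputs so that y[t] depends only on x[0..t].
    """
    n = len(x)
    m = 2 * n
    y = np.fft.irfft(np.fft.rfft(x, m) * np.fft.rfft(k, m), m)
    return y[:n]

x = np.random.randn(8)
k = np.random.randn(8)
# direct O(n^2) causal convolution for comparison
direct = np.array([sum(k[j] * x[t - j] for j in range(t + 1)) for t in range(8)])
assert np.allclose(causal_longconv(x, k), direct)
```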

An OrganelleGate (per-channel softmax) learns which organelle each embedding channel relies on, creating specialized "fused organisms" per block.
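The per-channel softmax gate can be sketched as follows (NumPy; the real layer's parameterization is not given in this card, so the shapes below are assumptions):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def organelle_gate(outputs, logits):
    """Mix organelle outputs with one softmax weight triple per channel.

    outputs: list of 3 arrays, each (seq_len, d) -- one per organelle
    logits:  learned gate parameters, shape (d, 3)
    """
    w = softmax(logits, axis=-1)          # (d, 3), each row sums to 1
    stacked = np.stack(outputs, axis=-1)  # (seq_len, d, 3)
    return (stacked * w).sum(axis=-1)     # (seq_len, d)

d, n = 4, 5
outs = [np.random.randn(n, d) for _ in range(3)]
logits = np.zeros((d, 3))                 # uniform gate -> plain average
mixed = organelle_gate(outs, logits)
assert np.allclose(mixed, sum(outs) / 3)
```

With zero logits every channel averages the three organelles; training pushes individual rows toward one-hot weights, which is the "specialization" the gelation monitor tracks.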

No Positional Encoding

SymbioSLM requires no explicit positional encoding (no RoPE, no sinusoidal embeddings). The Monarch matrices and LongConv kernels implicitly learn position-dependent mixing patterns, while CausalConv captures local ordering through its convolutional structure.
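Why causality alone encodes local order can be seen in a depthwise causal convolution: each output sees only a left-aligned window, so shifting the input shifts the output. A NumPy sketch (illustrative, not the model's implementation):

```python
import numpy as np

def causal_depthwise_conv(x, k):
    """Depthwise causal conv: each channel has its own short kernel.

    x: (seq_len, d), k: (kernel, d). Output[t] sees only x[t-kernel+1 .. t],
    so left-to-right order is baked in without positional embeddings.
    """
    n, d = x.shape
    K = k.shape[0]
    pad = np.concatenate([np.zeros((K - 1, d)), x], axis=0)  # left-pad only
    return np.stack([(pad[t:t + K] * k).sum(axis=0) for t in range(n)])

x = np.random.randn(6, 3)
k = np.random.randn(4, 3)
y = causal_depthwise_conv(x, k)
# output at t=0 depends only on x[0]: zero everything else and compare
x0 = np.zeros_like(x)
x0[0] = x[0]
assert np.allclose(y[0], causal_depthwise_conv(x0, k)[0])
```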

Model Specifications

| Parameter | Value |
|---|---|
| Architecture | Symbiogenesis |
| Parameters | 5,052,672 (5.05M) |
| Embedding dim | 256 |
| Layers | 8 |
| Monarch heads | 1 per block |
| Conv kernel | 4 |
| FFN | SwiGLU (4x, 2/3 adjusted) |
| Normalization | RMSNorm (pre-norm) |
| Context length | 256 tokens |
| Vocab size | 2,000 (BPE) |
| Weight tying | Yes |
| Free energy reg | 0.001 |
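The "4x, 2/3 adjusted" FFN is the common SwiGLU convention: the hidden width is scaled by 2/3 so that three weight matrices cost roughly the same as a standard two-matrix 4x FFN. A sketch (NumPy; the exact rounding of the hidden width is an assumption):

```python
import numpy as np

def silu(x):
    return x / (1.0 + np.exp(-x))

def swiglu_ffn(x, W_gate, W_up, W_down):
    """SwiGLU FFN: down( silu(x @ W_gate) * (x @ W_up) )."""
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down

d = 256
hidden = int(2 / 3 * 4 * d)   # "4x, 2/3 adjusted" -> 682 for d=256
W_gate = np.random.randn(d, hidden) * 0.02
W_up = np.random.randn(d, hidden) * 0.02
W_down = np.random.randn(hidden, d) * 0.02
y = swiglu_ffn(np.random.randn(10, d), W_gate, W_up, W_down)
assert y.shape == (10, d)
```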

Parameter Breakdown

| Component | Params | % |
|---|---|---|
| Token embedding | 512,000 | 10.1% |
| SymbioBlocks (8x) | 4,540,672 | 89.9% |
| · CausalConv | ~8K/block | |
| · Monarch | ~131K/block | |
| · LongConv | ~65K/block | |
| · OrganelleGate | ~769/block | |
| · SwiGLU FFN | ~350K/block | |
| · RMSNorm (2x) | ~512/block | |
| Final RMSNorm | 256 | <0.1% |
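Two of these figures follow directly from the specs: the embedding table is vocab × embedding dim, and with weight tying the output projection reuses it, adding no parameters:

```python
vocab, d = 2000, 256
embedding = vocab * d
assert embedding == 512_000               # matches the table

total = 5_052_672                         # reported total
print(f"embedding share: {embedding / total:.1%}")  # ~10.1%, as tabulated
```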

Results

Trained for 12,305 steps on an NVIDIA RTX 3060 (12GB).

| Metric | Value |
|---|---|
| Val Loss | 3.62 |
| Val PPL | 37.3 |
| Training steps | 12,305 |
| Batch size | 32 |
| Precision | Float16 (AMP) |
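Perplexity is the exponential of the cross-entropy loss, so the two reported values are mutually consistent:

```python
import math

val_loss = 3.62
ppl = math.exp(val_loss)
assert round(ppl, 1) == 37.3   # matches the reported Val PPL
```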

Comparison with Other 5M Julia SLMs

All models trained on the same philosophy corpus with identical tokenizer and training budget (12,305 steps):

| Model | Architecture | Params | Val Loss | Val PPL |
|---|---|---|---|---|
| JuliaSLM | Transformer (RoPE) | 5.04M | 3.54 | 34.5 |
| SymbioSLM | Symbiogenesis | 5.05M | 3.62 | 37.3 |
| MonarchSLM | Monarch Mixer | 5.04M | 3.65 | 38.4 |

SymbioSLM outperforms the Monarch-only baseline while using no attention mechanism. The multi-organelle fusion provides complementary mixing at different scales that a single mixer cannot achieve alone.

Training Configuration

```toml
[model]
arch = "symbiogenesis"
embed_dim = 256
n_layers = 8
n_monarch_heads = 1
conv_kernel_size = 4
ffn_mult = 4
context_length = 256
weight_tying = true
free_energy_beta = 0.001

[training]
optimizer = "adamw"
lr = 6e-4
min_lr = 6e-5
warmup_steps = 500
max_steps = 12305
batch_size = 32
grad_clip = 1.0
precision = "f16"
```
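The config gives warmup, peak, and floor learning rates but not the decay shape; a cosine schedule is the usual pairing. A sketch under that assumption:

```python
import math

def lr_at(step, lr=6e-4, min_lr=6e-5, warmup=500, max_steps=12305):
    """Linear warmup to `lr`, then decay to `min_lr`.

    The cosine decay shape is an assumption: the card specifies
    warmup_steps, lr, and min_lr but not the schedule itself.
    """
    if step < warmup:
        return lr * step / warmup
    t = (step - warmup) / (max_steps - warmup)
    return min_lr + 0.5 * (lr - min_lr) * (1 + math.cos(math.pi * t))

assert lr_at(0) == 0.0
assert lr_at(500) == 6e-4                  # peak at end of warmup
assert abs(lr_at(12305) - 6e-5) < 1e-12    # floor at the final step
```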

Gelation Monitoring

Training includes gelation monitoring via CUSUM change-point detection on gate entropy. This tracks when the organelle gates transition from uniform mixing to specialized configurations โ€” a phase transition analogous to gel formation in polymer physics.
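A minimal one-sided CUSUM sketch of this idea (illustrative; the monitor's actual statistic, drift, and threshold are not given in the card):

```python
def cusum_detect(xs, target, drift=0.0, threshold=5.0):
    """One-sided CUSUM: return the first index where the cumulative
    downward deviation from `target` exceeds `threshold`, else None.

    Applied to gate entropy, a sustained drop below the early-training
    level signals that the gates have specialized ("gelation").
    """
    s = 0.0
    for i, x in enumerate(xs):
        s = max(0.0, s + (target - x) - drift)
        if s > threshold:
            return i
    return None

# synthetic entropy trace: near-uniform gates (~ln 3 nats), then specialization
trace = [1.10] * 50 + [0.30] * 50
idx = cusum_detect(trace, target=1.10, drift=0.05, threshold=5.0)
assert idx is not None and idx >= 50   # fires shortly after the shift
```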

Usage

Julia (Lux.jl)

```julia
using JuliaGPT

# Load model
config = load_config("config.toml")
model = create_model(config.model)
ps, st, _, _, _ = load_checkpoint("final.jld2")

# Load tokenizer
tokenizer = BPETokenizer("vocab.json", "merges.txt")

# Generate text
prompt = "The nature of reality"
output = generate(model, ps, st, tokenizer, prompt;
                  max_new_tokens=200, temperature=0.8, top_k=40)
println(output)
```

References

  • Symbiogenesis framework: DavinciDreams/symbiogenesis (evolutionary NAS via organism fusion)
  • Monarch Mixer: Dao et al., 2023 (sub-quadratic GEMM-based sequence mixing)
  • Hyena: Poli et al., 2023 (long convolutions for sequence modeling)
  • Endosymbiotic theory: Margulis, 1967 (origin of eukaryotic organelles)

Citation

```bibtex
@misc{symbio-slm-2026,
  title={SymbioSLM: Multi-Organelle Sequence Mixing for Attention-Free Language Modeling},
  author={LisaMegaWatts},
  year={2026},
  url={https://huggingface.co/LisaMegaWatts/SymbioSLM}
}
```

License

MIT
