SymbioSLM
A 5.05M-parameter attention-free language model using the Symbiogenesis architecture: multi-organelle sequence mixing with learned per-channel gating. Trained on a philosophy corpus of 981 classical texts (~795M tokens).
Architecture
Symbiogenesis replaces self-attention with three complementary "organelles" for sequence mixing, inspired by the biological theory of symbiogenesis (Margulis, 1967), in which complex organelles such as mitochondria were once independent organisms that fused into eukaryotic cells.
Each of the 8 SymbioBlocks contains:
| Organelle | Function | Scale | Complexity |
|---|---|---|---|
| CausalDepthwiseConv1d | Local n-gram pattern detection | Local (kernel=4) | O(n) |
| Monarch Matrix | Sub-quadratic global sequence mixing | Global | O(n√n) |
| LongConv | Dense causal convolution filtering | Global | O(n log n) |
An OrganelleGate (per-channel softmax) learns which organelle each embedding channel relies on, creating specialized "fused organisms" per block.
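The per-channel gating described above can be sketched in a few lines. The snippet below is an illustrative Python implementation (not the repo's Julia code): each embedding channel carries one learned logit per organelle, a softmax over those logits produces per-channel mixing weights, and the three organelle outputs are blended channel by channel.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def organelle_gate(outs, logits):
    """Mix K organelle outputs (each seq_len x dim) with a per-channel
    softmax over K learned logits (logits has shape dim x K)."""
    K = len(outs)
    seq_len, dim = len(outs[0]), len(outs[0][0])
    mixed = [[0.0] * dim for _ in range(seq_len)]
    for c in range(dim):
        # one weight vector per embedding channel, shared across positions
        w = softmax([logits[c][k] for k in range(K)])
        for t in range(seq_len):
            mixed[t][c] = sum(w[k] * outs[k][t][c] for k in range(K))
    return mixed
```

A channel whose logits strongly favor one organelle effectively "specializes" in that mixer, which is what the gate-entropy monitoring later in this card tracks.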
No Positional Encoding
SymbioSLM requires no explicit positional encoding (no RoPE, no sinusoidal embeddings). The Monarch matrices and LongConv kernels implicitly learn position-dependent mixing patterns, while CausalConv captures local ordering through its convolutional structure.
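To make the position-dependence concrete, here is a minimal Python sketch of a Monarch-style mix along the sequence axis: a block-diagonal multiply, a stride permutation, and a second block-diagonal multiply, costing O(n·b) = O(n√n) for n = b². Because every sequence position gets its own row in the learned blocks, the mixing is inherently position-dependent, with no separate positional encoding needed. Causal masking, batching, and the channel dimension are omitted; names are illustrative.

```python
def monarch_mix(x, B1, B2):
    """Apply y = P B2 P B1 x along a length-n sequence, where B1 and B2
    are block-diagonal (b blocks of size b x b) and P is the stride-b
    (transpose) permutation, which is its own inverse."""
    b = len(B1)
    n = b * b
    assert len(x) == n
    # block-diagonal multiply by B1: block i acts on x[i*b : (i+1)*b]
    y = [sum(B1[i][r][c] * x[i * b + c] for c in range(b))
         for i in range(b) for r in range(b)]
    # stride permutation P (transpose of the b x b reshape)
    y = [y[r * b + i] for i in range(b) for r in range(b)]
    # block-diagonal multiply by B2 on the permuted sequence
    y = [sum(B2[i][r][c] * y[i * b + c] for c in range(b))
         for i in range(b) for r in range(b)]
    # undo the permutation (P is self-inverse)
    return [y[r * b + i] for i in range(b) for r in range(b)]
```

With identity blocks the mix is the identity map; training instead learns dense blocks, so each output position is a learned combination of all input positions at O(n√n) cost.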
Model Specifications
| Parameter | Value |
|---|---|
| Architecture | Symbiogenesis |
| Parameters | 5,052,672 (5.05M) |
| Embedding dim | 256 |
| Layers | 8 |
| Monarch heads | 1 per block |
| Conv kernel | 4 |
| FFN | SwiGLU (4x, 2/3 adjusted) |
| Normalization | RMSNorm (pre-norm) |
| Context length | 256 tokens |
| Vocab size | 2,000 (BPE) |
| Weight tying | Yes |
| Free energy reg | 0.001 |
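The "SwiGLU (4x, 2/3 adjusted)" row refers to the standard gated feed-forward, in which the hidden width of a 4× MLP is commonly scaled by 2/3 so that the three projection matrices cost roughly the same as a plain two-matrix 4× MLP. A minimal Python sketch (illustrative, not the repo's Julia code; matrix names are assumptions):

```python
import math

def silu(z):
    """SiLU (swish) activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def swiglu_ffn(x, W_gate, W_up, W_down):
    """SwiGLU feed-forward: down-project silu(W_gate x) * (W_up x)."""
    gate = [silu(g) for g in matvec(W_gate, x)]
    up = matvec(W_up, x)
    return matvec(W_down, [g * u for g, u in zip(gate, up)])
```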
Parameter Breakdown
| Component | Params | % |
|---|---|---|
| Token embedding | 512,000 | 10.1% |
| SymbioBlocks (8x) | 4,540,672 | 89.9% |
| CausalConv | ~8K/block | |
| Monarch | ~131K/block | |
| LongConv | ~65K/block | |
| OrganelleGate | ~769/block | |
| SwiGLU FFN | ~350K/block | |
| RMSNorm (2x) | ~512/block | |
| Final RMSNorm | 256 | <0.1% |
Results
Trained for 12,305 steps on an NVIDIA RTX 3060 (12GB).
| Metric | Value |
|---|---|
| Val Loss | 3.62 |
| Val PPL | 37.3 |
| Training steps | 12,305 |
| Batch size | 32 |
| Precision | Float16 (AMP) |
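Validation perplexity is simply the exponential of the validation cross-entropy loss, which the table's numbers confirm:

```python
import math

val_loss = 3.62
ppl = math.exp(val_loss)
print(round(ppl, 1))  # 37.3
```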
Comparison with Other 5M Julia SLMs
All models trained on the same philosophy corpus with identical tokenizer and training budget (12,305 steps):
| Model | Architecture | Params | Val Loss | Val PPL |
|---|---|---|---|---|
| JuliaSLM | Transformer (RoPE) | 5.04M | 3.54 | 34.5 |
| SymbioSLM | Symbiogenesis | 5.05M | 3.62 | 37.3 |
| MonarchSLM | Monarch Mixer | 5.04M | 3.65 | 38.4 |
SymbioSLM outperforms the Monarch-only baseline (37.3 vs. 38.4 val PPL) without any attention mechanism, though it still trails the RoPE transformer baseline. The multi-organelle fusion provides complementary mixing at different scales that a single mixer cannot achieve alone.
Training Configuration
```toml
[model]
arch = "symbiogenesis"
embed_dim = 256
n_layers = 8
n_monarch_heads = 1
conv_kernel_size = 4
ffn_mult = 4
context_length = 256
weight_tying = true
free_energy_beta = 0.001

[training]
optimizer = "adamw"
lr = 6e-4
min_lr = 6e-5
warmup_steps = 500
max_steps = 12305
batch_size = 32
grad_clip = 1.0
precision = "f16"
```
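The `lr`, `min_lr`, and `warmup_steps` values suggest linear warmup followed by a decay to `min_lr`. The card does not state the decay shape, so the cosine schedule below is an assumption, sketched in Python:

```python
import math

def lr_at(step, lr=6e-4, min_lr=6e-5, warmup=500, max_steps=12305):
    """Linear warmup to `lr`, then cosine decay to `min_lr` (assumed shape)."""
    if step < warmup:
        return lr * step / warmup
    t = (step - warmup) / (max_steps - warmup)  # progress in [0, 1]
    return min_lr + 0.5 * (lr - min_lr) * (1.0 + math.cos(math.pi * t))
```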
Gelation Monitoring
Training includes gelation monitoring via CUSUM change-point detection on gate entropy. This tracks when the organelle gates transition from uniform mixing to specialized configurations, a phase transition analogous to gel formation in polymer physics.
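A one-sided CUSUM detector of the kind described can be sketched as follows: it accumulates how far the gate-entropy series has drifted below a reference level and flags the first step at which the cumulative drift crosses a threshold. The slack `k` and threshold `h` follow standard CUSUM usage; the repo's actual monitor and parameter values may differ.

```python
def cusum_drift(xs, target, k=0.01, h=0.2):
    """One-sided CUSUM: return the first index where the cumulative
    downward drift of xs below `target` (minus slack k) exceeds h,
    or None if no change point is detected."""
    s = 0.0
    for t, x in enumerate(xs):
        s = max(0.0, s + (target - x) - k)
        if s > h:
            return t
    return None
```

Fed the per-step gate entropy, the detector fires when the gates stop mixing uniformly and start specializing.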
Usage
Julia (Lux.jl)
```julia
using JuliaGPT

# Load model
config = load_config("config.toml")
model = create_model(config.model)
ps, st, _, _, _ = load_checkpoint("final.jld2")

# Load tokenizer
tokenizer = BPETokenizer("vocab.json", "merges.txt")

# Generate text
prompt = "The nature of reality"
output = generate(model, ps, st, tokenizer, prompt;
                  max_new_tokens=200, temperature=0.8, top_k=40)
println(output)
```
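The `temperature` and `top_k` keywords in `generate` correspond to standard sampling controls: logits are divided by the temperature and sampling is restricted to the `top_k` highest-scoring tokens. A Python sketch of the idea (illustrative; the repo's Julia sampler may differ):

```python
import math
import random

def sample_top_k(logits, temperature=0.8, top_k=40):
    """Sample a token index from the top_k temperature-scaled logits."""
    # keep the indices of the top_k largest logits
    idx = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # temperature-scaled, numerically stable softmax over the kept logits
    scaled = [logits[i] / temperature for i in idx]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    z = sum(weights)
    probs = [w / z for w in weights]
    # draw from the renormalized distribution
    r = random.random()
    acc = 0.0
    for i, p in zip(idx, probs):
        acc += p
        if r <= acc:
            return i
    return idx[-1]
```

Lower temperatures sharpen the distribution; `top_k=1` reduces to greedy decoding.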
References
- Symbiogenesis framework: DavinciDreams/symbiogenesis (evolutionary NAS via organism fusion)
- Monarch Mixer: Fu et al., 2023 (sub-quadratic GEMM-based sequence mixing)
- Hyena: Poli et al., 2023 (long convolutions for sequence modeling)
- Endosymbiotic theory: Margulis, 1967 (origin of eukaryotic organelles)
Citation
```bibtex
@misc{symbio-slm-2026,
  title={SymbioSLM: Multi-Organelle Sequence Mixing for Attention-Free Language Modeling},
  author={LisaMegaWatts},
  year={2026},
  url={https://huggingface.co/LisaMegaWatts/SymbioSLM}
}
```
License
MIT