---
language:
- en
license: mit
library_name: lux
tags:
- julia
- lux
- slm
- philosophy
- symbiogenesis
- monarch-mixer
- long-convolution
- causal-conv
- rmsnorm
- swiglu
- bpe
- text-generation
- attention-free
pipeline_tag: text-generation
datasets:
- LisaMegaWatts/philosophy-corpus
model-index:
- name: SymbioSLM
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: LisaMegaWatts/philosophy-corpus
      name: philosophy-corpus
    metrics:
    - type: perplexity
      value: 37.3
      name: Val PPL
      verified: false
    - type: loss
      value: 3.62
      name: Val Loss
      verified: false
---

# SymbioSLM

A **5.05M-parameter** attention-free language model built on the **Symbiogenesis** architecture: multi-organelle sequence mixing with learned per-channel gating. Trained on a philosophy corpus of 981 classical texts (~795M tokens).

## Architecture

Symbiogenesis replaces self-attention with three complementary "organelles" for sequence mixing, inspired by the biological theory of symbiogenesis (Margulis, 1967), which holds that complex organelles such as mitochondria were once independent organisms that fused into eukaryotic cells.

Each of the 8 SymbioBlocks contains:

| Organelle | Function | Scale | Complexity |
|-----------|----------|-------|------------|
| **CausalDepthwiseConv1d** | Local n-gram pattern detection | Local (kernel=4) | O(n) |
| **Monarch Matrix** | Sub-quadratic global sequence mixing | Global | O(n√n) |
| **LongConv** | Dense causal convolution filtering | Global | O(n log n) |

An **OrganelleGate** (per-channel softmax) learns which organelle each embedding channel relies on, creating specialized "fused organisms" per block.

### No Positional Encoding

SymbioSLM requires **no explicit positional encoding** (no RoPE, no sinusoidal embeddings). The Monarch matrices and LongConv kernels implicitly learn position-dependent mixing patterns, while the causal convolution captures local ordering through its convolutional structure.
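The OrganelleGate's per-channel mixing can be illustrated with a minimal pure-Python sketch. This is not the model's Lux.jl implementation; the organelle outputs and gate logits here are hypothetical stand-ins. The idea: each embedding channel holds one learned logit per organelle, a softmax over those logits gives that channel's mixing weights, and the three organelle outputs are blended channel-wise.

```python
import math

def organelle_gate(outputs, logits):
    """Blend organelle outputs with a per-channel softmax gate.

    outputs: list of n_org matrices, each seq x dim (one per organelle)
    logits:  n_org x dim learned gate scores (one per organelle per channel)
    Returns the gated mixture, seq x dim.
    """
    n_org = len(outputs)
    seq, dim = len(outputs[0]), len(outputs[0][0])
    mixed = [[0.0] * dim for _ in range(seq)]
    for d in range(dim):
        # Softmax over the organelle logits for channel d.
        exps = [math.exp(logits[k][d]) for k in range(n_org)]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Same channel-wise weights at every sequence position.
        for t in range(seq):
            mixed[t][d] = sum(weights[k] * outputs[k][t][d] for k in range(n_org))
    return mixed
```

With equal logits a channel mixes its organelles uniformly (weight 1/3 each); as training specializes a channel, its softmax sharpens toward a single organelle, which is the "fused organism" behavior described above.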
### Model Specifications

| Parameter | Value |
|-----------|-------|
| Architecture | Symbiogenesis |
| Parameters | 5,052,672 (5.05M) |
| Embedding dim | 256 |
| Layers | 8 |
| Monarch heads | 1 per block |
| Conv kernel | 4 |
| FFN | SwiGLU (4x, 2/3-adjusted) |
| Normalization | RMSNorm (pre-norm) |
| Context length | 256 tokens |
| Vocab size | 2,000 (BPE) |
| Weight tying | Yes |
| Free energy reg | 0.001 |

### Parameter Breakdown

| Component | Params | % |
|-----------|--------|---|
| Token embedding | 512,000 | 10.1% |
| SymbioBlocks (8x) | 4,540,672 | 89.9% |
| &nbsp;&nbsp;CausalConv | ~8K/block | |
| &nbsp;&nbsp;Monarch | ~131K/block | |
| &nbsp;&nbsp;LongConv | ~65K/block | |
| &nbsp;&nbsp;OrganelleGate | ~769/block | |
| &nbsp;&nbsp;SwiGLU FFN | ~350K/block | |
| &nbsp;&nbsp;RMSNorm (2x) | ~512/block | |
| Final RMSNorm | 256 | <0.1% |

## Results

Trained for 12,305 steps on an NVIDIA RTX 3060 (12 GB).

| Metric | Value |
|--------|-------|
| **Val Loss** | **3.62** |
| **Val PPL** | **37.3** |
| Training steps | 12,305 |
| Batch size | 32 |
| Precision | Float16 (AMP) |

### Comparison with Other 5M Julia SLMs

All models were trained on the same philosophy corpus with an identical tokenizer and training budget (12,305 steps):

| Model | Architecture | Params | Val Loss | Val PPL |
|-------|-------------|--------|----------|---------|
| [JuliaSLM](https://huggingface.co/LisaMegaWatts/JuliaSLM) | Transformer (RoPE) | 5.04M | **3.54** | **34.5** |
| **SymbioSLM** | **Symbiogenesis** | **5.05M** | **3.62** | **37.3** |
| [MonarchSLM](https://huggingface.co/LisaMegaWatts/MonarchSLM) | Monarch Mixer | 5.04M | 3.65 | 38.4 |

SymbioSLM outperforms the Monarch-only baseline (also attention-free), trailing only the RoPE Transformer. The multi-organelle fusion provides complementary mixing at different scales that a single mixer cannot achieve alone.
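Two of the headline numbers above can be sanity-checked with quick arithmetic: the token-embedding count is simply `vocab_size * embed_dim`, and validation perplexity is `exp` of the validation cross-entropy loss (assuming loss in nats, which the reported 3.62 / 37.3 pair matches).

```python
import math

# Token embedding parameters: vocab_size x embed_dim.
# With weight tying, this matrix also serves as the output head,
# so it is counted once.
vocab_size, embed_dim = 2000, 256
embed_params = vocab_size * embed_dim          # 512,000
embed_share = 100 * embed_params / 5_052_672   # ~10.1% of the stated total

# Perplexity is exp(cross-entropy loss in nats).
val_ppl = math.exp(3.62)                       # ~37.3
print(embed_params, round(embed_share, 1), round(val_ppl, 1))
```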
## Training Configuration

```toml
[model]
arch = "symbiogenesis"
embed_dim = 256
n_layers = 8
n_monarch_heads = 1
conv_kernel_size = 4
ffn_mult = 4
context_length = 256
weight_tying = true
free_energy_beta = 0.001

[training]
optimizer = "adamw"
lr = 6e-4
min_lr = 6e-5
warmup_steps = 500
max_steps = 12305
batch_size = 32
grad_clip = 1.0
precision = "f16"
```

## Gelation Monitoring

Training includes gelation monitoring via CUSUM change-point detection on gate entropy. This tracks when the organelle gates transition from uniform mixing to specialized configurations, a phase transition analogous to gel formation in polymer physics.

## Usage

### Julia (Lux.jl)

```julia
using JuliaGPT

# Load model
config = load_config("config.toml")
model = create_model(config.model)
ps, st, _, _, _ = load_checkpoint("final.jld2")

# Load tokenizer
tokenizer = BPETokenizer("vocab.json", "merges.txt")

# Generate text
prompt = "The nature of reality"
output = generate(model, ps, st, tokenizer, prompt;
                  max_new_tokens=200, temperature=0.8, top_k=40)
println(output)
```

## References

- **Symbiogenesis framework**: [DavinciDreams/symbiogenesis](https://github.com/DavinciDreams/symbiogenesis). Evolutionary NAS via organism fusion.
- **Monarch Mixer**: Fu et al., 2023. Sub-quadratic GEMM-based sequence mixing.
- **Hyena**: Poli et al., 2023. Long convolutions for sequence modeling.
- **Endosymbiotic theory**: Margulis, 1967. Origin of eukaryotic organelles.

## Citation

```bibtex
@misc{symbio-slm-2026,
  title={SymbioSLM: Multi-Organelle Sequence Mixing for Attention-Free Language Modeling},
  author={LisaMegaWatts},
  year={2026},
  url={https://huggingface.co/LisaMegaWatts/SymbioSLM}
}
```

## License

MIT
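## Appendix: CUSUM Gate-Entropy Sketch

The gate-entropy monitor described under Gelation Monitoring can be sketched generically. This is a textbook one-sided CUSUM on gate entropy, not the repository's actual implementation; the drift and threshold values are illustrative. Uniform mixing over 3 organelles has entropy log(3) ≈ 1.099 nats; a sustained drop below that level signals that the gates have specialized.

```python
import math

def gate_entropy(weights):
    """Shannon entropy (nats) of one channel's gate distribution.
    Uniform weights over 3 organelles give log(3) ~ 1.099;
    specialization drives the entropy toward 0."""
    return -sum(w * math.log(w) for w in weights if w > 0)

def cusum_downshift(xs, target, drift=0.05, threshold=1.0):
    """One-sided CUSUM for a downward mean shift: accumulate how far
    each sample falls below `target` (minus a drift allowance) and
    flag the first step where the cumulative sum crosses `threshold`."""
    s = 0.0
    for t, x in enumerate(xs):
        s = max(0.0, s + (target - x) - drift)
        if s > threshold:
            return t  # change point detected at step t
    return None       # no gelation detected
```

Fed a per-step entropy trace that starts near log(3) and then drops, the detector fires a few steps after the drop, which is the "gel point" the training loop logs.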