---
language:
- en
license: mit
library_name: lux
tags:
- julia
- lux
- slm
- philosophy
- symbiogenesis
- monarch-mixer
- long-convolution
- causal-conv
- rmsnorm
- swiglu
- bpe
- text-generation
- attention-free
pipeline_tag: text-generation
datasets:
- LisaMegaWatts/philosophy-corpus
model-index:
- name: SymbioSLM
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      type: LisaMegaWatts/philosophy-corpus
      name: philosophy-corpus
    metrics:
    - type: perplexity
      value: 37.3
      name: Val PPL
      verified: false
    - type: loss
      value: 3.62
      name: Val Loss
      verified: false
---
# SymbioSLM
A **5.05M parameter** attention-free language model using the **Symbiogenesis** architecture — multi-organelle sequence mixing with learned per-channel gating. Trained on a philosophy corpus of 981 classical texts (~795M tokens).
## Architecture
Symbiogenesis replaces self-attention with three complementary "organelles" for sequence mixing, inspired by the biological theory of symbiogenesis (Margulis, 1967) — where complex organelles like mitochondria were once independent organisms that fused into eukaryotic cells.
Each of the 8 SymbioBlocks contains:
| Organelle | Function | Scale | Complexity |
|-----------|----------|-------|------------|
| **CausalDepthwiseConv1d** | Local n-gram pattern detection | Local (kernel=4) | O(n) |
| **Monarch Matrix** | Sub-quadratic global sequence mixing | Global | O(n√n) |
| **LongConv** | Dense causal convolution filtering | Global | O(n log n) |
An **OrganelleGate** (per-channel softmax) learns which organelle each embedding channel relies on, creating specialized "fused organisms" per block.
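The gating mechanism can be illustrated outside the model's Julia source. Below is a minimal NumPy sketch of a per-channel softmax gate combining three organelle outputs; the function and argument names are illustrative, not taken from the SymbioSLM codebase.

```python
import numpy as np

def organelle_gate(conv_out, monarch_out, longconv_out, gate_logits):
    """Per-channel softmax gate over three organelle outputs.

    Each *_out has shape (seq_len, d_model); gate_logits has shape
    (d_model, 3) -- one learned logit triple per embedding channel.
    """
    # Softmax over the organelle axis, computed independently per channel.
    w = np.exp(gate_logits - gate_logits.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)               # (d_model, 3)
    stacked = np.stack([conv_out, monarch_out, longconv_out], axis=-1)
    return (stacked * w).sum(axis=-1)                   # (seq_len, d_model)

# Toy check: with all-zero logits the gate is uniform, so the output
# is a plain average of the three organelle outputs.
T, D = 4, 8
outs = [np.full((T, D), v) for v in (1.0, 2.0, 3.0)]
mixed = organelle_gate(*outs, gate_logits=np.zeros((D, 3)))
print(np.allclose(mixed, 2.0))  # True
```

Because the softmax is taken per embedding channel, each of the 256 channels can commit to a different organelle, which is what lets a block behave as a "fused organism".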
### No Positional Encoding
SymbioSLM requires **no explicit positional encoding** (no RoPE, no sinusoidal embeddings). The Monarch matrices and LongConv kernels implicitly learn position-dependent mixing patterns, while CausalConv captures local ordering through its convolutional structure.
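The card does not spell out how LongConv reaches O(n log n); the standard route is FFT-based convolution, sketched below in NumPy under that assumption. Zero-padding before the FFT makes the circular convolution linear, and a linear convolution is automatically causal: output position t depends only on inputs 0..t, which is also where the implicit position dependence comes from.

```python
import numpy as np

def causal_longconv(x, k):
    """Causal convolution of signal x with a kernel k the same length as x.

    Padding to length 2n and truncating back to n turns the circular FFT
    convolution into a linear (hence causal) one, in O(n log n).
    """
    n = len(x)
    fft_len = 2 * n  # pad to avoid circular wrap-around
    y = np.fft.irfft(np.fft.rfft(x, fft_len) * np.fft.rfft(k, fft_len),
                     fft_len)[:n]
    return y

# Causality check: perturbing a late input never changes earlier outputs.
rng = np.random.default_rng(0)
x = rng.standard_normal(16)
k = rng.standard_normal(16)
y1 = causal_longconv(x, k)
x2 = x.copy()
x2[10] += 5.0                        # perturb position 10 only
y2 = causal_longconv(x2, k)
print(np.allclose(y1[:10], y2[:10]))  # True: outputs before t=10 unchanged
```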
### Model Specifications
| Parameter | Value |
|-----------|-------|
| Architecture | Symbiogenesis |
| Parameters | 5,052,672 (5.05M) |
| Embedding dim | 256 |
| Layers | 8 |
| Monarch heads | 1 per block |
| Conv kernel | 4 |
| FFN | SwiGLU (4x, 2/3 adjusted) |
| Normalization | RMSNorm (pre-norm) |
| Context length | 256 tokens |
| Vocab size | 2,000 (BPE) |
| Weight tying | Yes |
| Free energy reg | 0.001 |
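The spec table lists a SwiGLU FFN with "4x, 2/3 adjusted" width. A minimal NumPy sketch of the SwiGLU computation is below; the exact hidden width and any rounding SymbioSLM applies are assumptions, following the common LLaMA-style convention of scaling the 4x expansion by 2/3.

```python
import numpy as np

def swiglu_ffn(x, W_gate, W_up, W_down):
    """SwiGLU feed-forward: down( silu(x @ W_gate) * (x @ W_up) )."""
    silu = lambda z: z / (1.0 + np.exp(-z))  # SiLU / swish activation
    return (silu(x @ W_gate) * (x @ W_up)) @ W_down

d_model = 256
# "4x, 2/3 adjusted": hidden = 2/3 * 4 * d_model (rounding is an assumption).
d_hidden = int(2 / 3 * 4 * d_model)          # 682
rng = np.random.default_rng(0)
W_gate = rng.standard_normal((d_model, d_hidden)) * 0.02
W_up   = rng.standard_normal((d_model, d_hidden)) * 0.02
W_down = rng.standard_normal((d_hidden, d_model)) * 0.02
x = rng.standard_normal((4, d_model))
print(swiglu_ffn(x, W_gate, W_up, W_down).shape)  # (4, 256)
```

The 2/3 adjustment keeps the three-matrix SwiGLU roughly parameter-matched to a plain two-matrix FFN with the same nominal 4x expansion.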
### Parameter Breakdown
| Component | Params | % |
|-----------|--------|---|
| Token embedding | 512,000 | 10.1% |
| SymbioBlocks (8x) | 4,540,672 | 89.9% |
| CausalConv | ~8K/block | |
| Monarch | ~131K/block | |
| LongConv | ~65K/block | |
| OrganelleGate | ~769/block | |
| SwiGLU FFN | ~350K/block | |
| RMSNorm (2x) | ~512/block | |
| Final RMSNorm | 256 | <0.1% |
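The exact figures in the breakdown can be reproduced with a little arithmetic (the per-block component counts are marked approximate in the table, so only the exact rows are checked here):

```python
vocab, d_model = 2000, 256
total = 5_052_672

embed = vocab * d_model
print(embed)                             # 512000 token-embedding parameters
print(round(100 * embed / total, 1))     # 10.1 (% of total)

blocks = 4_540_672
print(blocks // 8)                       # 567584 parameters per SymbioBlock
print(round(100 * blocks / total, 1))    # 89.9 (% of total)
```

With weight tying enabled, the 512K embedding matrix is reused as the output head, so it is counted once.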
## Results
Trained for 12,305 steps on an NVIDIA RTX 3060 (12GB).
| Metric | Value |
|--------|-------|
| **Val Loss** | **3.62** |
| **Val PPL** | **37.3** |
| Training steps | 12,305 |
| Batch size | 32 |
| Precision | Float16 (AMP) |
### Comparison with Other 5M Julia SLMs
All models were trained on the same philosophy corpus with an identical tokenizer and training budget (12,305 steps):
| Model | Architecture | Params | Val Loss | Val PPL |
|-------|-------------|--------|----------|---------|
| [JuliaSLM](https://huggingface.co/LisaMegaWatts/JuliaSLM) | Transformer (RoPE) | 5.04M | **3.54** | **34.5** |
| **SymbioSLM** | **Symbiogenesis** | **5.05M** | **3.62** | **37.3** |
| [MonarchSLM](https://huggingface.co/LisaMegaWatts/MonarchSLM) | Monarch Mixer | 5.04M | 3.65 | 38.4 |
SymbioSLM outperforms the Monarch-only baseline while using no attention mechanism: the multi-organelle fusion provides complementary mixing at local and global scales that a single mixer cannot cover on its own.
## Training Configuration
```toml
[model]
arch = "symbiogenesis"
embed_dim = 256
n_layers = 8
n_monarch_heads = 1
conv_kernel_size = 4
ffn_mult = 4
context_length = 256
weight_tying = true
free_energy_beta = 0.001
[training]
optimizer = "adamw"
lr = 6e-4
min_lr = 6e-5
warmup_steps = 500
max_steps = 12305
batch_size = 32
grad_clip = 1.0
precision = "f16"
```
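The config fixes the endpoints (`lr`, `min_lr`) and warmup length but not the decay shape. Assuming the common linear-warmup plus cosine-decay schedule, the learning rate over training would look like this sketch:

```python
import math

def lr_at(step, lr=6e-4, min_lr=6e-5, warmup=500, max_steps=12305):
    """Linear warmup to `lr`, then cosine decay to `min_lr`.

    The decay shape is an assumption; the config only fixes the
    endpoints and the warmup length.
    """
    if step < warmup:
        return lr * (step + 1) / warmup
    progress = (step - warmup) / (max_steps - warmup)
    return min_lr + 0.5 * (lr - min_lr) * (1 + math.cos(math.pi * progress))

print(f"{lr_at(0):.2e}")      # 1.20e-06, start of warmup
print(f"{lr_at(499):.2e}")    # 6.00e-04, peak at end of warmup
print(f"{lr_at(12304):.2e}")  # 6.00e-05, decayed to min_lr
```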
## Gelation Monitoring
Training includes gelation monitoring via CUSUM change-point detection on gate entropy. This tracks when the organelle gates transition from uniform mixing to specialized configurations — a phase transition analogous to gel formation in polymer physics.
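CUSUM change-point detection itself is standard; the sketch below shows a one-sided CUSUM over a stream of gate-entropy readings, flagging the step where entropy drops persistently below its early-training baseline (gates specializing away from uniform mixing). The drift and threshold values, and the exact statistic used in training, are assumptions.

```python
import math

def cusum_drop_detector(series, baseline, drift=0.01, threshold=0.5):
    """One-sided CUSUM: return the first step where `series` has drifted
    persistently below `baseline`, or None if no change is detected."""
    s = 0.0
    for t, x in enumerate(series):
        s = max(0.0, s + (baseline - x) - drift)  # accumulate downward drift
        if s > threshold:
            return t
    return None

# Synthetic gate entropy: a uniform gate over 3 organelles has entropy
# log(3); after step 100 the gates specialize and entropy falls.
uniform_h = math.log(3)
series = [uniform_h] * 100 + [uniform_h - 0.02 * i for i in range(100)]
print(cusum_drop_detector(series, baseline=uniform_h))  # 108
```

The drift term makes the detector ignore noise around the baseline, so it fires only on a sustained entropy drop rather than a single fluctuation.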
## Usage
### Julia (Lux.jl)
```julia
using JuliaGPT
# Load model
config = load_config("config.toml")
model = create_model(config.model)
ps, st, _, _, _ = load_checkpoint("final.jld2")
# Load tokenizer
tokenizer = BPETokenizer("vocab.json", "merges.txt")
# Generate text
prompt = "The nature of reality"
output = generate(model, ps, st, tokenizer, prompt;
                  max_new_tokens=200, temperature=0.8, top_k=40)
println(output)
```
## References
- **Symbiogenesis framework**: [DavinciDreams/symbiogenesis](https://github.com/DavinciDreams/symbiogenesis) — Evolutionary NAS via organism fusion
- **Monarch Mixer**: Dao et al., 2023 — Sub-quadratic GEMM-based sequence mixing
- **Hyena**: Poli et al., 2023 — Long convolutions for sequence modeling
- **Endosymbiotic theory**: Margulis, 1967 — Origin of eukaryotic organelles
## Citation
```bibtex
@misc{symbio-slm-2026,
  title={SymbioSLM: Multi-Organelle Sequence Mixing for Attention-Free Language Modeling},
  author={LisaMegaWatts},
  year={2026},
  url={https://huggingface.co/LisaMegaWatts/SymbioSLM}
}
```
## License
MIT