SymbioSLM Grammar Expert LoRA

A grammar-specialist LoRA adapter for SymbioSLM (~4.3M params), trained on CoLA (Corpus of Linguistic Acceptability) via symbiogenesis evolution. This is an attention-free model: all sequence mixing uses sub-quadratic organelles (CausalConv, Monarch matrices, LongConv).

Since SymbioSLM has no PyTorch checkpoint (it is Julia-native), this experiment trained the full base model unfrozen alongside the LoRA adapter, testing whether the attention-free architecture can learn grammar from scratch.

Key Results

Metric            At Gelation (gen 6)   Final (gen 24)
Train accuracy    80.4%                 80.4%
Test accuracy     44.6%                 60.6%
Overfit gap       35.8pp                19.8pp

Metric                              Value
Random baseline (majority class)    64.1%
Base perplexity                     2045.6
With-LoRA perplexity                2051.0 (+0.3%)
Grammar sense improvement           +0.009 (log-prob ratio)
Gelation (convergence)              Generation 6
LoRA params                         2,468,116 (57.9% of base, unfrozen)

Grammar Sense Signal

The LoRA-adapted model assigns relatively higher probability to grammatical sentences:

                          Base      With LoRA
Acceptable log-prob:     -7.619     -7.617
Unacceptable log-prob:   -7.625     -7.632
Ratio (higher=better):   0.006      0.015  (+150% relative)

This is a small but directionally correct signal from a randomly initialized 4M-parameter attention-free model.
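The log-prob ratio above can be reproduced with a short sketch. This assumes a causal LM whose forward pass returns next-token logits; the helper names (`mean_log_prob`, `grammar_sense`) are illustrative, not the experiment's actual code.

```python
import torch
import torch.nn.functional as F

def mean_log_prob(model, token_ids: torch.Tensor) -> float:
    """Mean per-token log-probability of a tokenized sentence under a
    causal LM (each position predicts the next token)."""
    with torch.no_grad():
        logits = model(token_ids.unsqueeze(0))           # (1, T, vocab)
    log_probs = F.log_softmax(logits[0, :-1], dim=-1)    # predictions for tokens 1..T-1
    targets = token_ids[1:]
    return log_probs[torch.arange(targets.numel()), targets].mean().item()

def grammar_sense(model, acceptable, unacceptable) -> float:
    """Gap between mean log-probs of acceptable vs. unacceptable
    sentences; higher means the model prefers grammatical text."""
    good = sum(mean_log_prob(model, s) for s in acceptable) / len(acceptable)
    bad = sum(mean_log_prob(model, s) for s in unacceptable) / len(unacceptable)
    return good - bad
```

Applied to the numbers above, the gap widens from 0.006 (base) to 0.015 (with LoRA).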

Architecture

SymbioSLM is a 3-organelle decoder-only language model with NO attention:

  • CausalDepthwiseConv1d – local n-gram pattern detection
  • MonarchMatrix (8 heads) – sub-quadratic global mixing via butterfly factorization
  • LongConv – dense causal convolution for medium-range dependencies
  • OrganelleGate – learned per-channel blend across organelles
SymbioSLM: d_model=256, n_layers=6, n_monarch_heads=8, vocab_size=2000
Total params: 4,261,650
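The per-channel blend can be sketched as follows. This is a minimal PyTorch approximation (the actual implementation is Julia-native and may differ): each of the d_model channels gets a softmax over the three organelle outputs, so the block output is a convex per-channel mixture.

```python
import torch
import torch.nn as nn

class OrganelleGate(nn.Module):
    """Learned per-channel blend of organelle outputs (sketch only)."""
    def __init__(self, d_model: int = 256, n_organelles: int = 3):
        super().__init__()
        # One mixing logit per (organelle, channel) pair
        self.logits = nn.Parameter(torch.zeros(n_organelles, d_model))

    def forward(self, organelle_outputs: list[torch.Tensor]) -> torch.Tensor:
        # organelle_outputs: list of (batch, seq, d_model) tensors
        stacked = torch.stack(organelle_outputs, dim=0)   # (n_org, B, T, D)
        weights = torch.softmax(self.logits, dim=0)       # (n_org, D), sums to 1 per channel
        return (weights[:, None, None, :] * stacked).sum(dim=0)
```

At initialization the logits are zero, so every channel starts as an equal blend of the three organelles.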

The attention-free design means LoRA can only target the SwiGLU feed-forward layers (w1, v, w2), giving 3 target types × 6 blocks = 18 possible injection points, far fewer than in attention-equipped models.

LoRA Configuration

Manual LoRA injection (not PEFT) into SwiGLU feed-forward layers:

Target   Layer Type                  Per Block
w1       SwiGLU gate projection      256→512
w2       SwiGLU output projection    512→256

Best evolved config: rank=16, alpha=32.0, targets=(w1, w2)

Evolution consistently converged on the gate+output pair (w1, w2), preferring it over configurations that also include the value projection (v).
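Manual injection without PEFT amounts to wrapping each target nn.Linear with a low-rank update. The sketch below is a hypothetical illustration of this pattern, not the notebook's actual `inject_lora`: `LoRALinear` and the attribute-replacement helper are assumed names.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap an nn.Linear with a rank-r update: y = Wx + (alpha/r) * B(Ax)."""
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        self.scale = alpha / rank
        # A is small-random, B is zero, so the wrapper starts as the identity update
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

def inject_lora(block: nn.Module, target_names=("w1", "w2"), rank=16, alpha=32.0):
    """Replace named nn.Linear attributes of a SwiGLU block in place."""
    for name in target_names:
        setattr(block, name, LoRALinear(getattr(block, name), rank, alpha))
```

Because lora_B starts at zero, injection leaves the model's outputs unchanged until training updates the adapter.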

Evolution Details

  1. Population: 8 random LoRAUnit configs
  2. Training: 200 steps per unit, lr=2e-4, batch=16, base unfrozen (no pre-trained checkpoint)
  3. Fitness: accuracy - 0.01 × log(n_trainable)
  4. Gelation: CUSUM change-point at generation 6 (CUSUM=4.10)
  5. Post-gelation: Architecture locked (r=16, w1+w2) but test accuracy continued improving
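The fitness function from step 3 is a one-liner: accuracy minus a log-scale parameter-count penalty, so doubling the trainable parameters costs only about 0.007 accuracy.

```python
import math

def fitness(accuracy: float, n_trainable: int, penalty: float = 0.01) -> float:
    """Evolution fitness: accuracy minus a mild log-scale penalty
    on trainable parameter count."""
    return accuracy - penalty * math.log(n_trainable)
```

With this penalty, a config matching another's accuracy on fewer parameters always scores higher, nudging evolution toward compact adapters.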

Test Accuracy Over Time

Gen  0: 40.4%
Gen  5: 54.0%  (pre-gelation)
Gen  6: 40.0%  (at gelation)
Gen 10: 61.2%
Gen 15: 57.4%
Gen 20: 56.0%
Gen 24: 60.6%  (final)

Test accuracy oscillated but trended upward, suggesting continued evolution post-gelation was beneficial for this model. Gelation marked architecture convergence, not a generalization peak.
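Gelation detection by CUSUM can be sketched generically. The implementation below is a one-sided CUSUM on successive differences of a fitness series with an assumed threshold; it is an illustration of the technique, not the experiment's exact detector (which reported CUSUM=4.10 at generation 6).

```python
def cusum_gelation(series, drift: float = 0.0, threshold: float = 4.0):
    """One-sided CUSUM change-point detector: accumulate positive
    successive differences and flag the first index where the
    statistic exceeds the threshold. Returns (index, statistic),
    or (None, statistic) if no change point is found."""
    s = 0.0
    for gen in range(1, len(series)):
        s = max(0.0, s + (series[gen] - series[gen - 1]) - drift)
        if s > threshold:
            return gen, s
    return None, s
```

A flat series never triggers; a steadily rising one triggers once the accumulated improvement clears the threshold.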

Usage

Requires the SymbioSLM model architecture. See the training notebook for the full model definition.

import torch
from huggingface_hub import hf_hub_download

# Download the LoRA weights from the Hub
weights_path = hf_hub_download(
    "LisaMegaWatts/SymbioSLM-GrammarExpert-20260301",
    "lora_state.pt",
)
lora_state = torch.load(weights_path, map_location="cpu")

# Inject into the SymbioSLM base model
# (helper functions are defined in the training notebook):
# inject_lora(model, target_modules=["w1", "w2"], rank=16, alpha=32.0)
# load_lora_state(model, lora_state)

Files

File                     Description
lora_state.pt            LoRA A/B parameter state dict (696 KB)
experiment_config.json   Full experiment config and results

Part of Symbiogenesis

This is part of a 3-model grammar expert comparison:

Model                    Params   Attention            CoLA Test Acc   Status
Ouroboros (Gemma 270M)   270M     Yes (standard)       Pending         Notebook ready
SymbioGPT-10M            10M      Yes (+ organelles)   53.2%           Complete
SymbioSLM (this model)   4.3M     No                   60.6%           Complete

W&B run: grammar-expert-symbioslm

GitHub: DavinciDreams/SymbioGPT
