# SymbioSLM Grammar Expert LoRA

A grammar-specialist LoRA adapter for SymbioSLM (~4.3M params), trained on CoLA (Corpus of Linguistic Acceptability) via symbiogenesis evolution. This is an attention-free model: all sequence mixing uses sub-quadratic organelles (CausalConv, Monarch matrices, LongConv).
Since SymbioSLM has no PyTorch checkpoint (it's Julia-native), this experiment trained with the full base model unfrozen alongside LoRA, testing whether the attention-free architecture can learn grammar from scratch.
## Key Results
| Metric | At Gelation (gen 6) | Final (gen 24) |
|---|---|---|
| Train accuracy | 80.4% | 80.4% |
| Test accuracy | 44.6% | 60.6% |
| Overfit gap | 35.8pp | 19.8pp |

| Metric | Value |
|---|---|
| Random baseline (majority class) | 64.1% |
| Base perplexity | 2045.6 |
| With LoRA perplexity | 2051.0 (+0.3%) |
| Grammar sense improvement | +0.009 (log-prob ratio) |
| Gelation (convergence) | Generation 6 |
| LoRA params | 2,468,116 (57.9% of base; base also unfrozen) |
## Grammar Sense Signal
The LoRA-adapted model assigns relatively higher probability to grammatical sentences:
| | Base | With LoRA |
|---|---|---|
| Acceptable log-prob | -7.619 | -7.617 |
| Unacceptable log-prob | -7.625 | -7.632 |
| Ratio (higher = better) | 0.006 | 0.015 (+150% relative) |
This is a small but directionally correct signal from a randomly initialized ~4M-parameter attention-free model.
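A minimal sketch of how such a log-prob ratio can be computed, assuming a causal LM whose forward pass returns per-position logits; `model`, the helper names, and the tensor shapes here are illustrative, not the notebook's actual API:

```python
import torch
import torch.nn.functional as F

def sentence_logprob(model, token_ids):
    """Mean per-token log-probability of a sentence under a causal LM.

    token_ids: 1-D LongTensor of token ids for one sentence.
    """
    with torch.no_grad():
        logits = model(token_ids.unsqueeze(0))       # (1, T, vocab)
    logp = F.log_softmax(logits[:, :-1], dim=-1)     # position t predicts t+1
    target = token_ids.unsqueeze(0)[:, 1:]           # shifted targets
    tok_logp = logp.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    return tok_logp.mean().item()

def grammar_sense(model, acceptable, unacceptable):
    """Gap between mean log-probs of grammatical vs ungrammatical sentences.

    Higher = stronger preference for grammatical text.
    """
    acc = sum(sentence_logprob(model, t) for t in acceptable) / len(acceptable)
    una = sum(sentence_logprob(model, t) for t in unacceptable) / len(unacceptable)
    return acc - una
```

With the table's numbers, this gap is -7.617 - (-7.632) = 0.015 for the adapted model versus 0.006 for the base.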
## Architecture

SymbioSLM is a 3-organelle decoder-only language model with NO attention:
- `CausalDepthwiseConv1d`: local n-gram pattern detection
- `MonarchMatrix` (8 heads): sub-quadratic global mixing via butterfly factorization
- `LongConv`: dense causal convolution for medium-range dependencies
- `OrganelleGate`: learned per-channel blend across the organelles
```
SymbioSLM: d_model=256, n_layers=6, n_monarch_heads=8, vocab_size=2000
Total params: 4,261,650
```
The attention-free design means LoRA can only target SwiGLU layers (w1, v, w2), giving 3 target types × 6 blocks = 18 possible injection points, far fewer than attention-equipped models.
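The gated organelle blend can be sketched as follows. This is an illustrative reconstruction, not the actual SymbioSLM implementation (which is Julia-native): the Monarch factorization is replaced by a position-wise linear stand-in, and all class and attribute names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OrganelleMixer(nn.Module):
    """Sketch of an attention-free mixing block: three 'organelles'
    blended by a learned per-channel gate."""
    def __init__(self, d_model: int, kernel: int = 4, long_kernel: int = 64):
        super().__init__()
        # Local n-gram detector: depthwise causal conv
        self.local = nn.Conv1d(d_model, d_model, kernel, groups=d_model)
        self.k = kernel
        # Stand-in for MonarchMatrix global mixing (here: a plain linear)
        self.global_mix = nn.Linear(d_model, d_model)
        # Medium-range dependencies: dense causal conv
        self.long = nn.Conv1d(d_model, d_model, long_kernel)
        self.lk = long_kernel
        # Per-channel blend weights across the three organelles
        self.gate = nn.Parameter(torch.zeros(3, d_model))

    def forward(self, x):                      # x: (B, T, d)
        xt = x.transpose(1, 2)                 # (B, d, T) for conv layers
        a = self.local(F.pad(xt, (self.k - 1, 0))).transpose(1, 2)
        b = self.global_mix(x)
        c = self.long(F.pad(xt, (self.lk - 1, 0))).transpose(1, 2)
        w = torch.softmax(self.gate, dim=0)    # (3, d), sums to 1 per channel
        return w[0] * a + w[1] * b + w[2] * c
```

Left-padding each convolution by `kernel - 1` keeps the block causal: position t never sees inputs from positions after t.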
## LoRA Configuration
Manual LoRA injection (not PEFT) into SwiGLU feed-forward layers:
| Target | Layer Type | Shape (per block) |
|---|---|---|
| w1 | SwiGLU gate projection | 256 → 512 |
| v | SwiGLU value projection | 256 → 512 |
| w2 | SwiGLU output projection | 512 → 256 |
Best evolved config: rank=16, alpha=32.0, targets=(w1, w2)
Evolution consistently converged on the gate+output pair (w1, w2), preferring this over configurations that include the value projection (v).
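A minimal sketch of what manual (non-PEFT) LoRA injection into SwiGLU layers can look like; `LoRALinear` and `inject_lora` are illustrative names, not the notebook's actual helpers:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap a Linear layer with a low-rank A/B update, scaled by alpha/rank."""
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        self.scale = alpha / rank
        # Standard LoRA init: A small random, B zero, so the wrapper is an
        # identity update at the start of training.
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

def inject_lora(block, targets=("w1", "w2"), rank=16, alpha=32.0):
    """Replace named sub-layers of a SwiGLU block with LoRA wrappers."""
    for name in targets:
        setattr(block, name, LoRALinear(getattr(block, name), rank, alpha))
```

With the best evolved config (rank=16, alpha=32.0, targets w1+w2), each wrapped 256→512 layer adds 16 × (256 + 512) low-rank parameters.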
## Evolution Details
- Population: 8 random LoRAUnit configs
- Training: 200 steps per unit, lr=2e-4, batch=16, base unfrozen (no pre-trained checkpoint)
- Fitness: `accuracy - 0.01 × log(n_trainable)`
- Gelation: CUSUM change-point detected at generation 6 (CUSUM = 4.10)
- Post-gelation: Architecture locked (r=16, w1+w2) but test accuracy continued improving
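The fitness formula above, plus a generic one-sided CUSUM change-point detector of the kind that could flag gelation; the experiment's exact detector and threshold may differ, so treat this as a sketch:

```python
import math

def fitness(accuracy: float, n_trainable: int) -> float:
    """Accuracy minus a small complexity penalty on trainable-parameter count."""
    return accuracy - 0.01 * math.log(n_trainable)

def cusum_changepoint(series, threshold=4.0):
    """One-sided CUSUM on deviations from the running mean.

    Returns the first index where the cumulative positive drift crosses
    `threshold`, or None if no change-point is detected.
    """
    s, mean = 0.0, series[0]
    for i, x in enumerate(series[1:], start=1):
        mean += (x - mean) / (i + 1)      # running mean of the series
        s = max(0.0, s + (x - mean))      # accumulate positive drift only
        if s > threshold:
            return i
    return None
```

The penalty term is gentle: at 2,468,116 trainable params it subtracts about 0.147 from accuracy, so evolution mainly trades off rank against fit quality.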
### Test Accuracy Over Time

```
Gen  0: 40.4%
Gen  5: 54.0%  (pre-gelation)
Gen  6: 40.0%  (at gelation)
Gen 10: 61.2%
Gen 15: 57.4%
Gen 20: 56.0%
Gen 24: 60.6%  (final)
```
Test accuracy oscillated but trended upward, suggesting continued evolution post-gelation was beneficial for this model. Gelation marked architecture convergence, not a generalization peak.
## Usage
Requires the SymbioSLM model architecture. See the training notebook for the full model definition.
```python
import torch
from huggingface_hub import hf_hub_download

# Load LoRA weights
weights_path = hf_hub_download(
    "LisaMegaWatts/SymbioSLM-GrammarExpert-20260301",
    "lora_state.pt",
)
lora_state = torch.load(weights_path, map_location="cpu")

# Inject into the SymbioSLM base model (see the training notebook):
# inject_lora(model, target_modules=["w1", "w2"], rank=16, alpha=32.0)
# load_lora_state(model, lora_state)
```
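A hedged sketch of what a `load_lora_state` helper might do: load only the adapter tensors and leave base parameters untouched. The notebook's actual helper may differ.

```python
import torch

def load_lora_state(model: torch.nn.Module, lora_state: dict):
    """Load only the LoRA A/B tensors into an already-injected model.

    strict=False means keys absent from `lora_state` (the base weights)
    are simply left as-is; we only reject keys the model doesn't have.
    """
    missing, unexpected = model.load_state_dict(lora_state, strict=False)
    if unexpected:
        raise KeyError(f"keys not present in model: {unexpected}")
    return missing  # base-model keys that were intentionally not loaded
```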
## Files

| File | Description |
|---|---|
| `lora_state.pt` | LoRA A/B parameter state dict (696 KB) |
| `experiment_config.json` | Full experiment config and results |
## Part of Symbiogenesis
This is part of a 3-model grammar expert comparison:
| Model | Params | Attention | CoLA Test Acc | Status |
|---|---|---|---|---|
| Ouroboros (Gemma 270M) | 270M | Yes (standard) | Pending | Notebook ready |
| SymbioGPT-10M | 10M | Yes (+ organelles) | 53.2% | Complete |
| SymbioSLM ~4M (this) | 4.3M | No | 60.6% | Complete |
- W&B run: `grammar-expert-symbioslm`
- GitHub: DavinciDreams/SymbioGPT
- Base model: LisaMegaWatts/SymbioSLM