SymbioSLM Grammar Expert LoRA

A grammar-specialist LoRA adapter for SymbioSLM (~4.3M params), trained on CoLA (Corpus of Linguistic Acceptability) via symbiogenesis evolution. This is an attention-free model: all sequence mixing uses sub-quadratic organelles (CausalConv, Monarch matrices, LongConv).

Since SymbioSLM has no PyTorch checkpoint (it is Julia-native), this experiment trained the full base model unfrozen alongside the LoRA adapter, testing whether the attention-free architecture can learn grammar from scratch.

Key Results

Metric            At Gelation (gen 6)   Final (gen 24)
Train accuracy    80.4%                 80.4%
Test accuracy     44.6%                 60.6%
Overfit gap       35.8pp                19.8pp

Metric                              Value
Random baseline (majority class)    64.1%
Base perplexity                     2045.6
With-LoRA perplexity                2051.0 (+0.3%)
Grammar sense improvement           +0.009 (log-prob ratio)
Gelation (convergence)              Generation 6
LoRA params                         2,468,116 (57.9% of base, unfrozen)

Grammar Sense Signal

The LoRA-adapted model assigns relatively higher probability to grammatical sentences:

                          Base      With LoRA
Acceptable log-prob:     -7.619     -7.617
Unacceptable log-prob:   -7.625     -7.632
Ratio (higher=better):   0.006      0.015  (+150% relative)

This is a small but directionally correct signal from a randomly initialized 4M-parameter attention-free model.
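The log-prob ratio above can be reproduced with a short sketch. This assumes a causal LM whose forward pass returns next-token logits; the helper names (`mean_log_prob`, `grammar_sense`) are illustrative, not the experiment's actual code.

```python
import torch
import torch.nn.functional as F

def mean_log_prob(model, token_ids: torch.Tensor) -> float:
    """Mean per-token log-probability of a tokenized sentence under a
    causal LM (each position predicts the next token)."""
    with torch.no_grad():
        logits = model(token_ids.unsqueeze(0))           # (1, T, vocab)
    log_probs = F.log_softmax(logits[0, :-1], dim=-1)    # predictions for tokens 1..T-1
    targets = token_ids[1:]
    return log_probs[torch.arange(targets.numel()), targets].mean().item()

def grammar_sense(model, acceptable, unacceptable) -> float:
    """Gap between mean log-probs of acceptable vs. unacceptable
    sentences; higher means the model prefers grammatical text."""
    good = sum(mean_log_prob(model, s) for s in acceptable) / len(acceptable)
    bad = sum(mean_log_prob(model, s) for s in unacceptable) / len(unacceptable)
    return good - bad
```

Applied to the numbers above, the gap widens from 0.006 (base) to 0.015 (with LoRA).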

Architecture

SymbioSLM is a 3-organelle decoder-only language model with NO attention:

  • CausalDepthwiseConv1d – local n-gram pattern detection
  • MonarchMatrix (8 heads) – sub-quadratic global mixing via butterfly factorization
  • LongConv – dense causal convolution for medium-range dependencies
  • OrganelleGate – learned per-channel blend across organelles
SymbioSLM: d_model=256, n_layers=6, n_monarch_heads=8, vocab_size=2000
Total params: 4,261,650
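The per-channel blend can be sketched as follows. This is a minimal PyTorch approximation (the actual implementation is Julia-native and may differ): each of the d_model channels gets a softmax over the three organelle outputs, so the block output is a convex per-channel mixture.

```python
import torch
import torch.nn as nn

class OrganelleGate(nn.Module):
    """Learned per-channel blend of organelle outputs (sketch only)."""
    def __init__(self, d_model: int = 256, n_organelles: int = 3):
        super().__init__()
        # One mixing logit per (organelle, channel) pair
        self.logits = nn.Parameter(torch.zeros(n_organelles, d_model))

    def forward(self, organelle_outputs: list[torch.Tensor]) -> torch.Tensor:
        # organelle_outputs: list of (batch, seq, d_model) tensors
        stacked = torch.stack(organelle_outputs, dim=0)   # (n_org, B, T, D)
        weights = torch.softmax(self.logits, dim=0)       # (n_org, D), sums to 1 per channel
        return (weights[:, None, None, :] * stacked).sum(dim=0)
```

At initialization the logits are zero, so every channel starts as an equal blend of the three organelles.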

The attention-free design means LoRA can only target the SwiGLU feed-forward layers (w1, v, w2), giving 3 target types × 6 blocks = 18 possible injection points, far fewer than in attention-equipped models.

LoRA Configuration

Manual LoRA injection (not PEFT) into SwiGLU feed-forward layers:

Target   Layer Type                  Per Block
w1       SwiGLU gate projection      256→512
w2       SwiGLU output projection    512→256

Best evolved config: rank=16, alpha=32.0, targets=(w1, w2)

Evolution consistently converged on the gate+output pair (w1, w2), preferring it over configurations that also include the value projection (v).
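Manual injection without PEFT amounts to wrapping each target nn.Linear with a low-rank update. The sketch below is a hypothetical illustration of this pattern, not the notebook's actual `inject_lora`: `LoRALinear` and the attribute-replacement helper are assumed names.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wrap an nn.Linear with a rank-r update: y = Wx + (alpha/r) * B(Ax)."""
    def __init__(self, base: nn.Linear, rank: int = 16, alpha: float = 32.0):
        super().__init__()
        self.base = base
        self.scale = alpha / rank
        # A is small-random, B is zero, so the wrapper starts as the identity update
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.lora_A.T @ self.lora_B.T)

def inject_lora(block: nn.Module, target_names=("w1", "w2"), rank=16, alpha=32.0):
    """Replace named nn.Linear attributes of a SwiGLU block in place."""
    for name in target_names:
        setattr(block, name, LoRALinear(getattr(block, name), rank, alpha))
```

Because lora_B starts at zero, injection leaves the model's outputs unchanged until training updates the adapter.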

Evolution Details

  1. Population: 8 random LoRAUnit configs
  2. Training: 200 steps per unit, lr=2e-4, batch=16, base unfrozen (no pre-trained checkpoint)
  3. Fitness: accuracy - 0.01 × log(n_trainable)
  4. Gelation: CUSUM change-point at generation 6 (CUSUM=4.10)
  5. Post-gelation: Architecture locked (r=16, w1+w2) but test accuracy continued improving
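The fitness function from step 3 is a one-liner: accuracy minus a log-scale parameter-count penalty, so doubling the trainable parameters costs only about 0.007 accuracy.

```python
import math

def fitness(accuracy: float, n_trainable: int, penalty: float = 0.01) -> float:
    """Evolution fitness: accuracy minus a mild log-scale penalty
    on trainable parameter count."""
    return accuracy - penalty * math.log(n_trainable)
```

With this penalty, a config matching another's accuracy on fewer parameters always scores higher, nudging evolution toward compact adapters.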

Test Accuracy Over Time

Gen  0: 40.4%
Gen  5: 54.0%  (pre-gelation)
Gen  6: 40.0%  (at gelation)
Gen 10: 61.2%
Gen 15: 57.4%
Gen 20: 56.0%
Gen 24: 60.6%  (final)

Test accuracy oscillated but trended upward, suggesting continued evolution post-gelation was beneficial for this model. Gelation marked architecture convergence, not a generalization peak.
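Gelation detection by CUSUM can be sketched generically. The implementation below is a one-sided CUSUM on successive differences of a fitness series with an assumed threshold; it is an illustration of the technique, not the experiment's exact detector (which reported CUSUM=4.10 at generation 6).

```python
def cusum_gelation(series, drift: float = 0.0, threshold: float = 4.0):
    """One-sided CUSUM change-point detector: accumulate positive
    successive differences and flag the first index where the
    statistic exceeds the threshold. Returns (index, statistic),
    or (None, statistic) if no change point is found."""
    s = 0.0
    for gen in range(1, len(series)):
        s = max(0.0, s + (series[gen] - series[gen - 1]) - drift)
        if s > threshold:
            return gen, s
    return None, s
```

A flat series never triggers; a steadily rising one triggers once the accumulated improvement clears the threshold.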

Usage

Requires the SymbioSLM model architecture. See the training notebook for the full model definition.

import torch
from huggingface_hub import hf_hub_download

# Download the LoRA weights from the Hub
weights_path = hf_hub_download(
    "LisaMegaWatts/SymbioSLM-GrammarExpert-20260301",
    "lora_state.pt",
)
lora_state = torch.load(weights_path, map_location="cpu")

# Inject into the SymbioSLM base model
# (helper functions are defined in the training notebook):
# inject_lora(model, target_modules=["w1", "w2"], rank=16, alpha=32.0)
# load_lora_state(model, lora_state)

Files

File                     Description
lora_state.pt            LoRA A/B parameter state dict (696 KB)
experiment_config.json   Full experiment config and results

Part of Symbiogenesis

This is part of a 3-model grammar expert comparison:

Model                    Params   Attention            CoLA Test Acc   Status
Ouroboros (Gemma 270M)   270M     Yes (standard)       Pending         Notebook ready
SymbioGPT-10M            10M      Yes (+ organelles)   53.2%           Complete
SymbioSLM (this model)   4.3M     No                   60.6%           Complete

W&B run: grammar-expert-symbioslm

GitHub: DavinciDreams/SymbioGPT
