# SymbioSLM-ouroboros-lora — Evolved LoRA Adapter for Gemma-3-270M

A LoRA adapter discovered by symbiogenesis — a population-based evolutionary framework that evolves adapter configurations (rank, target modules) through fusion and selection, with CUSUM gelation detection for automatic stopping.
## Key Results
| Metric | Baseline (frozen) | This Adapter |
|---|---|---|
| Val Loss | 5.7342 | 4.1181 |
| Perplexity | 309.3 | 61.4 |
| Trainable Params | 0 | 10,441,728 (3.89%) |
A 5× perplexity improvement on curated philosophy text, with under 4% of parameters trainable.
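The perplexity figures follow directly from the validation losses, since perplexity is the exponential of the cross-entropy loss. A quick check of the table above:

```python
import math

# Perplexity = exp(cross-entropy loss); reproduce the table values.
baseline_ppl = math.exp(5.7342)        # ~309.3 (frozen base model)
adapter_ppl = math.exp(4.1181)         # ~61.4  (this adapter)
improvement = baseline_ppl / adapter_ppl  # ~5.0x
```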
## Adapter Configuration (Evolved)
| Parameter | Value | How Discovered |
|---|---|---|
| Rank | 44 | Parallel fusion (gen 1) |
| Target modules | All 7 (q, k, v, o, gate, up, down) | 100% population convergence |
| Alpha | 88 (2 × rank) | Fixed rule |
| Dropout | 0.0 | Evolution selected |
The population of 10 adapters converged to all-seven-target configurations by generation 7 (gelation). The MLP up- and down-projections (up_proj, down_proj) reached 100% adoption — essential for causal LM adaptation on Gemma-3.
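If you wanted to instantiate the evolved configuration yourself, the table values map onto `peft.LoraConfig` keyword arguments roughly as below (the kwargs are an assumption about typical PEFT usage, not taken from the training code):

```python
# Evolved adapter hyperparameters, as listed in the table above.
evolved_lora_kwargs = dict(
    r=44,               # rank, found via parallel fusion in generation 1
    lora_alpha=88,      # fixed rule: alpha = 2 * rank
    lora_dropout=0.0,   # evolution selected no dropout
    target_modules=[    # all 7 projections converged by generation 7
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
# from peft import LoraConfig; config = LoraConfig(**evolved_lora_kwargs)
```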
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained(
    "LisaMegaWatts/Ouroboros-1MContext-Gemma-270m",
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "LisaMegaWatts/SymbioSLM-ouroboros-lora-20260301")
tokenizer = AutoTokenizer.from_pretrained("LisaMegaWatts/Ouroboros-1MContext-Gemma-270m")

inputs = tokenizer("The nature of consciousness", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
## Training Details

### Method: Symbiogenesis

Symbiogenesis (Margulis, 1967) models complexity emerging through the fusion of simpler organisms. Here, small LoRA adapters are the "organisms" — they fuse configurations (merging target modules and ranks) and compete on fitness.
Evolution:
- Population: 10 random LoRA adapters
- Generations: 17 (early stopped after gelation at gen 7 + 10 patience)
- Fusion: Hybrid (sequential = union targets + avg rank; parallel = union targets + sum ranks)
- Selection: Tournament (k=3)
- Fitness: -(val_loss + 0.01 × log(n_params))
Extended fine-tune: the best config was trained for a further 2,000 steps with a cosine LR schedule.
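The CUSUM gelation detector mentioned above can be sketched as a one-sided cumulative-sum test on a per-generation convergence signal. The monitored quantity (here, population config diversity) and the drift/threshold values are illustrative assumptions, not the framework's actual parameters:

```python
# One-sided CUSUM change detector: accumulate how far the series drops
# below `target`, minus a drift term; signal once the sum crosses the
# threshold. Used here to detect "gelation" (population convergence).
def cusum_converged(series, target, drift=0.05, threshold=0.5):
    s = 0.0
    for i, x in enumerate(series):
        s = max(0.0, s + (target - x) - drift)
        if s > threshold:
            return i  # first index where convergence is flagged
    return None

# Hypothetical diversity trace collapsing around generation 7:
diversity = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2, 0.1, 0.0, 0.0, 0.0]
gen = cusum_converged(diversity, target=0.5)
```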
### Data

Curated philosophy corpus from LisaMegaWatts/SymbioGPT-10M:
- Train: 20 MB raw text → 4.3M tokens (8,383 sequences × 512 context)
- Val: 2 MB raw text → 467K tokens (910 sequences × 512 context)
- Tokenizer: Gemma (262,145-token vocab)
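A quick sanity check of the token counts: sequences × context length gives the packed totals (raw token totals before packing can be slightly larger, which accounts for the 467K vs. ~466K packed on the validation side):

```python
# Packed token counts implied by the sequence counts above.
context = 512
train_tokens = 8_383 * context  # 4,292,096 ~= the quoted 4.3M
val_tokens = 910 * context      # 465,920  ~= the quoted 467K (pre-packing)
```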
### Hyperparameters
- Learning rate: 2e-4 (cosine decay, warmup=100)
- Batch size: 2 (gradient accumulation 4, effective batch 8)
- Precision: bfloat16
- Optimizer: AdamW (weight_decay=0.01)
- Gradient clipping: 1.0
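The schedule above (peak LR 2e-4, 100 warmup steps, cosine decay over the 2,000-step fine-tune) has the standard shape sketched below; the exact implementation in the training code may differ:

```python
import math

def lr_at(step, peak=2e-4, warmup=100, total=2000):
    """Linear warmup to `peak`, then cosine decay to zero at `total`."""
    if step < warmup:
        return peak * step / warmup                       # linear warmup
    progress = (step - warmup) / (total - warmup)
    return peak * 0.5 * (1 + math.cos(math.pi * progress))  # cosine decay
```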
### Compute

| Phase | Time | Hardware |
|---|---|---|
| Evolution (10 pop × 17 gens) | 160 min | RTX 3060 12GB |
| Extended fine-tune (2000 steps) | 26 min | RTX 3060 12GB |
| Total | 186 min | RTX 3060 12GB |
## Evolution Details

### Target Module Convergence
| Module | Final Frequency | Role |
|---|---|---|
| v_proj | 100% | Value projection |
| o_proj | 100% | Output projection |
| up_proj | 100% | MLP up-projection |
| down_proj | 100% | MLP down-projection |
| q_proj | 90% | Query projection |
| k_proj | 90% | Key projection |
| gate_proj | 90% | MLP gate |
### Extended Fine-Tune Curve
| Step | Val Loss | PPL |
|---|---|---|
| 250 | 4.1890 | 66.0 |
| 500 | 4.1727 | 64.9 |
| 750 | 4.1445 | 63.1 |
| 1000 | 4.1416 | 62.9 |
| 1250 | 4.1259 | 61.9 |
| Final | 4.1181 | 61.4 |
## Links
- Base model: Ouroboros-1MContext-Gemma-270m
- Training corpus: SymbioGPT-10M
- Code: DavinciDreams/SymbioGPT
- Experiment write-up: Causal_LM_Gemma270M_Results.md
- Framework: Symbiogenesis
## Framework Versions
- PEFT 0.18.1
- Transformers 5.0.0
- PyTorch 2.10.0
## Model Tree

Base model: google/gemma-3-270m