# SymbioGPT-Gemma-Fused
Cross-species knowledge transfer from a Gemma-270M LoRA adapter (philosophy domain) into the native SymbioGPT-10M architecture via PCA-projected LoRA delta injection.
## What This Is
This checkpoint is a SymbioGPT-10M model whose weights have been augmented with projected knowledge from a much larger Gemma-270M model. The Gemma model was fine-tuned on a curated 20MB philosophy corpus using a LoRA adapter (rank 44, alpha 88) evolved by symbiogenesis. The LoRA deltas were then projected across architectures and injected into SymbioGPT's native weights.
## Architecture Mapping
The two models have fundamentally different architectures:
| Property | Gemma-270M (source) | SymbioGPT-10M (target) |
|---|---|---|
| d_model | 640 | 320 |
| Attention | GQA: 16 Q-heads, 4 KV-heads | MHA: 5 heads |
| Head dim | 64 | 64 |
| FFN dim | 2048 (SwiGLU) | 832 (SwiGLU) |
| Layers | 18 | 8 |
| Vocab | 262K (Gemma tokenizer) | 2K (custom BPE) |
| Total params | 268M | ~10M |
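One detail of the table worth calling out: the head dimension is identical on both sides (64), which is what makes whole-head transfer viable; only the d_model axis needs projection. A quick arithmetic check of the table's numbers (variable names here are illustrative, not from the projection script):

```python
# Dimensions from the architecture table above.
gemma_q_heads, symbio_heads, head_dim = 16, 5, 64
gemma_d_model, symbio_d_model = 640, 320

# SymbioGPT's attention width exactly fills its d_model: 5 * 64 = 320.
assert symbio_heads * head_dim == symbio_d_model
# Gemma's Q projection is wider than its d_model: 16 * 64 = 1024.
assert gemma_q_heads * head_dim == 1024
# The PCA projection halves the residual-stream width: 640 -> 320.
assert gemma_d_model // 2 == symbio_d_model
```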
## Projection Method
- PCA calibration: Run Gemma on 200 WikiText-103 calibration texts, collect per-layer activations, compute SVD to get projection matrices (640 → 320).
- Layer mapping: 18 Gemma layers → 8 SymbioGPT layers via proportional grouping. Deltas from multiple source layers are averaged when mapped to the same target layer.
- Attention head mapping (GQA → MHA): Select top-5 Q-heads by LoRA delta L2 norm. K/V heads inherit from their GQA group assignment.
- FFN mapping: PCA on the d_model axis (640 → 320), truncation on the FFN axis (2048 → 832).
- Delta injection: `weight += 0.3 * projected_delta` (blend alpha = 0.3).
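The steps above can be sketched end to end in numpy. This is a minimal illustration, not the real implementation (which lives in the experiments repo): the projection matrix here is random orthonormal rather than PCA-derived from calibration activations, and shapes follow the architecture table.

```python
import numpy as np

rng = np.random.default_rng(0)

# LoRA delta for Gemma's Q projection: (alpha / rank) * B @ A.
rank, alpha = 44, 88
A = rng.normal(size=(rank, 640)) * 0.01      # (rank, d_model_in)
B = rng.normal(size=(1024, rank)) * 0.01     # 16 Q-heads x head_dim 64
delta = (alpha / rank) * B @ A               # (1024, 640) full delta

# GQA -> MHA head mapping: keep the 5 Q-heads with the largest delta L2 norm.
per_head = delta.reshape(16, 64, 640)
norms = np.linalg.norm(per_head.reshape(16, -1), axis=1)
top5 = np.sort(np.argsort(norms)[-5:])       # indices of the kept heads
selected = per_head[top5].reshape(5 * 64, 640)  # (320, 640)

# PCA projection on the d_model axis: 640 -> 320. Random orthonormal rows
# stand in for the top principal directions from calibration activations.
P, _ = np.linalg.qr(rng.normal(size=(640, 320)))
P = P.T                                      # (320, 640)
projected = selected @ P.T                   # (320, 320)

# Inject with blend alpha 0.3. (Deltas from several source layers mapped
# to the same target layer would be averaged before this step.)
target_weight = rng.normal(size=(320, 320)) * 0.02
target_weight += 0.3 * projected
```

For square (d_model x d_model) deltas without a head structure, both axes would be projected instead (`P @ delta @ P.T`); the FFN matrices project only the d_model axis and truncate the FFN axis, per the method list above.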
## Results
| Metric | Value |
|---|---|
| PCA avg variance preserved | 99.0% |
| PCA min variance (layer 17) | 92.4% |
| Deltas applied | 56 / 56 |
| Deltas skipped | 0 |
| Delta/weight ratio range | 1.4% - 4.0% |
| Blend alpha | 0.3 |
| Projection time | 105s (RTX 3060) |
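The variance figures in the table come from the calibration SVD. Under the usual definition (assumed here, not confirmed from the projection script), "variance preserved" is the energy captured by the top-k singular directions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for per-layer calibration activations: tokens x d_model.
acts = rng.normal(size=(2000, 640))
s = np.linalg.svd(acts, compute_uv=False)   # singular values, descending
k = 320                                     # target d_model

# Fraction of total activation energy captured by the top-320 directions.
var_preserved = (s[:k] ** 2).sum() / (s ** 2).sum()
print(f"{var_preserved:.1%}")
```

On random Gaussian activations like these the figure is modest; the 99.0% reported above reflects that real transformer activations concentrate their energy in far fewer directions.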
## Usage
```python
import torch

# Load the fused checkpoint
checkpoint = torch.load("symbio_gemma_fused.pt", map_location="cpu")

# checkpoint contains the full SymbioGPT state dict with the projected
# LoRA deltas baked in
```
This is a raw PyTorch state dict for the SymbioGPT architecture. To use it, load it into a SymbioGPT model instance from the symbiogenesis-experiments repo.
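The load path follows the standard PyTorch pattern: instantiate the target architecture first, then load the state dict into it. A self-contained sketch with a placeholder module (SymbioGPT's real class and constructor live in the symbiogenesis-experiments repo; `TinyModel` below is NOT its API):

```python
import torch
import torch.nn as nn

# Placeholder standing in for the SymbioGPT architecture: one (320, 320)
# projection, matching the model's d_model from the table above.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(320, 320)

    def forward(self, x):
        return self.proj(x)

model = TinyModel()
torch.save(model.state_dict(), "fused_demo.pt")

# Same pattern as loading the real checkpoint: build the architecture,
# then load the raw state dict into it.
restored = TinyModel()
restored.load_state_dict(torch.load("fused_demo.pt", map_location="cpu"))
restored.eval()
```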
## Source Models
- Base model: LisaMegaWatts/Ouroboros-1MContext-Gemma-270m (Gemma-3 270M with 1M context)
- LoRA adapter: LisaMegaWatts/SymbioSLM-ouroboros-lora-20260301 (rank 44, alpha 88, all 7 target modules, evolved by symbiogenesis on philosophy corpus)
- Target architecture: SymbioGPT-10M (custom architecture with organelle-gated attention, CausalConv, Monarch mixing, LongConv)
## Training Details
The LoRA adapter was evolved using symbiogenesis (population-based LoRA architecture search):
- Population: 10 units, 17 generations (early-stopped at gelation)
- Fitness: val_loss with complexity penalty (beta=0.01)
- Result: PPL 309 → 61 (5x improvement) with 3.89% trainable params
- Convergence: All 10 units converged to all-7-target configs with rank ~40-44
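The fitness above can be sketched as validation loss plus a scaled complexity penalty. Note that ln(61) ≈ 4.11, so a converged unit's loss sits around 4.1; the trainable-parameter-fraction complexity measure below is an assumption, not confirmed from the framework.

```python
def fitness(val_loss: float, complexity: float, beta: float = 0.01) -> float:
    """Lower is better: validation loss plus a complexity penalty.

    `complexity` is assumed here to be the trainable-parameter percentage;
    the symbiogenesis framework may define it differently.
    """
    return val_loss + beta * complexity

# A converged unit: val_loss ~4.11 (PPL ~61), 3.89% trainable params.
print(fitness(4.11, 3.89))
```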
## Limitations
- Not yet evaluated: Perplexity and generation quality on the fused model have not been measured. The projection is mathematically sound (99% PCA variance) but downstream quality is unconfirmed.
- Vocab mismatch: Gemma uses a 262K BPE tokenizer; SymbioGPT uses a 2K custom BPE. Embedding weights are not transferred.
- Domain-specific: The LoRA was trained on philosophy text. Transfer to other domains is untested.
## Links
- W&B run: ec6eochs
- Framework: symbiogenesis
- Experiments: symbiogenesis-experiments
- Projection script: `cross_species_lora/project_lora.py` in the experiments repo
## Model Tree

- Base model: google/gemma-3-270m