# SymbioGPT-Gemma-Fused
Cross-species knowledge transfer from a Gemma-270M LoRA adapter (philosophy domain) into the native SymbioGPT-10M architecture via PCA-projected LoRA delta injection.
## What This Is
This checkpoint is a SymbioGPT-10M model whose weights have been augmented with projected knowledge from a much larger Gemma-270M model. The Gemma model was fine-tuned on a curated 20MB philosophy corpus using a LoRA adapter (rank 44, alpha 88) evolved by symbiogenesis. The LoRA deltas were then projected across architectures and injected into SymbioGPT's native weights.
## Architecture Mapping
The two models have fundamentally different architectures:
| Property | Gemma-270M (source) | SymbioGPT-10M (target) |
|---|---|---|
| d_model | 640 | 320 |
| Attention | GQA: 16 Q-heads, 4 KV-heads | MHA: 5 heads |
| Head dim | 64 | 64 |
| FFN dim | 2048 (SwiGLU) | 832 (SwiGLU) |
| Layers | 18 | 8 |
| Vocab | 262K (Gemma tokenizer) | 2K (custom BPE) |
| Total params | 268M | ~10M |
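One detail of the table worth calling out: the head dimension is identical on both sides (64), which is what makes whole-head transfer viable; only the d_model axis needs projection. A quick arithmetic check of the table's numbers (variable names here are illustrative, not from the projection script):

```python
# Dimensions from the architecture table above.
gemma_q_heads, symbio_heads, head_dim = 16, 5, 64
gemma_d_model, symbio_d_model = 640, 320

# SymbioGPT's attention width exactly fills its d_model: 5 * 64 = 320.
assert symbio_heads * head_dim == symbio_d_model
# Gemma's Q projection is wider than its d_model: 16 * 64 = 1024.
assert gemma_q_heads * head_dim == 1024
# The PCA projection halves the residual-stream width: 640 -> 320.
assert gemma_d_model // 2 == symbio_d_model
```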
## Projection Method
- PCA calibration: Run Gemma on 200 WikiText-103 calibration texts, collect per-layer activations, compute SVD to get projection matrices (640 → 320).
- Layer mapping: 18 Gemma layers → 8 SymbioGPT layers via proportional grouping. Deltas from multiple source layers are averaged when mapped to the same target layer.
- Attention head mapping (GQA → MHA): Select top-5 Q-heads by LoRA delta L2 norm. K/V heads inherit from their GQA group assignment.
- FFN mapping: PCA on the d_model axis (640 → 320), truncation on the FFN axis (2048 → 832).
- Delta injection: `weight += 0.3 * projected_delta` (blend alpha = 0.3).
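The steps above can be sketched end to end in numpy. This is a minimal illustration, not the real implementation (which lives in the experiments repo): the projection matrix here is random orthonormal rather than PCA-derived from calibration activations, and shapes follow the architecture table.

```python
import numpy as np

rng = np.random.default_rng(0)

# LoRA delta for Gemma's Q projection: (alpha / rank) * B @ A.
rank, alpha = 44, 88
A = rng.normal(size=(rank, 640)) * 0.01      # (rank, d_model_in)
B = rng.normal(size=(1024, rank)) * 0.01     # 16 Q-heads x head_dim 64
delta = (alpha / rank) * B @ A               # (1024, 640) full delta

# GQA -> MHA head mapping: keep the 5 Q-heads with the largest delta L2 norm.
per_head = delta.reshape(16, 64, 640)
norms = np.linalg.norm(per_head.reshape(16, -1), axis=1)
top5 = np.sort(np.argsort(norms)[-5:])       # indices of the kept heads
selected = per_head[top5].reshape(5 * 64, 640)  # (320, 640)

# PCA projection on the d_model axis: 640 -> 320. Random orthonormal rows
# stand in for the top principal directions from calibration activations.
P, _ = np.linalg.qr(rng.normal(size=(640, 320)))
P = P.T                                      # (320, 640)
projected = selected @ P.T                   # (320, 320)

# Inject with blend alpha 0.3. (Deltas from several source layers mapped
# to the same target layer would be averaged before this step.)
target_weight = rng.normal(size=(320, 320)) * 0.02
target_weight += 0.3 * projected
```

For square (d_model x d_model) deltas without a head structure, both axes would be projected instead (`P @ delta @ P.T`); the FFN matrices project only the d_model axis and truncate the FFN axis, per the method list above.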
## Results
| Metric | Value |
|---|---|
| PCA avg variance preserved | 99.0% |
| PCA min variance (layer 17) | 92.4% |
| Deltas applied | 56 / 56 |
| Deltas skipped | 0 |
| Delta/weight ratio range | 1.4% - 4.0% |
| Blend alpha | 0.3 |
| Projection time | 105s (RTX 3060) |
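The variance figures in the table come from the calibration SVD. Under the usual definition (assumed here, not confirmed from the projection script), "variance preserved" is the energy captured by the top-k singular directions:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for per-layer calibration activations: tokens x d_model.
acts = rng.normal(size=(2000, 640))
s = np.linalg.svd(acts, compute_uv=False)   # singular values, descending
k = 320                                     # target d_model

# Fraction of total activation energy captured by the top-320 directions.
var_preserved = (s[:k] ** 2).sum() / (s ** 2).sum()
print(f"{var_preserved:.1%}")
```

On random Gaussian activations like these the figure is modest; the 99.0% reported above reflects that real transformer activations concentrate their energy in far fewer directions.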
## Usage
```python
import torch

# Load the fused checkpoint
checkpoint = torch.load("symbio_gemma_fused.pt", map_location="cpu")

# checkpoint contains the full SymbioGPT state dict with the projected
# LoRA deltas baked in
```
This is a raw PyTorch state dict for the SymbioGPT architecture. To use it, load it into a SymbioGPT model instance from the symbiogenesis-experiments repo.
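The load path follows the standard PyTorch pattern: instantiate the target architecture first, then load the state dict into it. A self-contained sketch with a placeholder module (SymbioGPT's real class and constructor live in the symbiogenesis-experiments repo; `TinyModel` below is NOT its API):

```python
import torch
import torch.nn as nn

# Placeholder standing in for the SymbioGPT architecture: one (320, 320)
# projection, matching the model's d_model from the table above.
class TinyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(320, 320)

    def forward(self, x):
        return self.proj(x)

model = TinyModel()
torch.save(model.state_dict(), "fused_demo.pt")

# Same pattern as loading the real checkpoint: build the architecture,
# then load the raw state dict into it.
restored = TinyModel()
restored.load_state_dict(torch.load("fused_demo.pt", map_location="cpu"))
restored.eval()
```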
## Source Models
- Base model: LisaMegaWatts/Ouroboros-1MContext-Gemma-270m (Gemma-3 270M with 1M context)
- LoRA adapter: LisaMegaWatts/SymbioSLM-ouroboros-lora-20260301 (rank 44, alpha 88, all 7 target modules, evolved by symbiogenesis on philosophy corpus)
- Target architecture: SymbioGPT-10M (custom architecture with organelle-gated attention, CausalConv, Monarch mixing, LongConv)
## Training Details
The LoRA adapter was evolved using symbiogenesis (population-based LoRA architecture search):
- Population: 10 units, 17 generations (early-stopped at gelation)
- Fitness: val_loss with complexity penalty (beta=0.01)
- Result: PPL 309 → 61 (5x improvement) with 3.89% trainable params
- Convergence: All 10 units converged to all-7-target configs with rank ~40-44
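The fitness above can be sketched as validation loss plus a scaled complexity penalty. Note that ln(61) ≈ 4.11, so a converged unit's loss sits around 4.1; the trainable-parameter-fraction complexity measure below is an assumption, not confirmed from the framework.

```python
def fitness(val_loss: float, complexity: float, beta: float = 0.01) -> float:
    """Lower is better: validation loss plus a complexity penalty.

    `complexity` is assumed here to be the trainable-parameter percentage;
    the symbiogenesis framework may define it differently.
    """
    return val_loss + beta * complexity

# A converged unit: val_loss ~4.11 (PPL ~61), 3.89% trainable params.
print(fitness(4.11, 3.89))
```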
## Limitations
- Not yet evaluated: Perplexity and generation quality on the fused model have not been measured. The projection is mathematically sound (99% PCA variance) but downstream quality is unconfirmed.
- Vocab mismatch: Gemma uses a 262K BPE tokenizer; SymbioGPT uses a 2K custom BPE. Embedding weights are not transferred.
- Domain-specific: The LoRA was trained on philosophy text. Transfer to other domains is untested.
## Links
- W&B run: ec6eochs
- Framework: symbiogenesis
- Experiments: symbiogenesis-experiments
- Projection script: `cross_species_lora/project_lora.py` in the experiments repo
## Model Tree

- Base model: google/gemma-3-270m