SymbioGPT-Gemma-Fused

Cross-species knowledge transfer from a Gemma-270M LoRA adapter (philosophy domain) into the native SymbioGPT-10M architecture via PCA-projected LoRA delta injection.

What This Is

This checkpoint is a SymbioGPT-10M model whose weights have been augmented with projected knowledge from a much larger Gemma-270M model. The Gemma model was fine-tuned on a curated 20MB philosophy corpus using a LoRA adapter (rank 44, alpha 88) evolved by symbiogenesis. The LoRA deltas were then projected across architectures and injected into SymbioGPT's native weights.

Architecture Mapping

The two models have fundamentally different architectures:

| Property | Gemma-270M (source) | SymbioGPT-10M (target) |
|---|---|---|
| d_model | 640 | 320 |
| Attention | GQA: 16 Q-heads, 4 KV-heads | MHA: 5 heads |
| Head dim | 64 | 64 |
| FFN dim | 2048 (SwiGLU) | 832 (SwiGLU) |
| Layers | 18 | 8 |
| Vocab | 262K (Gemma tokenizer) | 2K (custom BPE) |
| Total params | 268M | ~10M |

Projection Method

  1. PCA calibration: Run Gemma on 200 WikiText-103 calibration texts, collect per-layer activations, compute SVD to get projection matrices (640 → 320).
  2. Layer mapping: 18 Gemma layers → 8 SymbioGPT layers via proportional grouping. Deltas from multiple source layers are averaged when mapped to the same target layer.
  3. Attention head mapping (GQA → MHA): Select top-5 Q-heads by LoRA delta L2 norm. K/V heads inherit from their GQA group assignment.
  4. FFN mapping: PCA on the d_model axis (640 → 320), truncation on the FFN axis (2048 → 832).
  5. Delta injection: weight += 0.3 * projected_delta (blend alpha = 0.3).
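Steps 1, 4, and 5 can be sketched as follows. This is a minimal illustration, not the actual projection script: the calibration activations are random stand-ins, and the LoRA delta is shown as a square d_model × d_model matrix for simplicity.

```python
import torch

def pca_projection(acts: torch.Tensor, k: int) -> torch.Tensor:
    """Build a (d_src, k) projection from calibration activations via SVD
    of the centered activation matrix (rows = tokens, cols = channels)."""
    centered = acts - acts.mean(dim=0, keepdim=True)
    # Rows of vh are the principal directions in the d_src channel space.
    _, _, vh = torch.linalg.svd(centered, full_matrices=False)
    return vh[:k].T  # keep the top-k directions -> (d_src, k)

# Toy calibration: 1000 "token" activations at the source width (640)
acts = torch.randn(1000, 640)
P = pca_projection(acts, k=320)      # (640, 320): source -> target width

# Step 4/5: project a source-width delta down, then blend (alpha = 0.3)
delta_src = torch.randn(640, 640)    # stand-in for a projected LoRA delta
delta_tgt = P.T @ delta_src @ P      # (320, 320) in the target width
target_weight = torch.zeros(320, 320)
target_weight += 0.3 * delta_tgt     # step 5: weight += alpha * delta
```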

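Step 3's head selection amounts to ranking the 16 Gemma Q-heads by the L2 norm of their slice of the LoRA delta and keeping the top 5. A sketch under the table's head counts (the delta tensor here is a random stand-in):

```python
import torch

q_heads, kv_heads, head_dim = 16, 4, 64
# Stand-in for a Q-projection LoRA delta: (q_heads * head_dim, d_model)
delta_q = torch.randn(q_heads * head_dim, 320)

# Score each source Q-head by the L2 norm of its 64-row slice.
per_head = delta_q.view(q_heads, head_dim, -1)
scores = per_head.flatten(1).norm(dim=1)   # (16,) one score per Q-head
top5 = torch.topk(scores, k=5).indices     # the 5 heads kept for MHA

# Each kept Q-head inherits its GQA group's KV head (16 Q / 4 KV = 4 per group).
kv_of = top5 // (q_heads // kv_heads)
```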
Results

| Metric | Value |
|---|---|
| PCA avg variance preserved | 99.0% |
| PCA min variance (layer 17) | 92.4% |
| Deltas applied | 56 / 56 |
| Deltas skipped | 0 |
| Delta/weight ratio range | 1.4%–4.0% |
| Blend alpha | 0.3 |
| Projection time | 105 s (RTX 3060) |

Usage

```python
import torch

# Load the fused checkpoint: a full SymbioGPT state dict with the
# projected LoRA deltas already baked into the weights.
checkpoint = torch.load("symbio_gemma_fused.pt", map_location="cpu")
```

This is a raw PyTorch state dict for the SymbioGPT architecture. To use it, load it into a SymbioGPT model instance from the symbiogenesis-experiments repo.
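The load path can be demonstrated end to end with a stand-in module; the real SymbioGPT class lives in the symbiogenesis-experiments repo, so the class, layer names, and file name below are illustrative only:

```python
import torch
import torch.nn as nn

# Stand-in for SymbioGPT: 8 blocks at d_model=320, FFN 832 (per the table).
class TinyBlock(nn.Module):
    def __init__(self, d_model: int = 320, ffn: int = 832):
        super().__init__()
        self.attn = nn.Linear(d_model, d_model)
        self.ffn = nn.Linear(d_model, ffn)

model = nn.Sequential(*[TinyBlock() for _ in range(8)])

# Round-trip a state dict the same way the fused checkpoint is consumed:
torch.save(model.state_dict(), "demo_fused.pt")
state = torch.load("demo_fused.pt", map_location="cpu")
model.load_state_dict(state)  # strict by default: every key/shape must match
```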

Training Details

The LoRA adapter was evolved using symbiogenesis (population-based LoRA architecture search):

  • Population: 10 units, 17 generations (early-stopped at gelation)
  • Fitness: val_loss with complexity penalty (beta=0.01)
  • Result: PPL 309 → 61 (5x improvement) with 3.89% trainable params
  • Convergence: All 10 units converged to all-7-target configs with rank ~40-44
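The fitness above (val_loss with a complexity penalty, beta = 0.01) reduces to a simple scalar objective. The exact complexity measure is not stated; the sketch below assumes it is the trainable-parameter fraction:

```python
def fitness(val_loss: float, trainable_frac: float, beta: float = 0.01) -> float:
    """Lower is better: validation loss plus a complexity penalty."""
    return val_loss + beta * trainable_frac

# Two hypothetical candidates with equal loss: the leaner config wins.
small = fitness(val_loss=4.1, trainable_frac=0.0389)  # ~3.89% trainable params
large = fitness(val_loss=4.1, trainable_frac=0.20)
```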

Limitations

  • Not yet evaluated: Perplexity and generation quality on the fused model have not been measured. The projection is mathematically sound (99% PCA variance) but downstream quality is unconfirmed.
  • Vocab mismatch: Gemma uses a 262K BPE tokenizer; SymbioGPT uses a 2K custom BPE. Embedding weights are not transferred.
  • Domain-specific: The LoRA was trained on philosophy text. Transfer to other domains is untested.
