LisaMegaWatts committed
Commit f1ccca7 · verified · 1 Parent(s): 63e35b9

Add model card with cross-species projection details

Files changed (1): README.md (+98 −0)
README.md ADDED
---
license: apache-2.0
language:
- en
tags:
- symbiogenesis
- cross-species
- lora-projection
- pca
- philosophy
- causal-lm
base_model:
- LisaMegaWatts/Ouroboros-1MContext-Gemma-270m
datasets:
- wikitext
pipeline_tag: text-generation
---

# SymbioGPT-Gemma-Fused

Cross-species knowledge transfer from a **Gemma-270M LoRA adapter** (philosophy domain) into the native **SymbioGPT-10M** architecture via PCA-projected LoRA delta injection.

## What This Is

This checkpoint is a SymbioGPT-10M model whose weights have been augmented with projected knowledge from a much larger Gemma-270M model. The Gemma model was fine-tuned on a curated 20 MB philosophy corpus using a LoRA adapter (rank 44, alpha 88) evolved by [symbiogenesis](https://github.com/DavinciDreams/symbiogenesis). The LoRA deltas were then projected across architectures and injected into SymbioGPT's native weights.
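
For orientation, this is what a LoRA delta is before any projection: the adapter's two low-rank factors expand to a dense weight update. A minimal sketch following the standard LoRA convention; all tensor values here are random stand-ins:

```python
import torch

# A LoRA adapter stores two low-rank factors per adapted weight; the dense
# update it encodes is delta_W = (alpha / r) * B @ A. The cross-species
# projection described below operates on this dense delta, not the factors.
r, lora_alpha = 44, 88
d = 640                               # Gemma-270M d_model
A = torch.randn(r, d)                 # "down" factor (r x d_in)
B = torch.randn(d, r)                 # "up" factor (d_out x r)
delta_W = (lora_alpha / r) * (B @ A)  # dense (640 x 640) weight update
```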

## Architecture Mapping

The two models have fundamentally different architectures:

| Property | Gemma-270M (source) | SymbioGPT-10M (target) |
|---|---|---|
| d_model | 640 | 320 |
| Attention | GQA: 16 Q-heads, 4 KV-heads | MHA: 5 heads |
| Head dim | 64 | 64 |
| FFN dim | 2048 (SwiGLU) | 832 (SwiGLU) |
| Layers | 18 | 8 |
| Vocab | 262K (Gemma tokenizer) | 2K (custom BPE) |
| Total params | 268M | ~10M |

### Projection Method

1. **PCA calibration**: Run Gemma on 200 WikiText-103 calibration texts, collect per-layer activations, and compute an SVD to obtain projection matrices (640 → 320).
2. **Layer mapping**: 18 Gemma layers → 8 SymbioGPT layers via proportional grouping. Deltas from multiple source layers are averaged when mapped to the same target layer.
3. **Attention head mapping (GQA → MHA)**: Select the top 5 Q-heads by LoRA delta L2 norm; K/V heads inherit from their GQA group assignment.
4. **FFN mapping**: PCA on the d_model axis (640 → 320), truncation on the FFN axis (2048 → 832).
5. **Delta injection**: `weight += 0.3 * projected_delta` (blend alpha = 0.3). A minimal end-to-end sketch follows this list.
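
A minimal end-to-end sketch of steps 1, 2, 3, and 5, assuming calibration activations are already collected. Every name here is hypothetical, the grouping rule is one plausible choice, and the tensors are random stand-ins; the actual implementation is `cross_species_lora/project_lora.py` in the experiments repo:

```python
import torch

def pca_projection(acts, k=320):
    """Top-k principal directions of centered calibration activations
    (n_tokens x 640), returned as a (k x 640) projection matrix."""
    centered = acts - acts.mean(dim=0, keepdim=True)
    _, _, Vh = torch.linalg.svd(centered, full_matrices=False)
    return Vh[:k]

def project_delta(delta, P):
    """Project both sides of a square (640 x 640) delta into the target
    space: (320 x 640) @ (640 x 640) @ (640 x 320) -> (320 x 320).
    (FFN deltas would additionally be truncated 2048 -> 832 on the hidden axis.)"""
    return P @ delta @ P.T

def top_q_heads(q_delta, n_q=16, n_kv=4, keep=5, head_dim=64):
    """Rank GQA query heads by the L2 norm of their slice of the q_proj
    delta; each surviving Q-head keeps the KV head of its GQA group."""
    norms = q_delta.view(n_q, head_dim, -1).flatten(1).norm(dim=1)
    keep_q = norms.topk(keep).indices
    return keep_q, keep_q // (n_q // n_kv)

# Random stand-ins for per-layer calibration activations and LoRA deltas.
P = {s: pca_projection(torch.randn(1024, 640)) for s in range(18)}
deltas = {s: torch.randn(640, 640) for s in range(18)}
target = {t: torch.zeros(320, 320) for t in range(8)}  # SymbioGPT weights

# Proportional 18 -> 8 grouping (e.g. floor(i * 8 / 18)), averaging deltas
# that map to the same target layer, then blending with alpha = 0.3.
BLEND = 0.3
groups = {}
for src in range(18):
    groups.setdefault(src * 8 // 18, []).append(src)
for tgt, srcs in groups.items():
    avg = torch.stack([project_delta(deltas[s], P[s]) for s in srcs]).mean(0)
    target[tgt] += BLEND * avg

keep_q, keep_kv = top_q_heads(torch.randn(16 * 64, 640))  # step 3 in isolation
```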

## Results

| Metric | Value |
|---|---|
| PCA avg variance preserved | 99.0% |
| PCA min variance (layer 17) | 92.4% |
| Deltas applied | 56 / 56 |
| Deltas skipped | 0 |
| Delta/weight ratio range | 1.4%–4.0% |
| Blend alpha | 0.3 |
| Projection time | 105 s (RTX 3060) |

## Usage

```python
import torch

# Load the fused checkpoint: a plain state dict, not an nn.Module.
checkpoint = torch.load("symbio_gemma_fused.pt", map_location="cpu")
# checkpoint contains the full SymbioGPT state dict with the projected
# LoRA deltas already baked into the weights.
```

This is a raw PyTorch state dict for the SymbioGPT architecture. To use it, load it into a SymbioGPT model instance from the [symbiogenesis-experiments](https://github.com/DavinciDreams/symbiogenesis) repo, as sketched below.
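
A hypothetical loading flow; the class, config fields, and import path below are illustrative assumptions, not the repo's actual API:

```python
import torch
# Assumed import path -- check the symbiogenesis-experiments repo for the
# real model definition and constructor arguments.
from symbio_gpt import SymbioGPT, SymbioGPTConfig

config = SymbioGPTConfig(d_model=320, n_layers=8, n_heads=5, vocab_size=2048)
model = SymbioGPT(config)
state_dict = torch.load("symbio_gemma_fused.pt", map_location="cpu")
model.load_state_dict(state_dict)  # fused deltas are already in the weights
model.eval()
```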

## Source Models

- **Base model**: [LisaMegaWatts/Ouroboros-1MContext-Gemma-270m](https://huggingface.co/LisaMegaWatts/Ouroboros-1MContext-Gemma-270m) (Gemma-3 270M with 1M context)
- **LoRA adapter**: [LisaMegaWatts/SymbioSLM-ouroboros-lora-20260301](https://huggingface.co/LisaMegaWatts/SymbioSLM-ouroboros-lora-20260301) (rank 44, alpha 88, all 7 target modules, evolved by symbiogenesis on the philosophy corpus)
- **Target architecture**: SymbioGPT-10M (custom architecture with organelle-gated attention, CausalConv, Monarch mixing, LongConv)

## Training Details

The LoRA adapter was evolved using symbiogenesis (population-based LoRA architecture search):

- **Population**: 10 units, 17 generations (early-stopped at gelation)
- **Fitness**: val_loss with a complexity penalty (beta = 0.01); see the sketch after this list
- **Result**: PPL 309 → 61 (~5x improvement) with 3.89% trainable params
- **Convergence**: All 10 units converged to all-7-target configs with rank ~40-44
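
A one-line sketch of the selection criterion named above; how symbiogenesis quantifies "complexity" is an assumption here (the trainable-parameter fraction is one plausible choice):

```python
def fitness(val_loss, complexity, beta=0.01):
    # Lower is better: validation loss plus a small complexity penalty.
    return val_loss + beta * complexity

# e.g. fitness(val_loss=4.1, complexity=0.0389) -> 4.100389
```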

## Limitations

- **Not yet evaluated**: Perplexity and generation quality of the fused model have not been measured. The projection is mathematically sound (99% average PCA variance preserved), but downstream quality is unconfirmed.
- **Vocab mismatch**: Gemma uses a 262K BPE tokenizer; SymbioGPT uses a 2K custom BPE. Embedding weights are not transferred.
- **Domain-specific**: The LoRA was trained on philosophy text; transfer to other domains is untested.

## Links

- **W&B run**: [ec6eochs](https://wandb.ai/symbiogenesis/symbiogenesis/runs/ec6eochs)
- **Framework**: [symbiogenesis](https://github.com/DavinciDreams/symbiogenesis)
- **Experiments**: [symbiogenesis-experiments](https://github.com/DavinciDreams/symbiogenesis)
- **Projection script**: `cross_species_lora/project_lora.py` in the experiments repo