---
license: apache-2.0
language:
- en
tags:
- symbiogenesis
- cross-species
- lora-projection
- pca
- philosophy
- causal-lm
base_model:
- LisaMegaWatts/Ouroboros-1MContext-Gemma-270m
datasets:
- wikitext
pipeline_tag: text-generation
---

# SymbioGPT-Gemma-Fused

Cross-species knowledge transfer from a **Gemma-270M LoRA adapter** (philosophy domain) into the native **SymbioGPT-10M** architecture via PCA-projected LoRA delta injection.

## What This Is

This checkpoint is a SymbioGPT-10M model whose weights have been augmented with projected knowledge from a much larger Gemma-270M model. The Gemma model was fine-tuned on a curated 20 MB philosophy corpus using a LoRA adapter (rank 44, alpha 88) evolved by [symbiogenesis](https://github.com/DavinciDreams/symbiogenesis). The LoRA deltas were then projected across architectures and injected into SymbioGPT's native weights.

## Architecture Mapping

The two models have fundamentally different architectures:

| Property | Gemma-270M (source) | SymbioGPT-10M (target) |
|---|---|---|
| d_model | 640 | 320 |
| Attention | GQA: 16 Q-heads, 4 KV-heads | MHA: 5 heads |
| Head dim | 64 | 64 |
| FFN dim | 2048 (SwiGLU) | 832 (SwiGLU) |
| Layers | 18 | 8 |
| Vocab | 262K (Gemma tokenizer) | 2K (custom BPE) |
| Total params | 268M | ~10M |

### Projection Method

1. **PCA calibration**: Run Gemma on 200 WikiText-103 calibration texts, collect per-layer activations, and compute an SVD to obtain projection matrices (640 → 320).
2. **Layer mapping**: 18 Gemma layers → 8 SymbioGPT layers via proportional grouping. Deltas from multiple source layers are averaged when they map to the same target layer.
3. **Attention head mapping (GQA → MHA)**: Select the top 5 Q-heads by LoRA delta L2 norm. K/V heads inherit from their GQA group assignment.
4. **FFN mapping**: PCA on the d_model axis (640 → 320), truncation on the FFN axis (2048 → 832).
5. **Delta injection**: `weight += 0.3 * projected_delta` (blend alpha = 0.3). A minimal code sketch of steps 1 and 5 follows this list.

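The sketch below illustrates steps 1 and 5 on dummy tensors. Function names and shapes are illustrative rather than the actual `cross_species_lora/project_lora.py` API, and reusing one projection matrix for both sides of a weight is a simplification.

```python
import torch

def fit_projection(acts: torch.Tensor, target_dim: int = 320) -> torch.Tensor:
    """Step 1: PCA via SVD on centered calibration activations [tokens, 640]."""
    acts = acts - acts.mean(dim=0, keepdim=True)
    _, _, vh = torch.linalg.svd(acts, full_matrices=False)
    return vh[:target_dim]                      # projection matrix P: [320, 640]

def inject(weight: torch.Tensor, delta: torch.Tensor,
           p_out: torch.Tensor, p_in: torch.Tensor,
           blend: float = 0.3) -> torch.Tensor:
    """Step 5: project a Gemma-shaped delta to SymbioGPT dims and blend it in."""
    projected = p_out @ delta @ p_in.T          # [320,640] @ [640,640] @ [640,320]
    return weight + blend * projected           # blend alpha = 0.3

# Dummy end-to-end example for a single square projection weight.
acts = torch.randn(4096, 640)                   # calibration activations
p = fit_projection(acts)
lora_a, lora_b = torch.randn(44, 640), torch.randn(640, 44)
delta = (88 / 44) * (lora_b @ lora_a)           # standard LoRA scaling alpha/r = 2.0
target_weight = torch.randn(320, 320)           # SymbioGPT target weight
fused = inject(target_weight, delta, p_out=p, p_in=p)
```
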
## Results

| Metric | Value |
|---|---|
| PCA avg variance preserved | 99.0% |
| PCA min variance (layer 17) | 92.4% |
| Deltas applied | 56 / 56 |
| Deltas skipped | 0 |
| Delta/weight ratio range | 1.4% - 4.0% |
| Blend alpha | 0.3 |
| Projection time | 105s (RTX 3060) |

## Usage

```python
import torch

# Load the fused checkpoint: the full SymbioGPT state dict with the projected
# LoRA deltas already baked into the weights
checkpoint = torch.load("symbio_gemma_fused.pt", map_location="cpu")
```

This is a raw PyTorch state dict for the SymbioGPT architecture. To use it, load it into a SymbioGPT model instance from the [symbiogenesis-experiments](https://github.com/DavinciDreams/symbiogenesis) repo, as sketched below.

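A minimal sketch of that loading step, assuming the experiments repo exposes a model class and config roughly like the hypothetical `SymbioGPT` / `SymbioGPTConfig` below (module, class, and constructor argument names are illustrative; the hyperparameters come from the architecture table above):

```python
import torch

# Hypothetical imports: substitute the actual module, class, and config names
# from the symbiogenesis-experiments repo.
from symbio_gpt import SymbioGPT, SymbioGPTConfig

config = SymbioGPTConfig(
    d_model=320, n_heads=5, n_layers=8, ffn_dim=832, vocab_size=2048,
)
model = SymbioGPT(config)

state_dict = torch.load("symbio_gemma_fused.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```
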
## Source Models

- **Base model**: [LisaMegaWatts/Ouroboros-1MContext-Gemma-270m](https://huggingface.co/LisaMegaWatts/Ouroboros-1MContext-Gemma-270m) (Gemma-3 270M with 1M context)
- **LoRA adapter**: [LisaMegaWatts/SymbioSLM-ouroboros-lora-20260301](https://huggingface.co/LisaMegaWatts/SymbioSLM-ouroboros-lora-20260301) (rank 44, alpha 88, all 7 target modules, evolved by symbiogenesis on the philosophy corpus)
- **Target architecture**: SymbioGPT-10M (custom architecture with organelle-gated attention, CausalConv, Monarch mixing, LongConv)

## Training Details

The LoRA adapter was evolved using symbiogenesis (population-based LoRA architecture search):
- **Population**: 10 units, 17 generations (early-stopped at gelation)
- **Fitness**: val_loss with a complexity penalty (beta = 0.01); see the sketch after this list
- **Result**: PPL 309 → 61 (5x improvement) with 3.89% trainable params
- **Convergence**: All 10 units converged to all-7-target configs with rank ~40-44

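A sketch of that fitness computation. The exact complexity term used by symbiogenesis is not documented here, so the trainable-parameter fraction is assumed:

```python
def fitness(val_loss: float, trainable_params: int, total_params: int,
            beta: float = 0.01) -> float:
    # Complexity penalty: assumed to be the trainable-parameter fraction
    # (about 0.039 for this adapter); lower fitness is better.
    complexity = trainable_params / total_params
    return val_loss + beta * complexity
```
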
## Limitations

- **Not yet evaluated**: Perplexity and generation quality of the fused model have not been measured. The projection is mathematically sound (99% of PCA variance preserved), but downstream quality is unconfirmed.
- **Vocab mismatch**: Gemma uses a 262K BPE tokenizer; SymbioGPT uses a 2K custom BPE. Embedding weights are not transferred.
- **Domain-specific**: The LoRA was trained on philosophy text. Transfer to other domains is untested.

## Links

- **W&B run**: [ec6eochs](https://wandb.ai/symbiogenesis/symbiogenesis/runs/ec6eochs)
- **Framework**: [symbiogenesis](https://github.com/DavinciDreams/symbiogenesis)
- **Experiments**: [symbiogenesis-experiments](https://github.com/DavinciDreams/symbiogenesis)
- **Projection script**: `cross_species_lora/project_lora.py` in the experiments repo