VocabFusion-Pythia160m-Fused
Full pretrained weight projection from Pythia-160m (768d, 12 layers) into VocabFusion-SymbioGPT-10M (320d, 8 layers) with blend alpha = 0.3.
This is the first cross-species experiment in which source and target share both the vocabulary (50K GPT-NeoX) and the head dimension (64), eliminating the two largest information-loss channels from previous projection experiments.
Architecture Mapping
| Component | Pythia-160m (Source) | VocabFusion (Target) | Method |
|---|---|---|---|
| d_model | 768 | 320 | PCA (92.1% variance) |
| Layers | 12 | 8 | Proportional grouping |
| Heads | 12 MHA | 5 MHA | Top-5 by L2 norm |
| Head dim | 64 | 64 | Exact match |
| FFN | 3072 (GELU) | 832 (SwiGLU) | Neuron selection + 2→3 matrix map |
| Vocab | 50,304 | 50,304 (frozen Pythia) | Exact match |
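The layer and head rows of the table can be illustrated with a minimal sketch. It is not taken from the projection script: the function names `group_layers` and `top_heads_by_l2` are hypothetical, the even integer split is only one plausible reading of "proportional grouping", and the weight layout (each head owning a contiguous block of 64 rows) is an assumption.

```python
import numpy as np


def group_layers(n_source: int, n_target: int) -> list[list[int]]:
    """Assign each target layer a contiguous run of source layers.

    Assumption: boundaries are spaced proportionally using integer division.
    """
    bounds = [i * n_source // n_target for i in range(n_target + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(n_target)]


def top_heads_by_l2(w: np.ndarray, head_dim: int, k: int) -> list[int]:
    """Return the indices of the k heads with the largest L2 norm.

    Assumption: `w` has shape (n_heads * head_dim, d_model), so each head
    owns a contiguous block of `head_dim` rows.
    """
    n_heads = w.shape[0] // head_dim
    per_head = w.reshape(n_heads, head_dim, -1)
    # Frobenius (L2) norm of each head's weight block.
    norms = np.linalg.norm(per_head, axis=(1, 2))
    top = np.argsort(norms)[-k:]
    return sorted(int(i) for i in top)


# 12 Pythia layers mapped onto 8 target layers.
groups = group_layers(12, 8)

# Random stand-in for a 12-head query weight (not real Pythia weights).
rng = np.random.default_rng(0)
wq = rng.standard_normal((12 * 64, 768))
kept = top_heads_by_l2(wq, head_dim=64, k=5)
```

Under this split, source layers alternate between groups of one and two, so every Pythia layer contributes to exactly one target layer.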
Projection Details
- 56 weights projected (8 layers × 7 weights: wq, wk, wv, wo, w1, v, w2)
- PCA avg variance: 92.1% (range: 82.6% to 97.3%)
- Blend formula: `new_W = 0.7 * vocabfusion_W + 0.3 * projected_pythia_W`
- Zero fine-tuning: No training after projection
- What's NOT projected: Embeddings (shared), junction layer, organelle weights (CausalConv, Monarch, LongConv), OrganelleGate, norms, skip gates
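The PCA reduction and the blend formula listed above can be sketched as follows. This is an illustration under stated assumptions, not the code in the projection script: `pca_project` and `blend` are hypothetical names, PCA is computed via SVD on a single weight matrix, and the inputs are random stand-ins rather than real Pythia or VocabFusion weights.

```python
import numpy as np


def pca_project(w: np.ndarray, k: int) -> tuple[np.ndarray, float]:
    """Reduce the column dimension of `w` to k with PCA.

    Returns the projected matrix and the fraction of variance retained.
    """
    centered = w - w.mean(axis=0, keepdims=True)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    variance = s ** 2
    retained = float(variance[:k].sum() / variance.sum())
    # Project onto the top-k principal directions.
    return centered @ vt[:k].T, retained


def blend(target_w: np.ndarray, projected_w: np.ndarray, alpha: float = 0.3) -> np.ndarray:
    """new_W = (1 - alpha) * target_W + alpha * projected_W."""
    return (1.0 - alpha) * target_w + alpha * projected_w


rng = np.random.default_rng(0)
source_w = rng.standard_normal((768, 768))   # stand-in for a Pythia weight
target_w = rng.standard_normal((320, 320))   # stand-in for a VocabFusion weight

projected_full, retained = pca_project(source_w, k=320)   # shape (768, 320)
# Assumption: the first 320 rows stand in for the 5 selected heads (5 x 64).
projected = projected_full[:320]
fused = blend(target_w, projected, alpha=0.3)
```

With alpha = 0.3 the target's own weights still dominate the result, which fits the choice to skip fine-tuning after projection.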
Key Advantages Over Gemma→SymbioGPT Experiment
| Aspect | Gemma → SymbioGPT | Pythia → VocabFusion |
|---|---|---|
| Vocabulary | Mismatched (256K→2K) | Shared (50K=50K) |
| Head dim | 64=64 | 64=64 |
| Attention type | GQA→MHA (complex) | MHA→MHA (simple) |
| Transfer type | LoRA deltas only | Full pretrained weights |
| PCA ratio | 2:1 (640→320) | 2.4:1 (768→320) |
Usage
```python
import torch
checkpoint = torch.load("vocabfusion_pythia160m_fused.pt", map_location="cpu")
```
Load the checkpoint into a `VocabFusionModel` instance from the symbiogenesis-experiments repo (`vocab_fusion_experiment/model.py`).
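The checkpoint layout is not documented here, so the helper below is only a hedged sketch. It assumes the file holds either a bare state dict or a dict wrapping one under a common key. The name `extract_state_dict` and the candidate keys are hypothetical, not part of the repo.

```python
from collections.abc import Mapping


def extract_state_dict(checkpoint):
    """Return the parameter mapping from a loaded checkpoint.

    Assumption: the checkpoint is either a bare state dict or a dict that
    wraps one under "model_state_dict", "state_dict", or "model".
    """
    if not isinstance(checkpoint, Mapping):
        raise TypeError("expected a dict-like checkpoint")
    for key in ("model_state_dict", "state_dict", "model"):
        inner = checkpoint.get(key)
        if isinstance(inner, Mapping):
            return inner
    # No wrapper key found: treat the checkpoint itself as the state dict.
    return checkpoint


# Intended use with the snippet above (needs PyTorch and the repo's model.py;
# the constructor arguments of VocabFusionModel are not documented here):
#   state = extract_state_dict(checkpoint)
#   model.load_state_dict(state)
```

Printing the keys of the returned mapping before calling `load_state_dict` is a quick way to confirm the layout matches what the model expects.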
Links
- VocabFusion original: LisaMegaWatts/VocabFusion-SymbioGPT-10M
- Source Pythia: EleutherAI/pythia-160m
- Gemma LoRA experiment: LisaMegaWatts/SymbioGPT-Gemma-Fused
- Framework: symbiogenesis
- Projection script: `cross_species_lora/project_pythia_to_vocabfusion.py`