Dataset: Salesforce/wikitext
Full pretrained weight projection from Pythia-160m (768d, 12 layers) into VocabFusion-SymbioGPT-10M (320d, 8 layers) with blend alpha = 0.3.
This is the first cross-species experiment where source and target share both vocabulary (50K GPTNeoX) and head dimension (64), eliminating the two largest information-loss channels from previous projection experiments.
| Component | Pythia-160m (Source) | VocabFusion (Target) | Method |
|---|---|---|---|
| d_model | 768 | 320 | PCA (92.1% variance) |
| Layers | 12 | 8 | Proportional grouping |
| Heads | 12 MHA | 5 MHA | Top-5 by L2 norm |
| Head dim | 64 | 64 | Exact match |
| FFN | 3072 (GELU) | 832 (SwiGLU) | Neuron selection + 2→3 matrix map |
| Vocab | 50,304 | 50,304 (frozen Pythia) | Exact match |
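
As a rough illustration of two rows in the table above, the sketch below shows one way "proportional grouping" (12 layers into 8) and "top-5 by L2 norm" head selection could be computed. Both function names are hypothetical and the exact rules are assumptions: the card does not say how source layers within a group are combined, nor which weights the per-head norm is taken over. Plain Python lists stand in for weight tensors.

```python
import math


def proportional_groups(n_source: int, n_target: int) -> list[list[int]]:
    """Assign each target layer a contiguous run of source layers.

    Assumption: boundaries fall at floor(i * n_source / n_target), so every
    source layer lands in exactly one group. Requires n_source >= n_target,
    otherwise some groups would be empty.
    """
    groups = []
    for i in range(n_target):
        start = i * n_source // n_target
        end = (i + 1) * n_source // n_target
        groups.append(list(range(start, end)))
    return groups


def top_k_heads_by_l2(head_weights: list[list[float]], k: int) -> list[int]:
    """Return the indices of the k heads with the largest L2 norm.

    head_weights[h] is the flattened weight vector of head h (hypothetical
    layout). Indices come back in ascending order so the kept heads retain
    their original relative ordering (an assumption, not stated in the card).
    """
    norms = [math.sqrt(sum(w * w for w in head)) for head in head_weights]
    ranked = sorted(range(len(norms)), key=lambda h: norms[h], reverse=True)
    return sorted(ranked[:k])


# 12 Pythia layers mapped onto 8 VocabFusion layers.
print(proportional_groups(12, 8))
# → [[0], [1, 2], [3], [4, 5], [6], [7, 8], [9], [10, 11]]
```

Under this grouping rule, target layers alternate between receiving one and two source layers, which follows from the 1.5:1 layer ratio.
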
The fused weights blend the two models with alpha = 0.3:

`new_W = 0.7 * vocabfusion_W + 0.3 * projected_pythia_W`

| Aspect | Gemma → SymbioGPT | Pythia → VocabFusion |
|---|---|---|
| Vocabulary | Mismatched (256K→2K) | Shared (50K=50K) |
| Head dim | 64=64 | 64=64 |
| Attention type | GQA→MHA (complex) | MHA→MHA (simple) |
| Transfer type | LoRA deltas only | Full pretrained weights |
| PCA ratio | 2:1 (640→320) | 2.4:1 (768→320) |
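
The blend rule used for this model, `new_W = 0.7 * vocabfusion_W + 0.3 * projected_pythia_W`, can be sketched as a small helper applied to each parameter. This is a minimal sketch, not the repo's actual script: the function name `blend_state_dicts` is hypothetical, and leaving parameters without a projected counterpart unchanged is an assumption. The arithmetic is duck-typed, so the same code works on Python floats (used here for the toy example) or on tensors that support `*` and `+`.

```python
def blend_state_dicts(target, projected, alpha=0.3):
    """Blend projected source weights into the target weights.

    Computes (1 - alpha) * target_W + alpha * projected_W for every
    parameter name present in both mappings.
    """
    blended = {}
    for name, weight in target.items():
        if name in projected:
            blended[name] = (1.0 - alpha) * weight + alpha * projected[name]
        else:
            # Assumption: parameters with no projected counterpart keep
            # their original target value.
            blended[name] = weight
    return blended


# Toy example with scalar "weights"; real use would pass tensors of equal shape.
toy_target = {"w": 1.0, "b": 4.0}
toy_projected = {"w": 2.0}
print(blend_state_dicts(toy_target, toy_projected))
```

With alpha = 0.3 the target model's own weights dominate, so the projected Pythia weights act as a bias toward the pretrained solution rather than a replacement.
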
To load the fused checkpoint:

    import torch

    # Load the fused checkpoint onto the CPU.
    checkpoint = torch.load("vocabfusion_pythia160m_fused.pt", map_location="cpu")

Load the checkpoint into a `VocabFusionModel` instance from the symbiogenesis-experiments repo (`vocab_fusion_experiment/model.py`).
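
The card does not say whether the `.pt` file holds a bare state dict or a wrapper dictionary around one. The helper below is a hedged sketch that handles both cases; the function name `extract_state_dict` and the wrapper keys `"model_state_dict"` and `"state_dict"` are assumptions based on common PyTorch conventions, not something confirmed for this checkpoint.

```python
def extract_state_dict(checkpoint):
    """Return the parameter mapping from a loaded checkpoint object.

    If the checkpoint is a dict that wraps the weights under a common key,
    unwrap it; otherwise assume the checkpoint is already the state dict.
    """
    if isinstance(checkpoint, dict):
        # Hypothetical wrapper keys; inspect checkpoint.keys() to confirm.
        for key in ("model_state_dict", "state_dict"):
            if key in checkpoint and isinstance(checkpoint[key], dict):
                return checkpoint[key]
    return checkpoint


# Toy stand-ins for the two possible layouts.
wrapped = {"state_dict": {"layer.weight": 1.0}, "step": 100}
bare = {"layer.weight": 1.0}
print(extract_state_dict(wrapped) == extract_state_dict(bare))  # → True
```

The result could then be passed to the standard `load_state_dict` method of a constructed `VocabFusionModel`; the constructor arguments are defined in the repo's `model.py` and are not reproduced here.
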
Projection script: `cross_species_lora/project_pythia_to_vocabfusion.py`

Base model: EleutherAI/pythia-160m