VocabFusion-Pythia160m-Fused

Full pretrained weight projection from Pythia-160m (768d, 12 layers) into VocabFusion-SymbioGPT-10M (320d, 8 layers) with blend alpha = 0.3.

This is the first cross-species experiment where source and target share both vocabulary (50K GPTNeoX) and head dimension (64), eliminating the two largest information-loss channels from previous projection experiments.

Architecture Mapping

Component	Pythia-160m (Source)	VocabFusion (Target)	Method
d_model	768	320	PCA (92.1% variance)
Layers	12	8	Proportional grouping
Heads	12 MHA	5 MHA	Top-5 by L2 norm
Head dim	64	64	Exact match
FFN	3072 (GELU)	832 (SwiGLU)	Neuron selection + 2→3 matrix map (two GELU FFN matrices onto SwiGLU's w1, v, w2)
Vocab	50,304	50,304 (frozen Pythia)	Exact match
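
The layer and head rows of the mapping can be illustrated with a small sketch. It is not the repository's code: the function names are hypothetical, "proportional grouping" is assumed to mean evenly spaced contiguous boundaries, and the head score is assumed to be the L2 (Frobenius) norm of each head's weight block. How the grouped source layers are then merged is not stated in this card and is not covered here.

```python
import math


def group_layers(n_source, n_target):
    """Assign each target layer a contiguous run of source layers.

    Assumption: target layer t covers source layers
    [t * n_source // n_target, (t + 1) * n_source // n_target).
    """
    groups = []
    for t in range(n_target):
        start = t * n_source // n_target
        stop = (t + 1) * n_source // n_target
        groups.append(list(range(start, stop)))
    return groups


def top_heads_by_l2(heads, k):
    """Return indices of the k heads with the largest L2 norm, in original order.

    `heads` is a list of per-head weight blocks, each given as a list of rows.
    """
    norms = [math.sqrt(sum(x * x for row in head for x in row)) for head in heads]
    ranked = sorted(range(len(heads)), key=lambda i: norms[i], reverse=True)
    return sorted(ranked[:k])


# 12 Pythia layers mapped onto 8 target layers.
layer_groups = group_layers(12, 8)

# Toy stand-in for 12 attention heads: head i is a 2x2 block filled with i + 1,
# so later heads have larger norms.
toy_heads = [[[float(i + 1)] * 2 for _ in range(2)] for i in range(12)]
kept_heads = top_heads_by_l2(toy_heads, 5)

print(layer_groups)
print(kept_heads)
```

Under these assumptions every source layer lands in exactly one group, and keeping 5 heads of dimension 64 gives the 320-dimensional target width.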

Projection Details

56 weights projected (8 layers × 7 weights: wq, wk, wv, wo, w1, v, w2)
PCA average retained variance: 92.1% (range: 82.6%–97.3%)
Blend formula: new_W = 0.7 * vocabfusion_W + 0.3 * projected_pythia_W
Zero fine-tuning: No training after projection
What's NOT projected: Embeddings (shared), junction layer, organelle weights (CausalConv, Monarch, LongConv), OrganelleGate, norms, skip gates
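
The blend step can be sketched as follows. This is a minimal illustration in NumPy, not the experiment's actual code: the `layers.{i}.{name}` key naming and the 4x4 toy matrices are assumptions made for the example, and only the formula and the list of seven weight names come from this card.

```python
import numpy as np

ALPHA = 0.3  # blend weight on the projected Pythia matrix, as stated in this card


def blend(vocabfusion_w, projected_pythia_w, alpha=ALPHA):
    """Return (1 - alpha) * vocabfusion_w + alpha * projected_pythia_w."""
    if vocabfusion_w.shape != projected_pythia_w.shape:
        raise ValueError("weights must already be projected to the same shape")
    return (1.0 - alpha) * vocabfusion_w + alpha * projected_pythia_w


# Hypothetical key layout: 8 layers x 7 weight matrices = 56 blended tensors.
WEIGHT_NAMES = ["wq", "wk", "wv", "wo", "w1", "v", "w2"]
N_LAYERS = 8
keys = [f"layers.{i}.{name}" for i in range(N_LAYERS) for name in WEIGHT_NAMES]

# Toy stand-ins for the target weights and the already-projected source weights.
rng = np.random.default_rng(0)
target = {k: rng.standard_normal((4, 4)) for k in keys}
projected = {k: rng.standard_normal((4, 4)) for k in keys}

# Blend every listed weight; anything not listed would be left untouched.
fused = {k: blend(target[k], projected[k]) for k in keys}
print(len(fused))
```

With alpha = 0.3 the original VocabFusion weights still dominate each fused matrix, which is consistent with applying the projection without any fine-tuning afterwards.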

Key Advantages Over Gemma→SymbioGPT Experiment

Aspect	Gemma → SymbioGPT	Pythia → VocabFusion
Vocabulary	Mismatched (256K→2K)	Shared (50K=50K)
Head dim	64=64	64=64
Attention type	GQA→MHA (complex)	MHA→MHA (simple)
Transfer type	LoRA deltas only	Full pretrained weights
PCA ratio	2:1 (640→320)	2.4:1 (768→320)
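
To make the PCA ratio row concrete, here is a hedged sketch of how retained variance depends on the number of components kept. It uses a plain SVD-based PCA on toy data; what matrix the experiment actually fits PCA on is not stated in this card, so the data, sizes, and the `pca_project` helper are all illustrative assumptions.

```python
import numpy as np


def pca_project(x, k):
    """Project the rows of x onto the top-k principal directions.

    Returns the projected data and the fraction of variance retained.
    """
    centered = x - x.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions; s**2 is proportional to variance.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    retained = float((s[:k] ** 2).sum() / (s ** 2).sum())
    return centered @ vt[:k].T, retained


# Toy data with a decaying spectrum: 200 samples, 24 dimensions
# (a small stand-in for a 768-dimensional space).
rng = np.random.default_rng(0)
scales = 1.0 / np.arange(1, 25)
x = rng.standard_normal((200, 24)) * scales

z_2to1, var_2to1 = pca_project(x, 12)    # 2:1 compression (24 -> 12)
z_24to1, var_24to1 = pca_project(x, 10)  # 2.4:1 compression (24 -> 10)

print(round(var_2to1, 3), round(var_24to1, 3))
```

Keeping fewer components can never retain more variance on the same data, so a higher compression ratio trades some variance for the smaller width.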

Usage

import torch

# Load the fused checkpoint onto the CPU.
checkpoint = torch.load("vocabfusion_pythia160m_fused.pt", map_location="cpu")

Load the checkpoint into a VocabFusionModel instance, defined in vocab_fusion_experiment/model.py in the symbiogenesis-experiments repo.
