VocabFusion-Pythia160m-Fused

Full pretrained weight projection from Pythia-160m (768d, 12 layers) into VocabFusion-SymbioGPT-10M (320d, 8 layers) with blend alpha = 0.3.

This is the first cross-species experiment where source and target share both vocabulary (50K GPTNeoX) and head dimension (64), eliminating the two largest information-loss channels from previous projection experiments.

Architecture Mapping

| Component | Pythia-160m (Source) | VocabFusion (Target) | Method |
| --- | --- | --- | --- |
| d_model | 768 | 320 | PCA (92.1% variance) |
| Layers | 12 | 8 | Proportional grouping |
| Heads | 12 MHA | 5 MHA | Top-5 by L2 norm |
| Head dim | 64 | 64 | Exact match |
| FFN | 3072 (GELU) | 832 (SwiGLU) | Neuron selection + 2→3 matrix map |
| Vocab | 50,304 | 50,304 (frozen Pythia) | Exact match |
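The layer and head rows of the mapping can be illustrated with a minimal sketch. It is not the repository's implementation: the function names (`proportional_groups`, `top_heads_by_l2`, `select_heads`) are hypothetical, the grouping rule (contiguous, near-equal groups) is one plausible reading of "proportional grouping", and the weight layout (rows grouped head by head) is an assumption that may not match how Pythia stores its fused attention weights.

```python
import numpy as np

def proportional_groups(n_source: int, n_target: int) -> list[list[int]]:
    """Split source layer indices into n_target contiguous, near-equal groups."""
    # Integer boundaries at evenly spaced fractions of the source depth.
    bounds = [(i * n_source) // n_target for i in range(n_target + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(n_target)]

def top_heads_by_l2(w: np.ndarray, n_heads: int, head_dim: int, keep: int) -> np.ndarray:
    """Return the sorted indices of the `keep` heads with the largest L2 norm.

    Assumes w has shape (n_heads * head_dim, d_model) with rows grouped by head.
    """
    per_head = w.reshape(n_heads, -1)            # flatten each head's slice
    norms = np.linalg.norm(per_head, axis=1)     # one L2 norm per head
    return np.sort(np.argsort(norms)[-keep:])    # keep original head order

def select_heads(w: np.ndarray, kept: np.ndarray, head_dim: int) -> np.ndarray:
    """Keep only the rows that belong to the selected heads."""
    rows = np.concatenate([np.arange(h * head_dim, (h + 1) * head_dim) for h in kept])
    return w[rows]

# 12 source layers mapped onto 8 target layers.
groups = proportional_groups(12, 8)

# Synthetic stand-in for a 12-head query weight; heads 1, 3, 5, 7, 9 are scaled up
# so the selection is predictable in this demo.
rng = np.random.default_rng(0)
w = rng.normal(size=(12 * 64, 768))
for h in (1, 3, 5, 7, 9):
    w[h * 64:(h + 1) * 64] *= 10.0

kept = top_heads_by_l2(w, n_heads=12, head_dim=64, keep=5)
w_small = select_heads(w, kept, head_dim=64)     # shape (5 * 64, 768) = (320, 768)
```

Keeping 5 heads of dimension 64 gives 320 output rows, which matches the target d_model. How the layers inside each group are combined (averaged, first-of-group, or otherwise) is not stated above, so the sketch stops at the grouping itself.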

Projection Details

  • 56 weights projected (8 layers × 7 weights: wq, wk, wv, wo, w1, v, w2)
  • PCA avg variance: 92.1% (range: 82.6% to 97.3%)
  • Blend formula: new_W = 0.7 * vocabfusion_W + 0.3 * projected_pythia_W
  • Zero fine-tuning: No training after projection
  • What's NOT projected: Embeddings (shared), junction layer, organelle weights (CausalConv, Monarch, LongConv), OrganelleGate, norms, skip gates
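The PCA reduction and the blend formula can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code used to produce the checkpoint: `pca_project` and `blend` are hypothetical names, the PCA basis is taken from the centered rows of a single weight matrix, and small random matrices stand in for the real 768-to-320 weights.

```python
import numpy as np

def pca_project(w: np.ndarray, k: int) -> tuple[np.ndarray, float]:
    """Reduce the input dimension of w (rows x d_src) to k with a PCA basis.

    Returns the projected matrix (rows x k) and the fraction of variance
    captured by the top-k principal directions.
    """
    centered = w - w.mean(axis=0, keepdims=True)
    # Singular values of the centered data give the variance per direction.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k].T                              # (d_src, k)
    variance = float((s[:k] ** 2).sum() / (s ** 2).sum())
    return w @ basis, variance

def blend(target_w: np.ndarray, projected_w: np.ndarray, alpha: float = 0.3) -> np.ndarray:
    """new_W = (1 - alpha) * target_W + alpha * projected_W."""
    assert target_w.shape == projected_w.shape, "shapes must match before blending"
    return (1.0 - alpha) * target_w + alpha * projected_w

# Toy dimensions (assumption): 48 stands in for 768, 20 for 320.
rng = np.random.default_rng(0)
source_w = rng.normal(size=(96, 48))
target_w = rng.normal(size=(96, 20))

projected, variance = pca_project(source_w, k=20)
fused = blend(target_w, projected, alpha=0.3)     # 0.7 * target + 0.3 * projected
```

With `alpha = 0.3` the target's own weights still dominate each fused matrix. The reported 92.1% average would correspond to the mean of `variance` across the projected weights, assuming it was measured in this way.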

Key Advantages Over Gemma→SymbioGPT Experiment

| Aspect | Gemma → SymbioGPT | Pythia → VocabFusion |
| --- | --- | --- |
| Vocabulary | Mismatched (256K→2K) | Shared (50K=50K) |
| Head dim | 64=64 | 64=64 |
| Attention type | GQA→MHA (complex) | MHA→MHA (simple) |
| Transfer type | LoRA deltas only | Full pretrained weights |
| PCA ratio | 2:1 (640→320) | 2.4:1 (768→320) |

Usage

import torch
checkpoint = torch.load("vocabfusion_pythia160m_fused.pt", map_location="cpu")

Load into a VocabFusionModel instance from the symbiogenesis-experiments repo (vocab_fusion_experiment/model.py).
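The layout of the checkpoint file is not documented here, so it may help to list its tensors before loading them into the model. The helper below is a hypothetical sketch (`iter_tensors` is not part of the repo); it walks any nested mapping and reports every leaf that has a `.shape`, which covers both a flat state dict and a wrapper dict containing one.

```python
from collections.abc import Mapping

def iter_tensors(obj, prefix=""):
    """Yield (dotted_name, shape) for every array-like leaf in a nested mapping."""
    if isinstance(obj, Mapping):
        for key, value in obj.items():
            name = f"{prefix}.{key}" if prefix else str(key)
            yield from iter_tensors(value, name)
    elif hasattr(obj, "shape"):
        yield prefix, tuple(obj.shape)
    # Non-tensor leaves (step counters, config values) are skipped.

class _FakeTensor:
    """Stand-in with a .shape attribute so the demo runs without torch."""
    def __init__(self, *shape):
        self.shape = shape

# Hypothetical layout and key names, for illustration only.
demo = {"model": {"blocks.0.wq": _FakeTensor(320, 320)}, "step": 0}
listing = list(iter_tensors(demo))
```

In practice, pass the object returned by `torch.load` above, for example `for name, shape in iter_tensors(checkpoint): print(name, shape)`.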
