Upload README.md with huggingface_hub

README.md (CHANGED)

Multi-head reward models for corpus membership and structural genre classification.

## Models in This Repository

| Model | Base | Heads | Training | Logsquare | Loss | L2 Drift |
|-------|------|-------|----------|-----------|------|----------|
| `qwen_2head_probe/` | Qwen2.5-0.5B | 2 | 1 epoch (LoRA) | 0.1 | ~0.42 | **0.00** (frozen) |
| `gemma_2head_probe/` | Gemma-3 270M | 2 | 1 epoch (LoRA) | 0.1 | ~0.38 | **0.00** (frozen) |
| `gemma_9head_btrm/` | Gemma-3 270M | 9 | 10x coverage | 0.01 | 0.32 | **15.53** (full FT) |

### Training Evolution

**Phase 1: Frozen Probes (LoRA)**
- Quick validation that Bradley-Terry loss works
- Base transformer frozen, only adapter + BTRM heads trained
- Higher logsquare (0.1) = stronger regularization toward unit logits
- Result: Loss converges, but limited expressivity

**Phase 2: Full Fine-Tuning**
- Unfroze base transformer for end-to-end training
- Lower logsquare (0.01) = allows larger logit magnitudes
- Added synthetic corpora + structural genre heads
- Result: 2x more weight drift, better discrimination

### Weight Drift Analysis

Post-training comparison against original pre-trained weights:

**Frozen (LoRA) Models**: Zero drift on base transformer

```
qwen_2head_probe:  0.00 L2 (472M params unchanged)
gemma_2head_probe: 0.00 L2 (253M params unchanged)
```

**Full Fine-Tuned Model**: Significant drift, especially in MLP layers

```
gemma_9head_btrm: 15.53 L2 total (268M params)
- MLP:       11.20 L2 (3.26% relative)
- Embedding:  7.94 L2 (1.60% relative)
- Attention:  7.26 L2 (2.07% relative)
- Norm:       0.01 L2 (0.00% relative)
```

Top drifting layers are MLP `down_proj` weights (up to 15.7% relative change).
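The drift figures above come from a layer-by-layer comparison of checkpoints. The bookkeeping can be sketched as follows (a dependency-free toy: real checkpoints are tensor state dicts, and both the function name and the substring bucketing here are illustrative, not from the training repo):

```python
import math

def grouped_l2_drift(before, after):
    """L2 drift between two {name: [weights]} dicts, bucketed by layer type.

    Toy stand-in for a tensor state-dict comparison: sums squared deltas
    per bucket, then reports the square root (L2 norm of the difference).
    """
    buckets = {"mlp": 0.0, "attention": 0.0, "embed": 0.0, "norm": 0.0}
    total = 0.0
    for name, w0 in before.items():
        sq = sum((a - b) ** 2 for a, b in zip(after[name], w0))
        total += sq
        for key in buckets:  # first matching substring wins
            if key in name:
                buckets[key] += sq
                break
    out = {k: math.sqrt(v) for k, v in buckets.items()}
    out["total"] = math.sqrt(total)
    return out
```

Relative drift (the percentages above) would divide each bucket's L2 by the L2 norm of the original weights in that bucket.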

## Head Types

### Corpus Membership (6 heads in 9-head model)
Score whether text belongs to a specific narrative setting:

| Head | Description | In Probes? |
|------|-------------|------------|
| `oblivion` | Imperial fantasy RPG (TES IV) | Yes |
| `fonv` | Post-apocalyptic Western (Fallout NV) | Yes |
| `skyrim` | Nordic fantasy RPG (TES V) | 9-head only |
| `gallia` | Franco-Roman bureaucratic fantasy (synthetic) | 9-head only |
| `marmotte` | Alpine corporate dystopia (synthetic) | 9-head only |
| `sanguo` | Three Kingdoms romance/otome (synthetic) | 9-head only |

### Structural Genre (3 heads, 9-head model only)
Score text format/style:

| Head | Description |
|------|-------------|

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from huggingface_hub import hf_hub_download
import torch

# Load 9-head model (full fine-tuned)
model = AutoModelForCausalLM.from_pretrained(
    "SQCU/brainrot-partition-BTRMplus",
    subfolder="gemma_9head_btrm/base_model",
)

# Download the reward-head weights
btrm_path = hf_hub_download(
    "SQCU/brainrot-partition-BTRMplus",
    "gemma_9head_btrm/btrm_heads.pt"
)
btrm_state = torch.load(btrm_path)
# btrm_state["btrm_state_dict"] contains the head weights
# btrm_state["head_names"] = ["skyrim", "oblivion", "fonv", ...]
```
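With the heads loaded, scoring a text reduces to projecting a pooled hidden state through each head. Below is a dependency-free toy sketch of the head math described in the Architecture section (mean pool → RMSNorm → per-head linear → soft tanh cap); `score_heads` and its arguments are illustrative, not the repo's API:

```python
import math

def score_heads(hidden, head_weights, head_names, eps=1e-6, cap=10.0):
    """Toy BTRM head: mean-pool token states, RMSNorm, one linear
    projection per head, then a soft tanh cap at +/-cap."""
    dim = len(hidden[0])
    # Mean pool across the token axis
    pooled = [sum(tok[d] for tok in hidden) / len(hidden) for d in range(dim)]
    # RMSNorm (without the learned scale a real implementation carries)
    rms = math.sqrt(sum(x * x for x in pooled) / dim + eps)
    normed = [x / rms for x in pooled]
    # One dot product per head, softly capped at +/-cap
    return {
        name: cap * math.tanh(sum(a * b for a, b in zip(normed, w)) / cap)
        for name, w in zip(head_names, head_weights)
    }
```

Each weight vector here stands in for one row of the `Linear(hidden → N_heads)` projection.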

## Training Data

- **Reference**: Oblivion, Fallout NV, Skyrim dialogue with emotion annotations
- **Synthetic**: Gallia v9, Marmotte v6, Sanguo v1 (structural translation pipeline)
- **Negatives**: Cross-corpus soft negatives, Wattpad, FineWeb, WikiText

## Architecture

```
Input Text
    ↓
[Gemma-3 270M Transformer]   ← frozen (probes) or fine-tuned (9-head)
    ↓
Last Hidden State (mean pooled)
    ↓
[RMSNorm → Linear(hidden → N_heads)]
    ↓
Per-head logits (soft tanh capped at ±10)
```

Loss: `-log(sigmoid(pos - neg))` + logsquare regularization on logit magnitudes.
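The training objective per preference pair can be sketched as below. This is a minimal illustration: the Bradley-Terry term is the standard pairwise negative log-likelihood, while `logsquare_penalty` encodes just one plausible reading of the "push toward unit logits" regularizer (squared log-magnitude, zero at |logit| = 1); the repo may define it differently.

```python
import math

def logsquare_penalty(logit, eps=1e-8):
    # Squared log-magnitude: zero when |logit| == 1, growing as logits
    # collapse toward 0 or blow up (assumed reading of "unit logits").
    return math.log(abs(logit) + eps) ** 2

def btrm_pair_loss(pos, neg, logsquare=0.01):
    """Bradley-Terry loss for one (preferred, rejected) logit pair."""
    bt = -math.log(1.0 / (1.0 + math.exp(-(pos - neg))))  # -log(sigmoid(pos - neg))
    return bt + logsquare * (logsquare_penalty(pos) + logsquare_penalty(neg))
```

With this form, the 0.1 vs 0.01 `logsquare` settings in the model table directly scale how hard logits are pulled toward unit magnitude.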

## Observations

1. **Reference corpora discriminate better** than synthetic (skyrim/oblivion heads accurate, gallia/sanguo confused)
2. **Structural heads work excellently** - prose vs dialogue vs aesop cleanly separated
3. **Full fine-tuning helps** - 9-head model achieves lower loss than frozen probes
4. **MLP layers adapt most** - `down_proj` weights show highest relative drift

## License

Base model weights: Google Gemma License / Qwen License
Training data: Bethesda game dialogue (fair use for research), synthetic generation