SQCU committed · verified
Commit 479379e · 1 Parent(s): a7ac7a0

Upload README.md with huggingface_hub

Files changed (1):
  README.md +61 -47

README.md CHANGED
@@ -14,42 +14,62 @@ Multi-head reward models for corpus membership and structural genre classificati
 
  ## Models in This Repository
 
 - | Model | Base | Heads | Coverage | Logsquare | Loss | Notes |
 - |-------|------|-------|----------|-----------|------|-------|
 - | `qwen_2head_probe/` | Qwen2.5-0.5B | 2 | ~3x | 0.1 | 0.42 | Initial probe (oblivion, fonv) |
 - | `gemma_2head_probe/` | Gemma-3 270M | 2 | ~3x | 0.1 | 0.38 | Gemma comparison |
 - | `gemma_9head_btrm/` | Gemma-3 270M | 9 | ~10x | 0.01 | 0.32 | Full multi-head with synthetic |
 -
 - ### Key Differences
 -
 - **Probe models (2-head)**:
 - - Fewer training iterations (~3x coverage)
 - - Higher logsquare regularization (0.1) - stronger push toward unit logits
 - - Only reference corpora (Oblivion, Fallout NV)
   - Quick validation that Bradley-Terry loss works
 
 - **Full model (9-head)**:
 - - 10x coverage (each sample seen ~10 times)
 - - Lower logsquare regularization (0.01) - allows larger logit magnitudes
 - - Reference corpora + synthetic settings (Gallia, Marmotte, Sanguo)
 - - Structural genre heads (dialogue vs prose vs aesop)
 - - Full fine-tuning of base model (15.5 L2 weight drift)
 
  ## Head Types
 
 - ### Corpus Membership (6 heads)
  Score whether text belongs to a specific narrative setting:
 
 - | Head | Description |
 - |------|-------------|
 - | `skyrim` | Nordic fantasy RPG (TES V) |
 - | `oblivion` | Imperial fantasy RPG (TES IV) |
 - | `fonv` | Post-apocalyptic Western (Fallout NV) |
 - | `gallia` | Franco-Roman bureaucratic fantasy (synthetic) |
 - | `marmotte` | Alpine corporate dystopia (synthetic) |
 - | `sanguo` | Three Kingdoms romance/otome (synthetic) |
 -
 - ### Structural Genre (3 heads)
  Score text format/style:
 
  | Head | Description |
@@ -64,7 +84,7 @@ Score text format/style:
  from transformers import AutoModelForCausalLM, AutoTokenizer
  import torch
 
 - # Load base model
  model = AutoModelForCausalLM.from_pretrained(
      "SQCU/brainrot-partition-BTRMplus",
      subfolder="gemma_9head_btrm/base_model",
@@ -82,46 +102,40 @@ btrm_path = hf_hub_download(
      "gemma_9head_btrm/btrm_heads.pt"
  )
  btrm_state = torch.load(btrm_path)
 - ```
 -
 - Or use the training script directly:
 - ```bash
 - git clone https://github.com/yourrepo/dialogue_yoinker
 - cd dialogue_yoinker
 - python scripts/train_btrm.py score \
 -     -m SQCU/brainrot-partition-BTRMplus/gemma_9head_btrm \
 -     -i input.jsonl -o output.jsonl
  ```
 
  ## Training Data
 
  - **Reference**: Oblivion, Fallout NV, Skyrim dialogue with emotion annotations
  - **Synthetic**: Gallia v9, Marmotte v6, Sanguo v1 (structural translation pipeline)
 - - **Negatives**: Cross-corpus, Wattpad, FineWeb, WikiText
 
  ## Architecture
 
  ```
  Input Text
      ↓
 - [Gemma-3 270M Transformer] ← fully fine-tuned
      ↓
 - Last Hidden State (pooled)
      ↓
 - [RMSNorm → Linear(640 → N_heads)]
      ↓
  Per-head logits (soft tanh capped at ±10)
  ```
 
 - Bradley-Terry loss: `log(sigmoid(pos - neg))` + logsquare regularization.
 
  ## Observations
 
  1. **Reference corpora discriminate better** than synthetic (skyrim/oblivion heads accurate, gallia/sanguo confused)
  2. **Structural heads work excellently** - prose vs dialogue vs aesop cleanly separated
 - 3. **MLP layers drift most** during fine-tuning (15.7% relative change in down_proj)
 
  ## License
 
 - Base model weights: Google Gemma License
 - Training data: Bethesda game dialogue (fair use), synthetic generation
 
  ## Models in This Repository
 
 + | Model | Base | Heads | Training | Logsquare | Loss | L2 Drift |
 + |-------|------|-------|----------|-----------|------|----------|
 + | `qwen_2head_probe/` | Qwen2.5-0.5B | 2 | 1 epoch (LoRA) | 0.1 | ~0.42 | **0.00** (frozen) |
 + | `gemma_2head_probe/` | Gemma-3 270M | 2 | 1 epoch (LoRA) | 0.1 | ~0.38 | **0.00** (frozen) |
 + | `gemma_9head_btrm/` | Gemma-3 270M | 9 | 10x coverage | 0.01 | 0.32 | **15.53** (full FT) |
 +
 + ### Training Evolution
 +
 + **Phase 1: Frozen Probes (LoRA)**
   - Quick validation that Bradley-Terry loss works
 + - Base transformer frozen, only adapter + BTRM heads trained
 + - Higher logsquare (0.1) = stronger regularization toward unit logits
 + - Result: loss converges, but limited expressivity
 +
 + **Phase 2: Full Fine-Tuning**
 + - Unfroze base transformer for end-to-end training
 + - Lower logsquare (0.01) = allows larger logit magnitudes
 + - Added synthetic corpora + structural genre heads
 + - Result: substantially more weight drift, better discrimination
 +
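The two phases differ mainly in which parameters receive gradients. A minimal sketch of the phase-1 freeze, using toy `nn.Linear` stand-ins rather than the actual Gemma modules or BTRM heads:

```python
import torch
from torch import nn

# Toy stand-ins for the real components (sketch only):
base = nn.Linear(640, 640)   # stands in for the base transformer
heads = nn.Linear(640, 9)    # stands in for the BTRM head projection

# Phase 1: freeze the base, train only the heads (plus LoRA adapters
# in the real setup).
for p in base.parameters():
    p.requires_grad = False

trainable = [p for p in heads.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)

# Phase 2 amounts to unfreezing `base` and rebuilding the optimizer
# over all parameters.
```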
 + ### Weight Drift Analysis
 +
 + Post-training comparison against original pre-trained weights:
 +
 + **Frozen (LoRA) Models**: Zero drift on base transformer
 + ```
 + qwen_2head_probe:  0.00 L2 (472M params unchanged)
 + gemma_2head_probe: 0.00 L2 (253M params unchanged)
 + ```
 
 + **Full Fine-Tuned Model**: Significant drift, especially in MLP layers
 + ```
 + gemma_9head_btrm: 15.53 L2 total (268M params)
 + - MLP:       11.20 L2 (3.26% relative)
 + - Embedding:  7.94 L2 (1.60% relative)
 + - Attention:  7.26 L2 (2.07% relative)
 + - Norm:       0.01 L2 (0.00% relative)
 + ```
 +
 + Top drifting layers are MLP `down_proj` weights (up to 15.7% relative change).
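The L2 and relative-change figures above come from comparing post-training weights against the pre-trained checkpoint. A sketch of that comparison (the helper name and toy tensors are illustrative, not the repository's analysis code):

```python
import torch

def l2_drift(before: dict, after: dict) -> tuple[float, float]:
    """Total L2 drift and mean per-tensor relative change between state dicts."""
    total_sq, rel = 0.0, []
    for name, w0 in before.items():
        diff = (after[name] - w0).float()
        total_sq += diff.pow(2).sum().item()
        base_norm = w0.float().norm().item()
        if base_norm > 0:
            rel.append(diff.norm().item() / base_norm)
    return total_sq ** 0.5, sum(rel) / len(rel)

# Toy check: unchanged weights drift by exactly zero (the frozen-probe case).
w = {"mlp.down_proj.weight": torch.ones(4, 4)}
drift, _ = l2_drift(w, {k: v.clone() for k, v in w.items()})
```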
 
  ## Head Types
 
 + ### Corpus Membership (6 heads in 9-head model)
  Score whether text belongs to a specific narrative setting:
 
 + | Head | Description | In Probes? |
 + |------|-------------|------------|
 + | `oblivion` | Imperial fantasy RPG (TES IV) | Yes |
 + | `fonv` | Post-apocalyptic Western (Fallout NV) | Yes |
 + | `skyrim` | Nordic fantasy RPG (TES V) | 9-head only |
 + | `gallia` | Franco-Roman bureaucratic fantasy (synthetic) | 9-head only |
 + | `marmotte` | Alpine corporate dystopia (synthetic) | 9-head only |
 + | `sanguo` | Three Kingdoms romance/otome (synthetic) | 9-head only |
 +
 + ### Structural Genre (3 heads, 9-head model only)
  Score text format/style:
 
  | Head | Description |
  from transformers import AutoModelForCausalLM, AutoTokenizer
  import torch
 
 + # Load 9-head model (full fine-tuned)
  model = AutoModelForCausalLM.from_pretrained(
      "SQCU/brainrot-partition-BTRMplus",
      subfolder="gemma_9head_btrm/base_model",
 
      "gemma_9head_btrm/btrm_heads.pt"
  )
  btrm_state = torch.load(btrm_path)
 + # btrm_state["btrm_state_dict"] contains the head weights
 + # btrm_state["head_names"] = ["skyrim", "oblivion", "fonv", ...]
  ```
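With the base model and `btrm_state` loaded, scoring reduces to pooling the last hidden state and applying the head projection. A sketch of that step, with placeholder shapes and a zero weight tensor standing in for the real weights from `btrm_state["btrm_state_dict"]` (the real head also applies RMSNorm before the projection, omitted here):

```python
import torch
import torch.nn.functional as F

hidden_size, n_heads = 640, 9                    # placeholder dimensions
head_weight = torch.zeros(n_heads, hidden_size)  # placeholder for the loaded Linear weight

def score(last_hidden: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Mean-pool non-padding tokens, project to per-head logits, soft-cap at ±10."""
    mask = attention_mask.unsqueeze(-1).float()
    pooled = (last_hidden * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1.0)
    raw = F.linear(pooled, head_weight)   # (batch, n_heads)
    return 10.0 * torch.tanh(raw / 10.0)  # soft tanh cap at ±10

logits = score(torch.randn(2, 7, hidden_size), torch.ones(2, 7))
```

Each column of `logits` then lines up with the corresponding entry of `btrm_state["head_names"]`.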
 
  ## Training Data
 
  - **Reference**: Oblivion, Fallout NV, Skyrim dialogue with emotion annotations
  - **Synthetic**: Gallia v9, Marmotte v6, Sanguo v1 (structural translation pipeline)
 + - **Negatives**: Cross-corpus soft negatives, Wattpad, FineWeb, WikiText
 
  ## Architecture
 
  ```
  Input Text
      ↓
 + [Gemma-3 270M Transformer] ← frozen (probes) or fine-tuned (9-head)
      ↓
 + Last Hidden State (mean pooled)
      ↓
 + [RMSNorm → Linear(hidden → N_heads)]
      ↓
  Per-head logits (soft tanh capped at ±10)
  ```
 
 + Loss: `-log(sigmoid(pos - neg))` + logsquare regularization on logit magnitudes.
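A sketch of that objective (the exact reduction, and whether both positive and negative logits enter the penalty, are assumptions; the `logsquare` coefficient matches the model table above, 0.1 for the probes and 0.01 for the 9-head model):

```python
import torch
import torch.nn.functional as F

def btrm_loss(pos: torch.Tensor, neg: torch.Tensor, logsquare: float = 0.01) -> torch.Tensor:
    """Bradley-Terry pairwise loss plus a squared-logit (logsquare) penalty."""
    pairwise = -F.logsigmoid(pos - neg).mean()             # -log(sigmoid(pos - neg))
    penalty = logsquare * (pos.pow(2) + neg.pow(2)).mean()  # pushes logits toward 0
    return pairwise + penalty

# Equal logits give sigmoid(0) = 0.5, so the pairwise term is log(2).
loss = btrm_loss(torch.zeros(4), torch.zeros(4))
```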
 
  ## Observations
 
  1. **Reference corpora discriminate better** than synthetic (skyrim/oblivion heads accurate, gallia/sanguo confused)
  2. **Structural heads work excellently** - prose vs dialogue vs aesop cleanly separated
 + 3. **Full fine-tuning helps** - the 9-head model achieves lower loss than the frozen probes
 + 4. **MLP layers adapt most** - `down_proj` weights show the highest relative drift
 
  ## License
 
 + Base model weights: Google Gemma License / Qwen License
 + Training data: Bethesda game dialogue (fair use for research), synthetic generation