Upload README.md with huggingface_hub

README.md (CHANGED)

Multi-head reward models for corpus membership and structural genre classification.

## Models in This Repository

| Model | Base | Heads | Training | Logsquare | Loss | L2 Drift |
|-------|------|-------|----------|-----------|------|----------|
| `qwen_2head_probe/` | Qwen2.5-0.5B | 2 | 1 epoch (LoRA) | 0.1 | ~0.42 | **0.00** (frozen) |
| `gemma_2head_probe/` | Gemma-3 270M | 2 | 1 epoch (LoRA) | 0.1 | ~0.38 | **0.00** (frozen) |
| `gemma_9head_btrm/` | Gemma-3 270M | 9 | 10x coverage | 0.01 | 0.32 | **15.53** (full FT) |

### Training Evolution

**Phase 1: Frozen Probes (LoRA)**
- Quick validation that Bradley-Terry loss works
- Base transformer frozen, only adapter + BTRM heads trained
- Higher logsquare (0.1) = stronger regularization toward unit logits
- Result: Loss converges, but limited expressivity

**Phase 2: Full Fine-Tuning**
- Unfroze base transformer for end-to-end training
- Lower logsquare (0.01) = allows larger logit magnitudes
- Added synthetic corpora + structural genre heads
- Result: 2x more weight drift, better discrimination

### Weight Drift Analysis

Post-training comparison against original pre-trained weights:

**Frozen (LoRA) Models**: Zero drift on base transformer

```
qwen_2head_probe:  0.00 L2 (472M params unchanged)
gemma_2head_probe: 0.00 L2 (253M params unchanged)
```

**Full Fine-Tuned Model**: Significant drift, especially in MLP layers

```
gemma_9head_btrm: 15.53 L2 total (268M params)
- MLP:       11.20 L2 (3.26% relative)
- Embedding:  7.94 L2 (1.60% relative)
- Attention:  7.26 L2 (2.07% relative)
- Norm:       0.01 L2 (0.00% relative)
```

Top drifting layers are MLP `down_proj` weights (up to 15.7% relative change).
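The drift figures above come from a layer-by-layer comparison of checkpoints. The bookkeeping can be sketched as follows (a dependency-free toy: real checkpoints are tensor state dicts, and both the function name and the substring bucketing here are illustrative, not from the training repo):

```python
import math

def grouped_l2_drift(before, after):
    """L2 drift between two {name: [weights]} dicts, bucketed by layer type.

    Toy stand-in for a tensor state-dict comparison: sums squared deltas
    per bucket, then reports the square root (L2 norm of the difference).
    """
    buckets = {"mlp": 0.0, "attention": 0.0, "embed": 0.0, "norm": 0.0}
    total = 0.0
    for name, w0 in before.items():
        sq = sum((a - b) ** 2 for a, b in zip(after[name], w0))
        total += sq
        for key in buckets:  # first matching substring wins
            if key in name:
                buckets[key] += sq
                break
    out = {k: math.sqrt(v) for k, v in buckets.items()}
    out["total"] = math.sqrt(total)
    return out
```

Relative drift (the percentages above) would divide each bucket's L2 by the L2 norm of the original weights in that bucket.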

## Head Types

### Corpus Membership (6 heads in 9-head model)
Score whether text belongs to a specific narrative setting:

| Head | Description | In Probes? |
|------|-------------|------------|
| `oblivion` | Imperial fantasy RPG (TES IV) | Yes |
| `fonv` | Post-apocalyptic Western (Fallout NV) | Yes |
| `skyrim` | Nordic fantasy RPG (TES V) | 9-head only |
| `gallia` | Franco-Roman bureaucratic fantasy (synthetic) | 9-head only |
| `marmotte` | Alpine corporate dystopia (synthetic) | 9-head only |
| `sanguo` | Three Kingdoms romance/otome (synthetic) | 9-head only |

### Structural Genre (3 heads, 9-head model only)
Score text format/style:

| Head | Description |
|------|-------------|

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from huggingface_hub import hf_hub_download
import torch

# Load 9-head model (full fine-tuned)
model = AutoModelForCausalLM.from_pretrained(
    "SQCU/brainrot-partition-BTRMplus",
    subfolder="gemma_9head_btrm/base_model",
)

# Download the reward-head weights
btrm_path = hf_hub_download(
    "SQCU/brainrot-partition-BTRMplus",
    "gemma_9head_btrm/btrm_heads.pt"
)
btrm_state = torch.load(btrm_path)
# btrm_state["btrm_state_dict"] contains the head weights
# btrm_state["head_names"] = ["skyrim", "oblivion", "fonv", ...]
```
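With the heads loaded, scoring a text reduces to projecting a pooled hidden state through each head. Below is a dependency-free toy sketch of the head math described in the Architecture section (mean pool → RMSNorm → per-head linear → soft tanh cap); `score_heads` and its arguments are illustrative, not the repo's API:

```python
import math

def score_heads(hidden, head_weights, head_names, eps=1e-6, cap=10.0):
    """Toy BTRM head: mean-pool token states, RMSNorm, one linear
    projection per head, then a soft tanh cap at +/-cap."""
    dim = len(hidden[0])
    # Mean pool across the token axis
    pooled = [sum(tok[d] for tok in hidden) / len(hidden) for d in range(dim)]
    # RMSNorm (without the learned scale a real implementation carries)
    rms = math.sqrt(sum(x * x for x in pooled) / dim + eps)
    normed = [x / rms for x in pooled]
    # One dot product per head, softly capped at +/-cap
    return {
        name: cap * math.tanh(sum(a * b for a, b in zip(normed, w)) / cap)
        for name, w in zip(head_names, head_weights)
    }
```

Each weight vector here stands in for one row of the `Linear(hidden → N_heads)` projection.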

## Training Data

- **Reference**: Oblivion, Fallout NV, Skyrim dialogue with emotion annotations
- **Synthetic**: Gallia v9, Marmotte v6, Sanguo v1 (structural translation pipeline)
- **Negatives**: Cross-corpus soft negatives, Wattpad, FineWeb, WikiText

## Architecture

```
Input Text
    ↓
[Gemma-3 270M Transformer]   ← frozen (probes) or fine-tuned (9-head)
    ↓
Last Hidden State (mean pooled)
    ↓
[RMSNorm → Linear(hidden → N_heads)]
    ↓
Per-head logits (soft tanh capped at ±10)
```

Loss: `-log(sigmoid(pos - neg))` + logsquare regularization on logit magnitudes.
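The training objective per preference pair can be sketched as below. This is a minimal illustration: the Bradley-Terry term is the standard pairwise negative log-likelihood, while `logsquare_penalty` encodes just one plausible reading of the "push toward unit logits" regularizer (squared log-magnitude, zero at |logit| = 1); the repo may define it differently.

```python
import math

def logsquare_penalty(logit, eps=1e-8):
    # Squared log-magnitude: zero when |logit| == 1, growing as logits
    # collapse toward 0 or blow up (assumed reading of "unit logits").
    return math.log(abs(logit) + eps) ** 2

def btrm_pair_loss(pos, neg, logsquare=0.01):
    """Bradley-Terry loss for one (preferred, rejected) logit pair."""
    bt = -math.log(1.0 / (1.0 + math.exp(-(pos - neg))))  # -log(sigmoid(pos - neg))
    return bt + logsquare * (logsquare_penalty(pos) + logsquare_penalty(neg))
```

With this form, the 0.1 vs 0.01 `logsquare` settings in the model table directly scale how hard logits are pulled toward unit magnitude.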

## Observations

1. **Reference corpora discriminate better** than synthetic (skyrim/oblivion heads accurate, gallia/sanguo confused)
2. **Structural heads work excellently** - prose vs dialogue vs aesop cleanly separated
3. **Full fine-tuning helps** - 9-head model achieves lower loss than frozen probes
4. **MLP layers adapt most** - `down_proj` weights show highest relative drift

## License

Base model weights: Google Gemma License / Qwen License
Training data: Bethesda game dialogue (fair use for research), synthetic generation