---
license: gemma
tags:
- reward-model
- bradley-terry
- dialogue
- multi-head
- corpus-membership
---

# BTRM+ (Bradley-Terry Reward Model Plus)

Multi-head reward models for corpus membership and structural genre classification. Trained on situated dialogue from video games and synthetic settings.

## Models in This Repository

| Model | Base | Heads | Training | Logsquare | Loss | L2 Drift |
|-------|------|-------|----------|-----------|------|----------|
| `qwen_2head_probe/` | Qwen2.5-0.5B | 2 | 1 epoch (LoRA) | 0.1 | ~0.42 | **0.00** (frozen) |
| `gemma_2head_probe/` | Gemma-3 270M | 2 | 1 epoch (LoRA) | 0.1 | ~0.38 | **0.00** (frozen) |
| `gemma_9head_btrm/` | Gemma-3 270M | 9 | 10x coverage | 0.01 | 0.32 | **15.53** (full FT) |

### Training Evolution

**Phase 1: Frozen Probes (LoRA)**
- Quick validation that the Bradley-Terry loss works
- Base transformer frozen; only the LoRA adapter and BTRM heads are trained
- Higher logsquare (0.1) = stronger regularization toward unit logits
- Result: loss converges, but expressivity is limited
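
The Phase 1 setup can be illustrated with a minimal hand-rolled low-rank adapter (a sketch, not the actual training code; dimensions, rank, and alpha are illustrative): the frozen base matrix receives no gradient, which is why the probes show exactly 0.00 L2 drift.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen base linear plus a trainable low-rank update A @ B."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False               # freeze pre-trained weights
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, base.out_features))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A @ self.B) * self.scale

layer = LoRALinear(nn.Linear(64, 64))
before = layer.base.weight.detach().clone()

# one training step on a dummy objective: only A and B can move
opt = torch.optim.SGD([p for p in layer.parameters() if p.requires_grad], lr=0.1)
layer(torch.randn(4, 64)).pow(2).mean().backward()
opt.step()

drift = (layer.base.weight - before).norm().item()
print(f"base weight L2 drift: {drift:.2f}")  # 0.00 -- the frozen base never moves
```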

**Phase 2: Full Fine-Tuning**
- Unfroze the base transformer for end-to-end training
- Lower logsquare (0.01) = allows larger logit magnitudes
- Added synthetic corpora and structural genre heads
- Result: substantial weight drift (vs. zero for the frozen probes), better discrimination

### Weight Drift Analysis

Post-training comparison against the original pre-trained weights:

**Frozen (LoRA) models**: zero drift on the base transformer
```
qwen_2head_probe:  0.00 L2 (472M params unchanged)
gemma_2head_probe: 0.00 L2 (253M params unchanged)
```

**Full fine-tuned model**: significant drift, especially in the MLP layers
```
gemma_9head_btrm: 15.53 L2 total (268M params)
- MLP:       11.20 L2 (3.26% relative)
- Embedding:  7.94 L2 (1.60% relative)
- Attention:  7.26 L2 (2.07% relative)
- Norm:       0.01 L2 (0.00% relative)
```

The top drifting layers are MLP `down_proj` weights (up to 15.7% relative change).
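
The measurement above can be sketched as a comparison between two state dicts (a minimal sketch, not the analysis script used here): per-tensor drift is the L2 norm of the weight difference, and "relative" divides by the original tensor's norm.

```python
import torch

def l2_drift(original: dict, tuned: dict) -> dict:
    """Map each shared parameter name to (absolute L2 drift, relative drift)."""
    report = {}
    for name, w0 in original.items():
        w1 = tuned.get(name)
        if w1 is None or w1.shape != w0.shape:
            continue  # skip heads/adapters that only exist in one checkpoint
        diff = (w1.float() - w0.float()).norm().item()
        rel = diff / (w0.float().norm().item() + 1e-12)
        report[name] = (diff, rel)
    return report

# toy example with made-up tensors
orig = {"mlp.down_proj.weight": torch.ones(4, 4)}
tuned = {"mlp.down_proj.weight": torch.ones(4, 4) * 1.5}
print(l2_drift(orig, tuned)["mlp.down_proj.weight"])  # (2.0, ~0.5)
```

Group totals (MLP, Attention, ...) combine per-tensor values as the square root of the sum of squares, which is consistent with the numbers above (sqrt(11.20² + 7.94² + 7.26² + 0.01²) ≈ 15.53).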

## Head Types

### Corpus Membership (6 heads in the 9-head model)
Each head scores whether text belongs to a specific narrative setting:

| Head | Description | In Probes? |
|------|-------------|------------|
| `oblivion` | Imperial fantasy RPG (TES IV) | Yes |
| `fonv` | Post-apocalyptic Western (Fallout NV) | Yes |
| `skyrim` | Nordic fantasy RPG (TES V) | 9-head only |
| `gallia` | Franco-Roman bureaucratic fantasy (synthetic) | 9-head only |
| `marmotte` | Alpine corporate dystopia (synthetic) | 9-head only |
| `sanguo` | Three Kingdoms romance/otome (synthetic) | 9-head only |

### Structural Genre (3 heads, 9-head model only)
Each head scores text format/style:

| Head | Description |
|------|-------------|
| `multiturn_dialogue` | Raw quoted dialogue walks |
| `fk_normed_prose` | Flesch-Kincaid-controlled prose |
| `brainrot_aesop` | Vocabulary-teaching passages |

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the 9-head model (full fine-tuned)
model = AutoModelForCausalLM.from_pretrained(
    "SQCU/brainrot-partition-BTRMplus",
    subfolder="gemma_9head_btrm/base_model",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(
    "SQCU/brainrot-partition-BTRMplus",
    subfolder="gemma_9head_btrm/base_model",
)

# Load the BTRM heads
from huggingface_hub import hf_hub_download

btrm_path = hf_hub_download(
    "SQCU/brainrot-partition-BTRMplus",
    "gemma_9head_btrm/btrm_heads.pt",
)
btrm_state = torch.load(btrm_path)
# btrm_state["btrm_state_dict"] contains the head weights
# btrm_state["head_names"] = ["skyrim", "oblivion", "fonv", ...]
```
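
Scoring follows the Architecture section below (mean-pooled last hidden state, RMSNorm, linear heads, soft tanh cap). A sketch of that head math, assuming the key layout inside `btrm_state["btrm_state_dict"]`; `norm_weight` and `head_weight` are illustrative placeholder names, not documented keys:

```python
import torch

def head_scores(last_hidden, norm_weight, head_weight):
    """Score pooled hidden states with the BTRM heads.

    last_hidden: (batch, seq, hidden) from the base model
    norm_weight: (hidden,) RMSNorm scale -- placeholder name
    head_weight: (n_heads, hidden) linear heads -- placeholder name
    """
    h = last_hidden.mean(dim=1)                                  # mean pool over tokens
    h = h * torch.rsqrt(h.pow(2).mean(-1, keepdim=True) + 1e-6)  # RMSNorm
    h = h * norm_weight
    logits = h @ head_weight.T                                   # (batch, n_heads)
    return 10.0 * torch.tanh(logits / 10.0)                      # soft cap at ±10

# e.g., with the model and heads loaded above:
# out = model(**tokenizer("some text", return_tensors="pt"), output_hidden_states=True)
# scores = head_scores(out.hidden_states[-1].float(), norm_w, head_w)
# dict(zip(btrm_state["head_names"], scores[0].tolist()))
```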

## Training Data

- **Reference**: Oblivion, Fallout NV, and Skyrim dialogue with emotion annotations
- **Synthetic**: Gallia v9, Marmotte v6, Sanguo v1 (structural translation pipeline)
- **Negatives**: cross-corpus soft negatives, Wattpad, FineWeb, WikiText

## Architecture

```
Input Text
    ↓
[Gemma-3 270M Transformer]  ← frozen (probes) or fine-tuned (9-head)
    ↓
Last Hidden State (mean pooled)
    ↓
[RMSNorm → Linear(hidden → N_heads)]
    ↓
Per-head logits (soft tanh capped at ±10)
```

Loss: `-log(sigmoid(pos - neg))` + logsquare regularization on logit magnitudes.
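
One plausible reading of this objective, as a sketch (the exact form of the logsquare term isn't spelled out here; a squared-logit penalty is assumed, with `lam` playing the role of the 0.1 / 0.01 values in the table above):

```python
import torch
import torch.nn.functional as F

def btrm_loss(pos_logits, neg_logits, lam=0.01):
    # Bradley-Terry pairwise term: -log sigmoid(pos - neg)
    pair = -F.logsigmoid(pos_logits - neg_logits).mean()
    # "logsquare" regularization, assumed here to be a squared-logit penalty
    reg = lam * (pos_logits.pow(2).mean() + neg_logits.pow(2).mean())
    return pair + reg

loss = btrm_loss(torch.tensor([2.0]), torch.tensor([0.0]), lam=0.0)
print(round(loss.item(), 4))  # 0.1269 = -log(sigmoid(2))
```

A larger `lam` pulls logits toward zero (the tighter 0.1 setting of the probes); the 0.01 used for full fine-tuning lets logit magnitudes grow.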

## Observations

1. **Reference corpora discriminate better than synthetic ones** - the skyrim/oblivion heads are accurate, while gallia/sanguo are confused
2. **Structural heads work excellently** - prose vs. dialogue vs. aesop are cleanly separated
3. **Full fine-tuning helps** - the 9-head model achieves lower loss than the frozen probes
4. **MLP layers adapt most** - `down_proj` weights show the highest relative drift

## License

Base model weights: Google Gemma License / Qwen License
Training data: Bethesda game dialogue (fair use for research), synthetic generation