---
license: gemma
tags:
- reward-model
- bradley-terry
- dialogue
- multi-head
- corpus-membership
---
# BTRM+ (Bradley-Terry Reward Model Plus)
Multi-head reward models for corpus membership and structural genre classification. Trained on situated dialogue from video games and synthetic settings.
## Models in This Repository
| Model | Base | Heads | Training | Logsquare | Loss | L2 Drift |
|-------|------|-------|----------|-----------|------|----------|
| `qwen_2head_probe/` | Qwen2.5-0.5B | 2 | 1 epoch (LoRA) | 0.1 | ~0.42 | **0.00** (frozen) |
| `gemma_2head_probe/` | Gemma-3 270M | 2 | 1 epoch (LoRA) | 0.1 | ~0.38 | **0.00** (frozen) |
| `gemma_9head_btrm/` | Gemma-3 270M | 9 | 10x coverage | 0.01 | 0.32 | **15.53** (full FT) |
### Training Evolution
**Phase 1: Frozen Probes (LoRA)**
- Quick validation that Bradley-Terry loss works
- Base transformer frozen, only adapter + BTRM heads trained
- Higher logsquare (0.1) = stronger regularization toward unit logits
- Result: Loss converges, but limited expressivity
**Phase 2: Full Fine-Tuning**
- Unfroze base transformer for end-to-end training
- Lower logsquare (0.01) = allows larger logit magnitudes
- Added synthetic corpora + structural genre heads
- Result: substantial weight drift (15.53 L2 vs. zero for the frozen probes), better discrimination
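The two phases differ mainly in which parameters receive gradients. A minimal sketch of that switch, using stand-in modules (the function and module names here are illustrative, not from the training code):

```python
import torch.nn as nn

def set_trainable(base: nn.Module, heads: nn.Module, full_finetune: bool):
    """Phase 1: freeze the base transformer, train only the BTRM heads.
    Phase 2: unfreeze the base as well for end-to-end training."""
    for p in base.parameters():
        p.requires_grad = full_finetune
    for p in heads.parameters():
        p.requires_grad = True  # heads train in both phases

# Stand-ins for the transformer and the 9 BTRM heads:
base = nn.Linear(8, 8)
heads = nn.Linear(8, 9)
set_trainable(base, heads, full_finetune=False)  # phase-1 configuration
```

In phase 1 the LoRA adapters would be the only trainable parameters inside the base model; the sketch above collapses that to a simple freeze/unfreeze toggle.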
### Weight Drift Analysis
Post-training comparison against original pre-trained weights:
**Frozen (LoRA) Models**: Zero drift on base transformer
```
qwen_2head_probe: 0.00 L2 (472M params unchanged)
gemma_2head_probe: 0.00 L2 (253M params unchanged)
```
**Full Fine-Tuned Model**: Significant drift, especially in MLP layers
```
gemma_9head_btrm: 15.53 L2 total (268M params)
- MLP: 11.20 L2 (3.26% relative)
- Embedding: 7.94 L2 (1.60% relative)
- Attention: 7.26 L2 (2.07% relative)
- Norm: 0.01 L2 (0.00% relative)
```
Top drifting layers are MLP `down_proj` weights (up to 15.7% relative change).
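The drift numbers above come from comparing post-training weights against the original checkpoint. A straightforward state-dict comparison reproduces the per-parameter figures (grouping into MLP/attention/embedding buckets would key on parameter names):

```python
import torch

def l2_drift(before: dict, after: dict) -> dict:
    """Per-parameter L2 distance between two state dicts,
    plus relative change (L2 of the diff over L2 of the original)."""
    report = {}
    for name, w0 in before.items():
        w1 = after[name]
        diff = (w1.float() - w0.float()).norm().item()
        rel = diff / (w0.float().norm().item() + 1e-12)
        report[name] = (diff, rel)
    return report
```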
## Head Types
### Corpus Membership (6 heads in 9-head model)
Score whether text belongs to a specific narrative setting:
| Head | Description | In Probes? |
|------|-------------|------------|
| `oblivion` | Imperial fantasy RPG (TES IV) | Yes |
| `fonv` | Post-apocalyptic Western (Fallout NV) | Yes |
| `skyrim` | Nordic fantasy RPG (TES V) | 9-head only |
| `gallia` | Franco-Roman bureaucratic fantasy (synthetic) | 9-head only |
| `marmotte` | Alpine corporate dystopia (synthetic) | 9-head only |
| `sanguo` | Three Kingdoms romance/otome (synthetic) | 9-head only |
### Structural Genre (3 heads, 9-head model only)
Score text format/style:
| Head | Description |
|------|-------------|
| `multiturn_dialogue` | Raw quoted dialogue walks |
| `fk_normed_prose` | Flesch-Kincaid controlled prose |
| `brainrot_aesop` | Vocabulary teaching passages |
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
# Load 9-head model (full fine-tuned)
model = AutoModelForCausalLM.from_pretrained(
"SQCU/brainrot-partition-BTRMplus",
subfolder="gemma_9head_btrm/base_model",
torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(
"SQCU/brainrot-partition-BTRMplus",
subfolder="gemma_9head_btrm/base_model",
)
# Load BTRM heads
from huggingface_hub import hf_hub_download
btrm_path = hf_hub_download(
"SQCU/brainrot-partition-BTRMplus",
"gemma_9head_btrm/btrm_heads.pt"
)
btrm_state = torch.load(btrm_path, map_location="cpu")
# btrm_state["btrm_state_dict"] contains the head weights
# btrm_state["head_names"] = ["skyrim", "oblivion", "fonv", ...]
```
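Applying the downloaded heads requires reconstructing the head module. A minimal sketch following the Architecture section below (mean pooling, then RMSNorm β†’ Linear β†’ soft tanh cap at Β±10); the parameter names are illustrative and should be matched against the keys in `btrm_state_dict`:

```python
import torch
import torch.nn as nn

class BTRMHeads(nn.Module):
    """Reconstruction of the head stack described in the Architecture
    section. Parameter names are illustrative, not authoritative."""
    def __init__(self, hidden: int, n_heads: int):
        super().__init__()
        self.norm_weight = nn.Parameter(torch.ones(hidden))
        self.proj = nn.Linear(hidden, n_heads)

    def forward(self, hidden_states, attention_mask):
        # Mean-pool the last hidden state over non-padding tokens.
        mask = attention_mask.unsqueeze(-1).to(hidden_states.dtype)
        pooled = (hidden_states * mask).sum(1) / mask.sum(1).clamp(min=1.0)
        # RMSNorm, written out to avoid version-specific modules.
        normed = pooled * torch.rsqrt(pooled.pow(2).mean(-1, keepdim=True) + 1e-6)
        normed = normed * self.norm_weight
        logits = self.proj(normed)
        return 10.0 * torch.tanh(logits / 10.0)  # soft cap at +/-10

heads = BTRMHeads(hidden=16, n_heads=9)  # toy sizes for illustration
```

In practice the real hidden size comes from the base model config and the scores would be read off per head via `btrm_state["head_names"]`.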
## Training Data
- **Reference**: Oblivion, Fallout NV, Skyrim dialogue with emotion annotations
- **Synthetic**: Gallia v9, Marmotte v6, Sanguo v1 (structural translation pipeline)
- **Negatives**: Cross-corpus soft negatives, Wattpad, FineWeb, WikiText
## Architecture
```
Input Text
↓
[Gemma-3 270M Transformer] ← frozen (probes) or fine-tuned (9-head)
↓
Last Hidden State (mean pooled)
↓
[RMSNorm β†’ Linear(hidden β†’ N_heads)]
↓
Per-head logits (soft tanh capped at Β±10)
```
Loss: `-log(sigmoid(pos - neg))` + logsquare regularization on logit magnitudes.
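In code, the loss might look like the following. The Bradley-Terry term is exactly the formula above; the regularizer's exact form is an assumption here (a squared-log penalty is zero at unit magnitude, which matches the "regularization toward unit logits" description in Training Evolution):

```python
import torch
import torch.nn.functional as F

def btrm_loss(pos_logits, neg_logits, logsquare=0.01):
    # Bradley-Terry pairwise loss: -log sigmoid(pos - neg),
    # via F.logsigmoid for numerical stability.
    bt = -F.logsigmoid(pos_logits - neg_logits).mean()
    # Assumed logsquare form: (log z^2)^2 is minimized at |z| = 1,
    # so it pulls logits toward unit magnitude. Higher coefficients
    # (0.1 for the probes) regularize harder than 0.01.
    z = torch.cat([pos_logits, neg_logits])
    reg = torch.log(z.pow(2) + 1e-8).pow(2).mean()
    return bt + logsquare * reg
```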
## Observations
1. **Reference corpora discriminate better** than synthetic (skyrim/oblivion heads accurate, gallia/sanguo confused)
2. **Structural heads work excellently** - prose vs dialogue vs aesop cleanly separated
3. **Full fine-tuning helps** - 9-head model achieves lower loss than frozen probes
4. **MLP layers adapt most** - down_proj weights show highest relative drift
## License
Base model weights: Google Gemma License / Qwen License
Training data: Bethesda game dialogue (fair use for research), synthetic generation