---
license: gemma
tags:
- reward-model
- bradley-terry
- dialogue
- multi-head
- corpus-membership
---

# BTRM+ (Bradley-Terry Reward Model Plus)

Multi-head reward models for corpus membership and structural genre classification. Trained on situated dialogue from video games and synthetic settings.

## Models in This Repository

| Model | Base | Heads | Training | Logsquare | Loss | L2 Drift |
|-------|------|-------|----------|-----------|------|----------|
| `qwen_2head_probe/` | Qwen2.5-0.5B | 2 | 1 epoch (LoRA) | 0.1 | ~0.42 | **0.00** (frozen) |
| `gemma_2head_probe/` | Gemma-3 270M | 2 | 1 epoch (LoRA) | 0.1 | ~0.38 | **0.00** (frozen) |
| `gemma_9head_btrm/` | Gemma-3 270M | 9 | 10x coverage | 0.01 | 0.32 | **15.53** (full FT) |

### Training Evolution

**Phase 1: Frozen Probes (LoRA)**
- Quick validation that the Bradley-Terry loss works
- Base transformer frozen; only the adapter + BTRM heads are trained
- Higher logsquare (0.1) = stronger regularization toward unit logits
- Result: loss converges, but limited expressivity

**Phase 2: Full Fine-Tuning**
- Unfroze the base transformer for end-to-end training
- Lower logsquare (0.01) = allows larger logit magnitudes
- Added synthetic corpora + structural genre heads
- Result: 2x more weight drift, better discrimination

### Weight Drift Analysis

Post-training comparison against the original pre-trained weights:

**Frozen (LoRA) models**: zero drift on the base transformer

```
qwen_2head_probe:  0.00 L2 (472M params unchanged)
gemma_2head_probe: 0.00 L2 (253M params unchanged)
```

**Full fine-tuned model**: significant drift, especially in MLP layers

```
gemma_9head_btrm: 15.53 L2 total (268M params)
  - MLP:       11.20 L2 (3.26% relative)
  - Embedding:  7.94 L2 (1.60% relative)
  - Attention:  7.26 L2 (2.07% relative)
  - Norm:       0.01 L2 (0.00% relative)
```

The top drifting layers are MLP `down_proj` weights (up to 15.7% relative change).
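The drift numbers above are L2 norms of weight deltas, absolute and relative to the original weight norm. A minimal pure-Python sketch of that comparison (the toy values and the grouping into "frozen" vs. "tuned" are illustrative assumptions, not the repository's actual measurement script):

```python
import math

def l2_drift(before, after):
    """L2 norm of the element-wise difference between two flat weight lists."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(after, before)))

def relative_drift(before, after):
    """Drift as a fraction of the original weight norm."""
    base_norm = math.sqrt(sum(b * b for b in before))
    return l2_drift(before, after) / base_norm

# Toy example: a "frozen" group shows zero drift, a fine-tuned one does not.
frozen_before = [0.5, -1.2, 0.3]
frozen_after = list(frozen_before)        # unchanged, as with the LoRA probes
tuned_before = [0.5, -1.2, 0.3]
tuned_after = [0.55, -1.15, 0.28]         # small full-FT updates

print(l2_drift(frozen_before, frozen_after))       # 0.0
print(relative_drift(tuned_before, tuned_after))   # small positive fraction
```

In the real model the same computation would run per parameter group (MLP, embedding, attention, norm) over flattened tensors.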
## Head Types

### Corpus Membership (6 heads in the 9-head model)

Each head scores whether text belongs to a specific narrative setting:

| Head | Description | In Probes? |
|------|-------------|------------|
| `oblivion` | Imperial fantasy RPG (TES IV) | Yes |
| `fonv` | Post-apocalyptic Western (Fallout NV) | Yes |
| `skyrim` | Nordic fantasy RPG (TES V) | 9-head only |
| `gallia` | Franco-Roman bureaucratic fantasy (synthetic) | 9-head only |
| `marmotte` | Alpine corporate dystopia (synthetic) | 9-head only |
| `sanguo` | Three Kingdoms romance/otome (synthetic) | 9-head only |

### Structural Genre (3 heads, 9-head model only)

These heads score text format/style rather than setting:

| Head | Description |
|------|-------------|
| `multiturn_dialogue` | Raw quoted dialogue walks |
| `fk_normed_prose` | Flesch-Kincaid controlled prose |
| `brainrot_aesop` | Vocabulary teaching passages |

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load the 9-head model (full fine-tuned)
model = AutoModelForCausalLM.from_pretrained(
    "SQCU/brainrot-partition-BTRMplus",
    subfolder="gemma_9head_btrm/base_model",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(
    "SQCU/brainrot-partition-BTRMplus",
    subfolder="gemma_9head_btrm/base_model",
)

# Load the BTRM heads
from huggingface_hub import hf_hub_download

btrm_path = hf_hub_download(
    "SQCU/brainrot-partition-BTRMplus",
    "gemma_9head_btrm/btrm_heads.pt",
)
btrm_state = torch.load(btrm_path)
# btrm_state["btrm_state_dict"] contains the head weights
# btrm_state["head_names"] = ["skyrim", "oblivion", "fonv", ...]
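# --- Sketch: the head stack on a toy pooled vector (pure Python, no model) ---
# The heads apply RMSNorm -> Linear -> soft tanh cap to the mean-pooled last
# hidden state (see the Architecture section). The shapes, the toy values, and
# the cap formula cap*tanh(x/cap) are illustrative assumptions, not the
# checkpoint's exact implementation.
import math

def rmsnorm(x, eps=1e-6):
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]

def head_logits(pooled, weights, cap=10.0):
    normed = rmsnorm(pooled)
    raw = [sum(w * v for w, v in zip(row, normed)) for row in weights]
    return [cap * math.tanh(r / cap) for r in raw]   # soft cap at +/-cap

toy_pooled = [0.2, -0.4, 0.9, 0.1]                   # stand-in pooled state
toy_weights = [[0.1, 0.0, 0.5, -0.2],                # one row per head
               [-0.3, 0.7, 0.0, 0.4]]
print(head_logits(toy_pooled, toy_weights))          # two per-head scores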
```

## Training Data

- **Reference**: Oblivion, Fallout NV, and Skyrim dialogue with emotion annotations
- **Synthetic**: Gallia v9, Marmotte v6, Sanguo v1 (structural translation pipeline)
- **Negatives**: cross-corpus soft negatives, Wattpad, FineWeb, WikiText

## Architecture

```
Input Text
    ↓
[Gemma-3 270M Transformer]  ← frozen (probes) or fine-tuned (9-head)
    ↓
Last Hidden State (mean pooled)
    ↓
[RMSNorm → Linear(hidden → N_heads)]
    ↓
Per-head logits (soft tanh capped at ±10)
```

Loss: `-log(sigmoid(pos - neg))` plus logsquare regularization on logit magnitudes.

## Observations

1. **Reference corpora discriminate better** than synthetic ones (the skyrim/oblivion heads are accurate; gallia/sanguo are often confused)
2. **Structural heads work excellently**: prose vs. dialogue vs. aesop are cleanly separated
3. **Full fine-tuning helps**: the 9-head model achieves lower loss than the frozen probes
4. **MLP layers adapt most**: `down_proj` weights show the highest relative drift

## License

- Base model weights: Google Gemma License / Qwen License
- Training data: Bethesda game dialogue (fair use for research), synthetic generation
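The Bradley-Terry objective described above can be sketched in plain Python. The per-pair form and the way the logsquare coefficient enters are assumptions based on the loss description and the table's logsquare values:

```python
import math

def btrm_pair_loss(pos_logit, neg_logit, logsquare=0.01):
    """Bradley-Terry pairwise loss with logsquare regularization (sketch).

    -log(sigmoid(pos - neg)) pushes the positive example's score above the
    negative's; the logsquare term penalizes large logit magnitudes, more
    strongly at 0.1 (probes) than at 0.01 (9-head model).
    """
    bt = -math.log(1.0 / (1.0 + math.exp(-(pos_logit - neg_logit))))
    reg = logsquare * (pos_logit ** 2 + neg_logit ** 2)
    return bt + reg

# A well-separated pair yields a small BT term; an inverted pair is penalized.
print(btrm_pair_loss(3.0, -3.0))   # small BT term plus modest regularization
print(btrm_pair_loss(-3.0, 3.0))   # large BT term
```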