# BTRM+ (Bradley-Terry Reward Model Plus)
Multi-head reward models for corpus membership and structural genre classification. Trained on situated dialogue from video games and synthetic settings.
## Models in This Repository

| Model | Base | Heads | Training | Logsquare | Loss | L2 Drift |
|---|---|---|---|---|---|---|
| `qwen_2head_probe/` | Qwen2.5-0.5B | 2 | 1 epoch (LoRA) | 0.1 | ~0.42 | 0.00 (frozen) |
| `gemma_2head_probe/` | Gemma-3 270M | 2 | 1 epoch (LoRA) | 0.1 | ~0.38 | 0.00 (frozen) |
| `gemma_9head_btrm/` | Gemma-3 270M | 9 | 10x coverage | 0.01 | 0.32 | 15.53 (full FT) |
## Training Evolution

### Phase 1: Frozen Probes (LoRA)
- Quick validation that Bradley-Terry loss works
- Base transformer frozen, only adapter + BTRM heads trained
- Higher logsquare (0.1) = stronger regularization toward unit logits
- Result: Loss converges, but limited expressivity
### Phase 2: Full Fine-Tuning
- Unfroze base transformer for end-to-end training
- Lower logsquare (0.01) = allows larger logit magnitudes
- Added synthetic corpora + structural genre heads
- Result: 2x more weight drift, better discrimination (both training setups are sketched below)
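The difference between the two phases comes down to which parameters receive gradients. Below is a minimal sketch of both setups, assuming the standard `peft` LoRA API; the base-model ID, rank, alpha, and target modules are illustrative placeholders rather than the values used for these checkpoints.

```python
from transformers import AutoModel
from peft import LoraConfig, get_peft_model

BASE_ID = "google/gemma-3-270m"  # assumed base-model ID, for illustration only

# Phase 1: frozen probe. peft injects LoRA adapters and freezes the base weights;
# only the adapters (plus the external BTRM heads) receive gradients, which is why
# the probe checkpoints show 0.00 L2 drift on the base transformer.
probe_backbone = get_peft_model(
    AutoModel.from_pretrained(BASE_ID),
    LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"]),
)
probe_backbone.print_trainable_parameters()

# Phase 2: full fine-tuning. Every base parameter stays trainable end to end,
# which produces the weight drift analyzed in the next section.
ft_backbone = AutoModel.from_pretrained(BASE_ID)
for param in ft_backbone.parameters():
    param.requires_grad = True  # already the default for a fresh model; shown for contrast
```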
## Weight Drift Analysis

Post-training comparison against the original pre-trained weights:

Frozen (LoRA) models show zero drift on the base transformer:

- `qwen_2head_probe`: 0.00 L2 (472M params unchanged)
- `gemma_2head_probe`: 0.00 L2 (253M params unchanged)

The full fine-tuned model shows significant drift, especially in MLP layers:

- `gemma_9head_btrm`: 15.53 L2 total (268M params)
  - MLP: 11.20 L2 (3.26% relative)
  - Embedding: 7.94 L2 (1.60% relative)
  - Attention: 7.26 L2 (2.07% relative)
  - Norm: 0.01 L2 (0.00% relative)
The layers that drift most are MLP `down_proj` weights (up to 15.7% relative change).
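The drift numbers above can be reproduced by diffing the fine-tuned checkpoint against the original weights and accumulating per-group L2 norms. The following is a hedged sketch; the original base-model ID and the name-based layer grouping are assumptions, not the exact script used for this report.

```python
import torch
from transformers import AutoModelForCausalLM

# Assumed IDs: the original Gemma-3 270M checkpoint vs. the fine-tuned backbone.
orig = AutoModelForCausalLM.from_pretrained("google/gemma-3-270m", torch_dtype=torch.float32)
tuned = AutoModelForCausalLM.from_pretrained(
    "SQCU/brainrot-partition-BTRMplus",
    subfolder="gemma_9head_btrm/base_model",
    torch_dtype=torch.float32,
)

def group(name: str) -> str:
    # Crude name-based grouping into the MLP / attention / embedding / norm buckets.
    if "mlp" in name:
        return "mlp"
    if "self_attn" in name:
        return "attention"
    if "embed" in name:
        return "embedding"
    if "norm" in name:
        return "norm"
    return "other"

tuned_params = dict(tuned.named_parameters())
drift_sq = {}
for name, p_orig in orig.named_parameters():
    delta = tuned_params[name].detach() - p_orig.detach()
    drift_sq[group(name)] = drift_sq.get(group(name), 0.0) + delta.pow(2).sum().item()

print(f"total: {sum(drift_sq.values()) ** 0.5:.2f} L2")
for g, sq in sorted(drift_sq.items()):
    print(f"{g}: {sq ** 0.5:.2f} L2")
```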
## Head Types

### Corpus Membership (6 heads in the 9-head model)

Score whether text belongs to a specific narrative setting:

| Head | Description | In Probes? |
|---|---|---|
| `oblivion` | Imperial fantasy RPG (TES IV) | Yes |
| `fonv` | Post-apocalyptic Western (Fallout NV) | Yes |
| `skyrim` | Nordic fantasy RPG (TES V) | 9-head only |
| `gallia` | Franco-Roman bureaucratic fantasy (synthetic) | 9-head only |
| `marmotte` | Alpine corporate dystopia (synthetic) | 9-head only |
| `sanguo` | Three Kingdoms romance/otome (synthetic) | 9-head only |
### Structural Genre (3 heads, 9-head model only)

Score text format/style:

| Head | Description |
|---|---|
| `multiturn_dialogue` | Raw quoted dialogue walks |
| `fk_normed_prose` | Flesch-Kincaid controlled prose |
| `brainrot_aesop` | Vocabulary teaching passages |
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load 9-head model (full fine-tuned)
model = AutoModelForCausalLM.from_pretrained(
    "SQCU/brainrot-partition-BTRMplus",
    subfolder="gemma_9head_btrm/base_model",
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(
    "SQCU/brainrot-partition-BTRMplus",
    subfolder="gemma_9head_btrm/base_model",
)

# Load BTRM heads
from huggingface_hub import hf_hub_download

btrm_path = hf_hub_download(
    "SQCU/brainrot-partition-BTRMplus",
    "gemma_9head_btrm/btrm_heads.pt",
)
btrm_state = torch.load(btrm_path)
# btrm_state["btrm_state_dict"] contains the head weights
# btrm_state["head_names"] = ["skyrim", "oblivion", "fonv", ...]
```
## Training Data
- Reference: Oblivion, Fallout NV, Skyrim dialogue with emotion annotations
- Synthetic: Gallia v9, Marmotte v6, Sanguo v1 (structural translation pipeline)
- Negatives: Cross-corpus soft negatives, Wattpad, FineWeb, WikiText
## Architecture

```
Input Text
    ↓
[Gemma-3 270M Transformer]   ← frozen (probes) or fine-tuned (9-head)
    ↓
Last Hidden State (mean pooled)
    ↓
[RMSNorm → Linear(hidden → N_heads)]
    ↓
Per-head logits (soft tanh capped at ±10)
```
Loss: `-log(sigmoid(pos - neg))` per preference pair, plus logsquare regularization on logit magnitudes.
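As a concrete reference, here is a minimal implementation of that objective for a single head, assuming `pos` and `neg` are the logits for the preferred and rejected texts of each pair and that the regularizer is a simple mean of squared logits (the exact reduction and weighting are assumptions):

```python
import torch
import torch.nn.functional as F

def btrm_loss(pos: torch.Tensor, neg: torch.Tensor, logsquare: float = 0.01) -> torch.Tensor:
    """Bradley-Terry pairwise loss plus a quadratic penalty on logit magnitude.

    pos, neg: shape (batch,) logits for the preferred / rejected texts on one head.
    logsquare: regularization strength (0.1 for the frozen probes, 0.01 for the full fine-tune).
    """
    bt = -F.logsigmoid(pos - neg).mean()                # -log sigmoid(pos - neg), computed stably
    reg = logsquare * (pos.pow(2) + neg.pow(2)).mean()  # penalizes large logit magnitudes
    return bt + reg
```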
## Observations

- Reference corpora discriminate better than synthetic ones: the skyrim/oblivion heads are accurate, while gallia/sanguo are often confused
- Structural heads work very well: prose vs. dialogue vs. aesop passages separate cleanly
- Full fine-tuning helps: the 9-head model achieves lower loss than the frozen probes
- MLP layers adapt most: down_proj weights show the highest relative drift
## License

- Base model weights: Google Gemma License / Qwen License
- Training data: Bethesda game dialogue (fair use for research), synthetic generation