GPT Family Relation: Reversal Curse Experiments

These are GPT causal language models trained on the family_relation dataset to study the reversal curse: the phenomenon where a model trained on "A is the parent of B" fails to infer "B is the child of A".

Key Finding

Weight decay is the key driver for solving the reversal curse. With sufficient weight decay, models achieve high reversal accuracy, largely overcoming the reversal curse without any data augmentation.

Results (nhead=8, d_model=768, L=12, 20 epochs)

wd     train acc   test acc (reversal)
0.0     98.25%     19.65%
1.0     99.97%     90.43%
3.0     99.92%     94.05%
5.0     99.98%     97.35%
6.0    100.00%     95.07%
7.0     99.98%     88.17%
8.0    100.00%     99.12%
  • Train Acc: accuracy on bidirectional eval split (same direction as training data)
  • Test Acc (reversal): accuracy on unidirectional eval split (reversed direction, not seen in training)

Model Architecture

Parameters ~115M
Layers 12
Hidden dim (d_model) 768
Attention heads 8 (head_dim=96)
FFN hidden 3072 (4 × d_model)
Max seq len 1024
Vocab size 32,768 (tiktoken BPE)
Activation ReLU
Normalization RMSNorm (learnable, pre-norm)
Positional encoding RoPE
QK norm RMSNorm (learnable)
Logit softcap 15.0
Embedding tying No (untied)
Bias None
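The logit softcap listed above bounds the output logits by squashing them through a tanh. A minimal sketch of the usual formulation, cap * tanh(x / cap) (the exact form used in model.py is an assumption here):

```python
import math

def softcap(logit: float, cap: float = 15.0) -> float:
    """Smoothly squash a logit into (-cap, cap).

    Assumed form: cap * tanh(logit / cap). For |logit| much smaller
    than cap this is approximately the identity; large logits
    saturate just below +/-cap instead of growing without bound.
    """
    return cap * math.tanh(logit / cap)

print(softcap(1.0))    # near 1.0 (approximately identity for small inputs)
print(softcap(100.0))  # saturates just below 15.0
```

This keeps the loss landscape smooth while preventing any single logit from dominating the softmax.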

Training Details

Optimizer AdamW (betas=0.9, 0.95)
Learning rate 3e-4
Schedule Cosine decay with 1% warmup
Batch size 64
Dropout 0.1 (attention + residual)
Epochs 20
Precision FP32 weights, bf16 autocast forward
Gradient clipping None
Weight decay Applied to all parameters except RMSNorm weights
Data packing Simple concatenation, fixed-size chunks
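Since weight decay is the key knob in these experiments, the exclusion of RMSNorm weights matters. A minimal sketch of how parameters could be partitioned into decay / no-decay groups by name (the `"norm"` substring match and the example names are assumptions; the actual grouping lives in the training code, which is not shown here):

```python
def split_decay_groups(param_names):
    """Partition parameter names into (decay, no_decay) lists.

    Assumption: RMSNorm parameters are identifiable by "norm" in
    their name; everything else (embedding, attention, and FFN
    weights) receives weight decay.
    """
    decay, no_decay = [], []
    for name in param_names:
        (no_decay if "norm" in name.lower() else decay).append(name)
    return decay, no_decay

# Hypothetical parameter names for illustration:
names = [
    "tok_emb.weight",          # embedding: decayed
    "h.0.attn.q_proj.weight",  # attention projection: decayed
    "h.0.rms_norm.weight",     # RMSNorm gain: not decayed
    "h.0.qk_norm.weight",      # QK-norm gain: not decayed
]
decay, no_decay = split_decay_groups(names)
# The two groups would then go to AdamW as separate param groups, e.g.
# [{"params": decay, "weight_decay": 8.0},
#  {"params": no_decay, "weight_decay": 0.0}]
```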

Usage

from huggingface_hub import hf_hub_download
import os, sys, torch

# Download model
model_path = hf_hub_download("kdkyum/gpt-family-relation", "h8_wd8.0/best_model.pt")
model_py_path = hf_hub_download("kdkyum/gpt-family-relation", "model.py")

# Load model
sys.path.insert(0, os.path.dirname(model_py_path))
from model import GPT, GPTConfig, load_model, load_tokenizer

config = GPTConfig(nhead=8, dropout=0.1)
model = load_model(model_path, config=config, device="cuda")  # or "cpu"

# Load tokenizer (tiktoken BPE)
enc = load_tokenizer()
bos_id = enc.encode_single_token("<|bos|>")
period_id = enc.encode_ordinary(".")[0]

# Generate
prompt = " Ryan Earl Garza mother"  # reversed query (child -> parent)
ids = torch.tensor([[bos_id] + enc.encode_ordinary(prompt)], dtype=torch.long, device="cuda")
out = model.generate(ids, max_new_tokens=10)
new_ids = out[0, ids.shape[1]:].tolist()
# Stop at first period or bos
result = []
for t in new_ids:
    if t == period_id or t == bos_id:
        break
    result.append(t)
print(enc.decode(result))

Files

model.py                  # Self-contained GPT model (needs torch + tiktoken)
h8_wd{0.0,8.0}/
  best_model.pt           # Best checkpoint (by reversal test accuracy)
  latest_model.pt         # Final checkpoint (end of training)

Dataset

Trained on kdkyum/family_relation (lvl3_N1e+3 split): synthetically generated family relation statements with ~1000 families and 3 levels of depth.

from huggingface_hub import hf_hub_download
import json

# Load training data
path = hf_hub_download("kdkyum/family_relation", "lvl3_N1e+3/train.json", repo_type="dataset")
with open(path) as f:
    train_data = json.load(f)["train"]
# train_data is a list of strings, e.g.:
# "Samuel Earl Garza and Dominique Earl Garza are the parents of Ryan Earl Garza."

# Load eval splits
bi_path = hf_hub_download("kdkyum/family_relation", "lvl3_N1e+3/eval_reverse_bi.json", repo_type="dataset")
uni_path = hf_hub_download("kdkyum/family_relation", "lvl3_N1e+3/eval_reverse_uni.json", repo_type="dataset")
with open(bi_path) as f:
    eval_bi = json.load(f)["reverse_bi"]    # bidirectional (same direction as training)
with open(uni_path) as f:
    eval_uni = json.load(f)["reverse_uni"]  # unidirectional (reversed, tests reversal curse)
# Each eval item has "prompt" and "answer" fields, e.g.:
# {"prompt": " Ryan Earl Garza mother", "answer": ["Dominique Earl Garza"]}
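Given the prompt/answer format above, reversal accuracy can be scored by checking whether the generated continuation matches any listed answer. A minimal string-level scorer (a sketch; the repository's actual evaluation may compare at the token level instead):

```python
def is_correct(generated: str, answers: list) -> bool:
    """True if the generated text matches any ground-truth answer.

    Comparison is whitespace-normalized; the sentence-final period
    is stripped from the prediction before comparing.
    """
    pred = generated.strip().rstrip(".")
    return any(pred == a.strip() for a in answers)

def reversal_accuracy(preds, eval_items):
    """Fraction of eval items whose prediction matches an answer."""
    hits = sum(
        is_correct(p, item["answer"]) for p, item in zip(preds, eval_items)
    )
    return hits / len(eval_items)

items = [{"prompt": " Ryan Earl Garza mother",
          "answer": ["Dominique Earl Garza"]}]
print(reversal_accuracy([" Dominique Earl Garza."], items))  # 1.0
```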

Citation

If you use these models, please cite:

@misc{gpt-family-relation,
  author = {kdkyum},
  title = {GPT Family Relation: Solving the Reversal Curse with Weight Decay},
  year = {2026},
  url = {https://huggingface.co/kdkyum/gpt-family-relation}
}