# GPT Family Relation: Reversal Curse Experiments
GPT causal language models trained on the `family_relation` dataset to study the **reversal curse**: the phenomenon where a model trained on "A is the parent of B" fails to infer "B is the child of A".
## Key Finding
Weight decay is the key driver for solving the reversal curse. With sufficient weight decay, models achieve high reversal accuracy, largely overcoming the reversal curse without any data augmentation.
## Results (nhead=8, d_model=768, L=12, 20 epochs)
| wd | train acc | test acc (reversal) |
|---|---|---|
| 0.0 | 98.25% | 19.65% |
| 1.0 | 99.97% | 90.43% |
| 3.0 | 99.92% | 94.05% |
| 5.0 | 99.98% | 97.35% |
| 6.0 | 100.00% | 95.07% |
| 7.0 | 99.98% | 88.17% |
| 8.0 | 100.00% | 99.12% |
- **Train acc**: accuracy on the bidirectional eval split (same direction as the training data).
- **Test acc (reversal)**: accuracy on the unidirectional eval split (reversed direction, never seen during training).
## Model Architecture
| Component | Value |
|---|---|
| Parameters | ~115M |
| Layers | 12 |
| Hidden dim (d_model) | 768 |
| Attention heads | 8 (head_dim=96) |
| FFN hidden | 3072 (4 × d_model) |
| Max seq len | 1024 |
| Vocab size | 32,768 (tiktoken BPE) |
| Activation | ReLU |
| Normalization | RMSNorm (learnable, pre-norm) |
| Positional encoding | RoPE |
| QK norm | RMSNorm (learnable) |
| Logit softcap | 15.0 |
| Embedding tying | No (untied) |
| Bias | None |
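The logit softcap of 15.0 listed above smoothly bounds the output logits before the softmax. A minimal sketch of tanh softcapping; the exact call site inside `model.py` is an assumption, not shown in this card:

```python
import math

def softcap(logit: float, cap: float = 15.0) -> float:
    # Smoothly bound a logit to (-cap, cap): cap * tanh(logit / cap).
    # Near zero this is approximately the identity; large magnitudes
    # saturate at +/-cap. In the model this runs element-wise over the
    # output logits, e.g. cap * torch.tanh(logits / cap) in PyTorch.
    return cap * math.tanh(logit / cap)
```

Because the function is monotonic, capping does not change the argmax token, but it limits how confident any single logit can become, which stabilizes training.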
## Training Details
| Setting | Value |
|---|---|
| Optimizer | AdamW (betas=0.9, 0.95) |
| Learning rate | 3e-4 |
| Schedule | Cosine decay with 1% warmup |
| Batch size | 64 |
| Dropout | 0.1 (attention + residual) |
| Epochs | 20 |
| Precision | FP32 weights, bf16 autocast forward |
| Gradient clipping | None |
| Weight decay | Applied to all parameters except RMSNorm weights |
| Data packing | Simple concatenation, fixed-size chunks |
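The weight-decay rule above (decay everything except RMSNorm weights) maps naturally onto AdamW parameter groups. A hedged sketch of how such a split might look; the actual grouping logic lives in the training code, which is not shown here:

```python
import torch

def build_optimizer(model: torch.nn.Module, lr: float = 3e-4, wd: float = 8.0):
    # Assumption: since the model is bias-free, the only 1-D parameters are
    # the RMSNorm gains, so splitting by dimensionality reproduces the rule
    # "decay everything except RMSNorm weights". The real training code may
    # group parameters differently (e.g. by name).
    decay = [p for p in model.parameters() if p.requires_grad and p.ndim >= 2]
    no_decay = [p for p in model.parameters() if p.requires_grad and p.ndim < 2]
    return torch.optim.AdamW(
        [{"params": decay, "weight_decay": wd},
         {"params": no_decay, "weight_decay": 0.0}],
        lr=lr, betas=(0.9, 0.95),
    )
```

Exempting norm gains from decay is standard practice; decaying them would shrink the per-channel scales toward zero rather than regularizing the actual weight matrices.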
## Usage
```python
from huggingface_hub import hf_hub_download
import os, sys, torch

# Download model
model_path = hf_hub_download("kdkyum/gpt-family-relation", "h8_wd8.0/best_model.pt")
model_py_path = hf_hub_download("kdkyum/gpt-family-relation", "model.py")

# Load model
sys.path.insert(0, os.path.dirname(model_py_path))
from model import GPT, GPTConfig, load_model, load_tokenizer

config = GPTConfig(nhead=8, dropout=0.1)
model = load_model(model_path, config=config, device="cuda")  # or "cpu"

# Load tokenizer (tiktoken BPE)
enc = load_tokenizer()
bos_id = enc.encode_single_token("<|bos|>")
period_id = enc.encode_ordinary(".")[0]

# Generate
prompt = " Ryan Earl Garza mother"  # reversed query (child → parent)
ids = torch.tensor([[bos_id] + enc.encode_ordinary(prompt)], dtype=torch.long, device="cuda")
out = model.generate(ids, max_new_tokens=10)
new_ids = out[0, ids.shape[1]:].tolist()

# Stop at first period or bos
result = []
for t in new_ids:
    if t == period_id or t == bos_id:
        break
    result.append(t)
print(enc.decode(result))
```
## Files
```
model.py                # Self-contained GPT model (needs torch + tiktoken)
h8_wd{0.0,8.0}/
  best_model.pt         # Best checkpoint (by reversal test accuracy)
  latest_model.pt       # Final checkpoint (end of training)
```
## Dataset
Trained on `kdkyum/family_relation` (`lvl3_N1e+3` split): synthetically generated family-relation statements covering ~1000 families and 3 levels of depth.
```python
from huggingface_hub import hf_hub_download
import json

# Load training data
path = hf_hub_download("kdkyum/family_relation", "lvl3_N1e+3/train.json", repo_type="dataset")
with open(path) as f:
    train_data = json.load(f)["train"]

# train_data is a list of strings, e.g.:
# "Samuel Earl Garza and Dominique Earl Garza are the parents of Ryan Earl Garza."

# Load eval splits
bi_path = hf_hub_download("kdkyum/family_relation", "lvl3_N1e+3/eval_reverse_bi.json", repo_type="dataset")
uni_path = hf_hub_download("kdkyum/family_relation", "lvl3_N1e+3/eval_reverse_uni.json", repo_type="dataset")
with open(bi_path) as f:
    eval_bi = json.load(f)["reverse_bi"]  # bidirectional (same direction as training)
with open(uni_path) as f:
    eval_uni = json.load(f)["reverse_uni"]  # unidirectional (reversed, tests reversal curse)

# Each eval item has "prompt" and "answer" fields, e.g.:
# {"prompt": " Ryan Earl Garza mother", "answer": ["Dominique Earl Garza"]}
```
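Given eval items of the form above, reversal accuracy can be scored as exact match against any of the reference answers. A minimal sketch; the scoring code used for the reported numbers is not shown in this card, so treat the matching rule as an assumption:

```python
def exact_match_accuracy(predictions, eval_items):
    # predictions: decoded model outputs, one string per eval item.
    # eval_items: list of {"prompt": ..., "answer": [...]} dicts.
    # A prediction counts as correct if, after stripping surrounding
    # whitespace, it equals any of the reference answers.
    correct = 0
    for pred, item in zip(predictions, eval_items):
        if pred.strip() in {a.strip() for a in item["answer"]}:
            correct += 1
    return correct / len(eval_items)

items = [{"prompt": " Ryan Earl Garza mother", "answer": ["Dominique Earl Garza"]}]
print(exact_match_accuracy([" Dominique Earl Garza"], items))  # 1.0
```

The `answer` field is a list because some relations (e.g. "children") can have multiple valid completions; here any listed answer is accepted.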
## Citation
If you use these models, please cite:
```bibtex
@misc{gpt-family-relation,
  author = {kdkyum},
  title  = {GPT Family Relation: Solving the Reversal Curse with Weight Decay},
  year   = {2026},
  url    = {https://huggingface.co/kdkyum/gpt-family-relation}
}
```