
Confidence Cartography — using Pythia's token probabilities as a false-belief sensor

#6
by bsanch52 - opened

Hi — I recently published a preprint and pip-installable toolkit that uses teacher-forced confidence extraction on causal LMs, with Pythia as the primary model family.

The finding: the ratio of Pythia's token-level confidence on widely believed falsehoods versus their corrected versions correlates with human false-belief prevalence from a YouGov survey (Spearman ρ = 0.652, p = 0.016 at the 6.9B scale). The effect strengthens monotonically from 160M through 12B, and the same signal detects medical misinformation with 88% accuracy at 6.9B.
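For intuition, the ratio-vs-prevalence comparison boils down to a rank correlation over per-item scores. The numbers below are made up purely for illustration (they are not from the paper), and `spearman_rho` is a minimal tie-free sketch — `scipy.stats.spearmanr` is the robust choice in practice:

```python
import numpy as np

def spearman_rho(x, y):
    # Spearman's rho = Pearson correlation of the rank vectors.
    # (Tie-free sketch; scipy.stats.spearmanr handles ties properly.)
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

# Hypothetical per-item numbers: confidence ratio (falsehood / correct)
# against survey belief prevalence for five imaginary items.
ratios = np.array([1.8, 1.2, 2.5, 0.9, 1.5])
prevalence = np.array([0.62, 0.35, 0.50, 0.20, 0.55])
print(spearman_rho(ratios, prevalence))  # 0.7
```

A positive ρ here means items the model is relatively more confident about in their false form are also the ones more humans believe.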

The full Pythia scaling curve (160M → 12B) is in the paper — every model size in the suite was tested.

Reproduce in 3 lines:

```python
import confidence_cartography as cc
results = cc.evaluate_mandela_effect("EleutherAI/pythia-6.9b")
print(results)
# MandelaEvaluation(rho=0.652, p=0.016, n=9)
```
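The package internals aren't shown here, but teacher-forced confidence extraction generally means scoring each ground-truth token under the model's next-token distribution from a single forward pass. A minimal numpy sketch with synthetic logits — `teacher_forced_logprob` is a hypothetical helper name, and in practice the logits would come from something like `AutoModelForCausalLM.from_pretrained(...)(input_ids).logits`:

```python
import numpy as np

def teacher_forced_logprob(logits, token_ids):
    """Mean log-probability of token_ids[1:] under a causal LM's logits.

    logits: (seq_len, vocab) scores from one forward pass; position t
    predicts token t+1, so logits[:-1] align with token_ids[1:].
    """
    shifted = logits[:-1]
    m = shifted.max(axis=-1, keepdims=True)
    log_z = m.squeeze(-1) + np.log(np.exp(shifted - m).sum(axis=-1))
    picked = shifted[np.arange(shifted.shape[0]), token_ids[1:]]
    return float((picked - log_z).mean())

# Synthetic example: a 3-token sequence over a 4-token vocabulary.
logits = np.zeros((3, 4))
logits[0, 2] = 10.0   # model strongly expects token 2 after token 0
logits[1, 1] = 10.0   # and token 1 after that
confident = np.array([0, 2, 1])   # matches the model's expectation
surprised = np.array([0, 1, 2])   # contradicts it
print(teacher_forced_logprob(logits, confident))  # near 0.0
print(teacher_forced_logprob(logits, surprised))  # strongly negative
```

Comparing this score between a falsehood and its corrected version is what yields the confidence ratio the post describes.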

Links:

Pythia was the ideal model family for this work because of the consistent architecture across scales and the deduped training data. Thanks to the EleutherAI team for making these models available.
