
InterpGPT – Standard Model (23M)

Part of the InterpGPT matched-pair release. This is the standard model; its counterpart is connaaa/interpgpt-adhd-23M. Both models share the same architecture and training recipe; only the training data distribution differs.

Parameters: 23,471,104
Layers: 6
Heads: 8
d_model: 512
d_head: 64
d_mlp (SwiGLU): 1408
Vocab: 8192 (custom BPE)
Context length: 512
Norm: RMSNorm (ε = 1e-6)
Position: RoPE (half-half, base 10,000)
Activation: SwiGLU
Biases: none
Tied input/output embeddings: yes
Training: ~25k steps on the task-decomposition corpus

What is this model for?

Given a task prompt, the model writes a step-by-step decomposition. The standard variant was trained on normal task decompositions (tasks → subtasks in straightforward order). The ADHD counterpart was trained on decompositions with smaller steps and interleaved micro-regulation actions (e.g. "sip water", "deep breath", "quick stretch").

The pair is the subject of a mechanistic-interpretability study. Phase 1 headline findings:

  • Structural head-position swap. A step-layout-broadcast head lives at L3H0 in the standard model and at L3H5 in the ADHD model. Cross-model per-position attention-profile cosine similarity is 0.997 at the matched (different-index) pair vs a same-index baseline of 0.66 (a sketch of this comparison follows the list).
  • Block-2 content circuit. P(regulation token) at step-onset positions jumps 17× between layer 1 and layer 2 in the ADHD model (0.014 → 0.251); the standard model never crosses 1% at any layer.
  • High-specificity null-steering SAE feature. See the companion SAE repo connaaa/interpgpt-sae-phase5.
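
The head-swap comparison reduces to a cosine similarity between two heads' attention profiles. Below is a minimal sketch of one plausible reading of that metric, assuming layer-3 attention patterns for both models have already been cached on the same prompt (see the TransformerLens section below); the function names and the flatten-then-cosine formulation are illustrative, not taken from the study's code.

import torch

def head_profile(pattern: torch.Tensor, head: int) -> torch.Tensor:
    # pattern: [n_heads, dest_pos, src_pos] attention for one prompt.
    # Concatenate the head's attention distribution at every destination
    # position into a single profile vector.
    return pattern[head].flatten()

def profile_cosine(a: torch.Tensor, b: torch.Tensor) -> float:
    return torch.nn.functional.cosine_similarity(a, b, dim=0).item()

# Matched (different-index) pair: standard L3H0 vs ADHD L3H5.
# The same-index baseline compares head 0 to head 0, head 1 to head 1, etc.
# sim = profile_cosine(head_profile(std_l3_pattern, 0), head_profile(adhd_l3_pattern, 5))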

Input format

<|task|>Clean the kitchen<|steps|>Step 1 text<|sep|>Step 2 text<|sep|>...<|end|>
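
A minimal prompt-building/parsing sketch based on that format. It assumes the custom BPE tokenizer treats <|task|>, <|steps|>, <|sep|>, and <|end|> as special tokens; the helper names are illustrative, not from the repo.

def build_prompt(task: str) -> str:
    # Prompt the model up to <|steps|> and let it continue from there.
    return f"<|task|>{task}<|steps|>"

def parse_steps(completion: str) -> list[str]:
    # Drop everything after the terminator, then split on the step separator.
    body = completion.split("<|end|>")[0]
    return [s.strip() for s in body.split("<|sep|>") if s.strip()]

print(build_prompt("Clean the kitchen"))
print(parse_steps("Clear the counters<|sep|>Load the dishwasher<|sep|>Wipe surfaces<|end|>"))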

Loading

HuggingFace Transformers (custom code)

from transformers import AutoModel, AutoTokenizer
model = AutoModel.from_pretrained(
    "connaaa/interpgpt-standard-23M", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "connaaa/interpgpt-standard-23M"
)
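
A quick smoke test with the loaded objects. This assumes the remote code follows the usual causal-LM convention of taking input_ids and returning logits over the 8192-token vocab; check the repo's modeling file if the interface differs.

import torch

prompt = "<|task|>Clean the kitchen<|steps|>"
input_ids = tokenizer(prompt, return_tensors="pt")["input_ids"]
with torch.no_grad():
    out = model(input_ids)  # assumed: logits (or an output object wrapping them)
print(type(out))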

TransformerLens (recommended for interpretability)

The repo ships a TransformerLens-compatible bundle at hooked_transformer.pt:

from huggingface_hub import hf_hub_download
from transformer_lens import HookedTransformer, HookedTransformerConfig
import torch

path = hf_hub_download(
    "connaaa/interpgpt-standard-23M", "hooked_transformer.pt"
)
blob = torch.load(path, map_location="cpu", weights_only=False)

# Keep only keys that HookedTransformerConfig accepts, and drop serialized
# dtype strings such as "torch.float32" that cannot be passed through as strings.
cfg_keep = {
    k: v for k, v in blob["config"].items()
    if k in HookedTransformerConfig.__dataclass_fields__ and not (
        isinstance(v, str) and v.startswith("torch.")
    )
}
cfg = HookedTransformerConfig(**cfg_keep)
model = HookedTransformer(cfg)
model.load_state_dict(blob["model_state_dict"])
model.eval()
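
With the hooked model in place, the attention patterns behind the head-swap finding can be cached directly. A sketch, assuming the HF tokenizer from the section above supplies the token ids (a HookedTransformer built from a bare config has no tokenizer attached):

from transformers import AutoTokenizer
import torch

tokenizer = AutoTokenizer.from_pretrained("connaaa/interpgpt-standard-23M")
prompt = "<|task|>Clean the kitchen<|steps|>"
tokens = tokenizer(prompt, return_tensors="pt")["input_ids"]

with torch.no_grad():
    logits, cache = model.run_with_cache(tokens)

# Layer-3 attention patterns: [batch, n_heads, dest_pos, src_pos].
l3_pattern = cache["pattern", 3]
l3h0 = l3_pattern[0, 0]  # candidate step-layout-broadcast head in the standard model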

Raw PyTorch / original TaskGPT class

# Pairs with gpt_model.py from https://github.com/cwklurks/interpgpt
from huggingface_hub import hf_hub_download
from gpt_model import GPTConfig, TaskGPT
import torch

path = hf_hub_download(
    "connaaa/interpgpt-standard-23M", "pytorch_model.pt"
)
blob = torch.load(path, map_location="cpu", weights_only=False)
model = TaskGPT(GPTConfig(**blob["config"]))
model.load_state_dict(blob["model_state_dict"])
model.eval()

Reproduce the head-swap finding

Open the companion Colab notebook, notebooks/InterpGPT_HeadSwap.ipynb, at github.com/cwklurks/interpgpt. An end-to-end run on the Colab free tier reproduces the 0.997 vs 0.66 comparison in under 15 minutes.

Training data

Custom task-decomposition corpus in two variants (standard vs ADHD), generated from the same task pool. Detailed dataset notes and generation scripts live in the main repo (preprocess.py, merge_data.py, rebuild_data.py, fix_adhd_data.py, shorten_adhd_steps.py).

License

MIT.

Intended use

Interpretability research. The model is intentionally small and domain-specific; it is not intended as a general-purpose chatbot.

Citation

@misc{interpgpt2026,
  title  = {{InterpGPT}: A matched-pair interpretability study of task-decomposition models},
  author = {Klann, Connor},
  year   = {2026},
  url    = {https://github.com/cwklurks/interpgpt}
}