# InterpGPT – Standard Model (23M)
Part of the InterpGPT matched-pair release. This is the standard model;
its counterpart is [connaaa/interpgpt-adhd-23M](https://huggingface.co/connaaa/interpgpt-adhd-23M).
Both models share identical architecture and training recipe; only the training
data distribution differs.
| Spec | Value |
|---|---|
| Parameters | 23,471,104 |
| Layers | 6 |
| Heads | 8 |
| d_model | 512 |
| d_head | 64 |
| d_mlp (SwiGLU) | 1408 |
| Vocab | 8192 (custom BPE) |
| Context length | 512 |
| Norm | RMSNorm (ε = 1e-6) |
| Position | RoPE (half-half, base 10,000) |
| Activation | SwiGLU |
| Biases | none |
| Tied input/output embeddings | yes |
| Training | ~25k steps on the task-decomposition corpus |
## What is this model for?
Given a task prompt, the model writes a step-by-step decomposition. The standard variant was trained on normal task decompositions (tasks → subtasks in straightforward order). The ADHD counterpart was trained on decompositions with smaller steps and interleaved micro-regulation actions (e.g. "sip water", "deep breath", "quick stretch").
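For a concrete feel for the difference, here are hypothetical corpus-style samples (illustrative only, not verbatim training data):

```
standard: <|task|>Do laundry<|steps|>Sort clothes<|sep|>Run the washer<|sep|>Dry and fold<|end|>
adhd:     <|task|>Do laundry<|steps|>Sort one basket<|sep|>deep breath<|sep|>Start the washer<|sep|>sip water<|sep|>Fold five items<|end|>
```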
The pair is the subject of a mechanistic-interpretability study. Phase 1 headline findings:
- Structural head-position swap. A step-layout-broadcast head lives at L3H0 in the standard model and at L3H5 in the ADHD model. Cross-model per-position attention-profile cosine similarity is 0.997 at the matched (different-index) pair vs a same-index baseline of 0.66 (see the sketch after this list).
- Block-2 content circuit. P(regulation token) at step-onset positions jumps 17× between layer 1 and layer 2 in the ADHD model (0.014 → 0.251); the standard model never crosses 1% at any layer.
- High-specificity null-steering SAE feature. See the companion SAE repo [connaaa/interpgpt-sae-phase5](https://huggingface.co/connaaa/interpgpt-sae-phase5).
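A minimal sketch of the matched-pair comparison, assuming both models are loaded as TransformerLens `HookedTransformer`s (`model_std`, `model_adhd`; see Loading below) and run on identical token ids. The study's exact metric (e.g. which prompts are averaged over) may differ:

```python
import torch
import torch.nn.functional as F

# `ids`: a [1, seq_len] tensor of token ids for some decomposition prompt.
_, cache_std = model_std.run_with_cache(ids)
_, cache_adhd = model_adhd.run_with_cache(ids)

# Layer-3 attention patterns: [batch, head, query_pos, key_pos].
p_std = cache_std["pattern", 3][0]
p_adhd = cache_adhd["pattern", 3][0]

def head_cosine(a: torch.Tensor, b: torch.Tensor) -> float:
    # Mean cosine similarity between per-query-position attention profiles.
    return F.cosine_similarity(a, b, dim=-1).mean().item()

print("matched pair  L3H0 vs L3H5:", head_cosine(p_std[0], p_adhd[5]))
print("same-index    L3H0 vs L3H0:", head_cosine(p_std[0], p_adhd[0]))
```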
## Input format

```
<|task|>Clean the kitchen<|steps|>Step 1 text<|sep|>Step 2 text<|sep|>...<|end|>
```
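A tiny helper for assembling prompts in this format (a hypothetical convenience, not part of the repo):

```python
def make_prompt(task: str, steps: list[str] | None = None) -> str:
    """Build a prompt in the model's input format. At inference time you
    usually stop after <|steps|> and let the model generate the steps."""
    prompt = f"<|task|>{task}<|steps|>"
    if steps:
        prompt += "<|sep|>".join(steps) + "<|end|>"
    return prompt

prompt = make_prompt("Clean the kitchen")  # "<|task|>Clean the kitchen<|steps|>"
```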
## Loading

### HuggingFace Transformers (custom code)
```python
from transformers import AutoModel, AutoTokenizer

# trust_remote_code is required: the model class ships with the repo.
model = AutoModel.from_pretrained(
    "connaaa/interpgpt-standard-23M", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "connaaa/interpgpt-standard-23M"
)
```
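A quick smoke test. The exact forward signature and output type come from the repo's custom remote code, which may differ from standard `transformers` models:

```python
import torch

ids = tokenizer("<|task|>Clean the kitchen<|steps|>", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(ids)  # inspect `out`; its fields depend on the custom code
```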
### TransformerLens (recommended for interpretability)

The repo ships a TransformerLens-compatible bundle at `hooked_transformer.pt`:
```python
import torch
from huggingface_hub import hf_hub_download
from transformer_lens import HookedTransformer, HookedTransformerConfig

path = hf_hub_download(
    "connaaa/interpgpt-standard-23M", "hooked_transformer.pt"
)
blob = torch.load(path, map_location="cpu", weights_only=False)

# Keep only keys that HookedTransformerConfig accepts, and drop values that
# were serialized as strings like "torch.float32", which the dataclass
# cannot take directly.
cfg_keep = {
    k: v for k, v in blob["config"].items()
    if k in HookedTransformerConfig.__dataclass_fields__
    and not (isinstance(v, str) and v.startswith("torch."))
}
cfg = HookedTransformerConfig(**cfg_keep)

model = HookedTransformer(cfg)
model.load_state_dict(blob["model_state_dict"])
model.eval()
```
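With the hooked model loaded, the head-swap finding is easy to poke at directly. A small example (it assumes the HF tokenizer's ids match the hooked model's vocabulary, which the bundle is built for):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("connaaa/interpgpt-standard-23M")
ids = torch.tensor([tokenizer.encode("<|task|>Clean the kitchen<|steps|>")])

_, cache = model.run_with_cache(ids)
pattern = cache["pattern", 3][0]  # layer-3 attention: [head, query_pos, key_pos]
print(pattern[0])                 # L3H0, the step-layout-broadcast head here
```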
### Raw PyTorch / original TaskGPT class
```python
# Pairs with gpt_model.py from https://github.com/cwklurks/interpgpt
import torch
from huggingface_hub import hf_hub_download
from gpt_model import GPTConfig, TaskGPT

path = hf_hub_download(
    "connaaa/interpgpt-standard-23M", "pytorch_model.pt"
)
blob = torch.load(path, map_location="cpu", weights_only=False)

model = TaskGPT(GPTConfig(**blob["config"]))
model.load_state_dict(blob["model_state_dict"])
model.eval()
```
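If `TaskGPT` follows the usual minGPT-style interface (an assumption; confirm against `gpt_model.py` in the repo), a forward pass looks like:

```python
ids = torch.tensor([[5, 17, 42]])  # placeholder ids; use the repo's BPE tokenizer
with torch.no_grad():
    logits = model(ids)  # assumed minGPT-style call; check gpt_model.py
```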
## Reproduce the head-swap finding
Open the companion Colab notebook, `notebooks/InterpGPT_HeadSwap.ipynb`, in
[github.com/cwklurks/interpgpt](https://github.com/cwklurks/interpgpt).
An end-to-end run on the Colab free tier reproduces the 0.997 vs 0.66
comparison in under 15 minutes.
## Training data
Custom task-decomposition corpus, two variants (standard vs ADHD) generated
with the same task pool. Detailed dataset notes and generation scripts live in
the main repo (`preprocess.py`, `merge_data.py`, `rebuild_data.py`,
`fix_adhd_data.py`, `shorten_adhd_steps.py`).
## License
MIT.
## Intended use
Interpretability research. The model is intentionally small and domain-specific; it is not intended as a general-purpose chatbot.
## Citation
```bibtex
@misc{interpgpt2026,
  title  = {{InterpGPT}: A matched-pair interpretability study of task-decomposition models},
  author = {Klann, Connor},
  year   = {2026},
  url    = {https://github.com/cwklurks/interpgpt}
}
```