---
license: mit
library_name: transformers
tags:
- interpretability
- mechanistic-interpretability
- task-decomposition
- small-language-model
- transformer-lens
pipeline_tag: text-generation
---

# InterpGPT — Standard Model (23M)

Part of the **InterpGPT** matched-pair release. This is the **standard** model;
its counterpart is [`connaaa/interpgpt-adhd-23M`](https://huggingface.co/connaaa/interpgpt-adhd-23M).
Both models share identical architecture and training recipe; only the training
data distribution differs.

| Property | Value |
|---|---|
| Parameters | 23,471,104 |
| Layers | 6 |
| Heads | 8 |
| d_model | 512 |
| d_head | 64 |
| d_mlp (SwiGLU) | 1408 |
| Vocab size | 8192 (custom BPE) |
| Context length | 512 tokens |
| Norm | RMSNorm (ε = 1e-6) |
| Position | RoPE (half-half, base 10,000) |
| Activation | SwiGLU |
| Biases | none |
| Tied input/output embeddings | yes |
| Training length | ~25k steps on task-decomposition corpus |
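
Two of the less common rows above, RMSNorm and half-half RoPE, can be sketched in a few lines of plain Python. This is a minimal illustration, not the model's own code: it assumes the common LLaMA-style RMSNorm (ε inside the square root, per-channel gain, no bias) and the GPT-NeoX-style "rotate half" RoPE layout. The names `rms_norm`, `rope_half_half` and `dot` are hypothetical.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: divide x by its root-mean-square, then apply a per-channel gain.

    Unlike LayerNorm there is no mean subtraction and (matching the table) no bias.
    Assumes eps is added inside the square root, as in LLaMA-style models.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


def rope_half_half(x, pos, base=10_000.0):
    """Apply rotary position embedding to one head vector x at position pos.

    Half-half layout: dimension i rotates together with dimension i + d/2,
    instead of with its adjacent neighbour (the interleaved layout).
    """
    d = len(x)
    half = d // 2
    out = [0.0] * d
    for i in range(half):
        angle = pos * base ** (-2.0 * i / d)  # lower frequency for larger i
        c, s = math.cos(angle), math.sin(angle)
        x1, x2 = x[i], x[i + half]
        out[i] = x1 * c - x2 * s  # 2-D rotation of the pair (x1, x2)
        out[i + half] = x2 * c + x1 * s
    return out


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


d_head = 64  # from the table above
q = [math.sin(0.3 * j) for j in range(d_head)]
k = [math.cos(0.7 * j) for j in range(d_head)]

# Both pairs are 2 positions apart, so these two attention scores should agree.
print(dot(rope_half_half(q, 5), rope_half_half(k, 3)))
print(dot(rope_half_half(q, 2), rope_half_half(k, 0)))

h = rms_norm(q, [1.0] * d_head)
print(math.sqrt(dot(h, h) / d_head))  # RMS after normalisation, close to 1
```

The two printed scores illustrate the point of RoPE: the query-key dot product depends only on the offset between positions, not on their absolute values. The layout matters in practice, because weights trained with half-half pairing do not work under interleaved pairing without permuting the query/key projections.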

## What is this model for?

Given a task prompt, the model writes a step-by-step decomposition. The
**standard** variant was trained on conventional task decompositions (tasks → subtasks
in straightforward order). The **ADHD** counterpart was trained on decompositions
with smaller steps and interleaved micro-regulation actions (e.g. "sip water",
"deep breath", "quick stretch").

The pair is the subject of a mechanistic-interpretability study.
Phase 1 headline findings:

- **Structural head-position swap.** A step-layout-broadcast head lives at
  **L3H0** (layer 3, head 0) in the standard model and at **L3H5** in the ADHD model.
  The cross-model cosine similarity of per-position attention profiles is **0.997**
  for the matched (different-index) pair, versus a same-index baseline of **0.66**.
- **Block-2 content circuit.** P(regulation token) at step-onset positions jumps
  17× between layer 1 and layer 2 in the ADHD model (0.014 → 0.251); the
  standard model never crosses 1% at any layer.
- **High-specificity null-steering SAE feature.** See the companion SAE repo
  [`connaaa/interpgpt-sae-phase5`](https://huggingface.co/connaaa/interpgpt-sae-phase5).
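
The head-swap comparison rests on cosine similarity between per-head attention profiles. The sketch below assumes each profile has already been reduced to a flat list of numbers (for example, attention weights averaged over a set of prompts); see the companion notebook for how the study actually builds its profiles. The layer-3 numbers are invented purely to exercise the code, and `best_cross_model_match` is a hypothetical helper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_cross_model_match(profiles_a, profiles_b):
    """For each head in model A, find the most similar head in model B.

    profiles_* map head index -> attention profile (list of floats).
    Returns head -> (best head in B, its cosine, cosine with the same index in B).
    """
    matches = {}
    for head, prof_a in profiles_a.items():
        scores = {h: cosine(prof_a, prof_b) for h, prof_b in profiles_b.items()}
        best = max(scores, key=scores.get)
        matches[head] = (best, scores[best], scores.get(head))
    return matches


# Hypothetical layer-3 profiles, made up for illustration only.
standard_l3 = {0: [0.80, 0.10, 0.10, 0.00], 5: [0.10, 0.20, 0.30, 0.40]}
adhd_l3 = {0: [0.10, 0.20, 0.30, 0.40], 5: [0.79, 0.11, 0.10, 0.00]}

for head, (best, matched, same_index) in best_cross_model_match(standard_l3, adhd_l3).items():
    print(f"standard H{head} -> ADHD H{best}: matched={matched:.3f}, same-index={same_index:.3f}")
```

Comparing the best match against the same-index score is what distinguishes "the same mechanism moved to a different head" from "the heads simply differ".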

## Input format

```
<|task|>Clean the kitchen<|steps|>Step 1 text<|sep|>Step 2 text<|sep|>...<|end|>
```
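
A minimal sketch of building a prompt in this format and splitting a completion back into steps. It assumes you prompt with everything up to and including `<|steps|>`, that the model then emits steps separated by `<|sep|>` and stops at `<|end|>`, and that the completion is decoded with the delimiters kept (in Hugging Face Transformers, `tokenizer.decode` keeps special tokens unless you pass `skip_special_tokens=True`). `build_prompt` and `parse_steps` are hypothetical helpers, not part of the repo, and the completion below is made up rather than real model output.

```python
TASK, STEPS, SEP, END = "<|task|>", "<|steps|>", "<|sep|>", "<|end|>"


def build_prompt(task: str) -> str:
    """Wrap a task so the model continues with its step list."""
    return f"{TASK}{task.strip()}{STEPS}"


def parse_steps(text: str) -> list[str]:
    """Split a decoded completion (with or without the prompt) into steps."""
    if STEPS in text:
        text = text.split(STEPS, 1)[1]  # drop the task prefix, if present
    text = text.split(END, 1)[0]  # ignore anything generated after <|end|>
    return [step.strip() for step in text.split(SEP) if step.strip()]


prompt = build_prompt("Clean the kitchen")
print(prompt)  # <|task|>Clean the kitchen<|steps|>

# Hypothetical completion, for illustration only.
completion = prompt + "Clear the counters<|sep|>Load the dishwasher<|sep|>Wipe surfaces<|end|>"
print(parse_steps(completion))  # ['Clear the counters', 'Load the dishwasher', 'Wipe surfaces']
```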

## Loading

### HuggingFace Transformers (custom code)

```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    "connaaa/interpgpt-standard-23M", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "connaaa/interpgpt-standard-23M"
)
```

### TransformerLens (recommended for interpretability)

The repo ships a TransformerLens-compatible bundle at `hooked_transformer.pt`:

```python
from huggingface_hub import hf_hub_download
from transformer_lens import HookedTransformer, HookedTransformerConfig
import torch

path = hf_hub_download(
    "connaaa/interpgpt-standard-23M", "hooked_transformer.pt"
)
# weights_only=False unpickles arbitrary Python objects; only use it on
# files from a source you trust.
blob = torch.load(path, map_location="cpu", weights_only=False)

# Keep only keys HookedTransformerConfig accepts, and drop values stored as
# strings that start with "torch." (e.g. a serialized dtype).
cfg_keep = {
    k: v for k, v in blob["config"].items()
    if k in HookedTransformerConfig.__dataclass_fields__ and not (
        isinstance(v, str) and v.startswith("torch.")
    )
}
cfg = HookedTransformerConfig(**cfg_keep)
model = HookedTransformer(cfg)
model.load_state_dict(blob["model_state_dict"])
model.eval()
```

### Raw PyTorch / original TaskGPT class

```python
# Requires gpt_model.py from https://github.com/cwklurks/interpgpt on your Python path
from huggingface_hub import hf_hub_download
from gpt_model import GPTConfig, TaskGPT
import torch

path = hf_hub_download(
    "connaaa/interpgpt-standard-23M", "pytorch_model.pt"
)
blob = torch.load(path, map_location="cpu", weights_only=False)  # trusted files only
model = TaskGPT(GPTConfig(**blob["config"]))
model.load_state_dict(blob["model_state_dict"])
model.eval()
```

## Reproduce the head-swap finding

Open the companion Colab notebook,
**`notebooks/InterpGPT_HeadSwap.ipynb`**, in
[github.com/cwklurks/interpgpt](https://github.com/cwklurks/interpgpt).
An end-to-end run on the Colab free tier reproduces the 0.997 vs 0.66 comparison
in under 15 minutes.

## Training data

Custom task-decomposition corpus in two variants (standard and ADHD), generated
from the same task pool. Detailed dataset notes and the generation scripts live in
the main repo (`preprocess.py`, `merge_data.py`, `rebuild_data.py`,
`fix_adhd_data.py`, `shorten_adhd_steps.py`).

## License

MIT.

## Intended use

Interpretability research. The model is intentionally small and
domain-specific; **not** intended as a general-purpose chatbot.

## Citation

```bibtex
@misc{interpgpt2026,
  title  = {{InterpGPT}: A matched-pair interpretability study of task-decomposition models},
  author = {Klann, Connor},
  year   = {2026},
  url    = {https://github.com/cwklurks/interpgpt}
}
```