bergson MAGIC checkpoint — GPT-2 fine-tuned on wikitext-2

GPT-2 (124M) fine-tuned on the train split of Salesforce/wikitext (wikitext-2-raw-v1), chunked into 512-token sequences, via the bergson MAGIC pipeline. This is the exact checkpoint used to generate the attribution scores published at EleutherAI/bergson-magic-scores-gpt-2.

Loading

from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("EleutherAI/bergson-magic-gpt-2")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/bergson-magic-gpt-2")
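As a quick sanity check after loading (the prompt and generation settings below are arbitrary, not anything the pipeline uses):

# Arbitrary prompt; pad_token_id is set explicitly because GPT-2 has no pad token.
inputs = tokenizer("The history of the region begins", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=40, do_sample=False,
                            pad_token_id=tokenizer.eos_token_id)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))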

YAML used to produce this checkpoint

run_path: runs/gpt2_wikitext
model: gpt2
overwrite: true

data:
  dataset: Salesforce/wikitext
  subset: wikitext-2-raw-v1
  split: "train"
  chunk_length: 512

query:
  dataset: Salesforce/wikitext
  subset: wikitext-2-raw-v1
  split: "test[3:4]"
  chunk_length: 0

distributed:
  nproc_per_node: 4
  nnode: 4

batch_size: 256
num_epochs: 2
lr_schedule:
  lr_scheduler_type: polynomial
  lr: 0.0008
  lr_start: 1e-6
  lr_end: 0.00008
  warmup_steps: 0.25

subset_strategy: random
wandb_project: magic

Saved as examples/magic/gpt2_wikitext.yaml in the bergson repo.
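The query block uses Hugging Face datasets slice syntax, so the query is a single row of the raw test split. To inspect which row that is (illustrative only; the pipeline reads this straight from the YAML):

from datasets import load_dataset

# split="test[3:4]" selects the single test-split row used as the MAGIC query.
query = load_dataset("Salesforce/wikitext", "wikitext-2-raw-v1", split="test[3:4]")
print(query[0]["text"])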

Run with:

bergson magic examples/magic/gpt2_wikitext.yaml

The bergson magic step trains the model on the train split via its own training loop (it must, because MAGIC's attribution scores are the gradients of the query loss with respect to per-example training weights, computed by back-propagating through training). The final trained weights are written to the hf_model/ subdirectory of the run path (here runs/gpt2_wikitext/hf_model/); that is what was uploaded here.
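A toy illustration of that idea, using a single differentiable SGD step on a made-up linear model (this is not the bergson implementation, just the shape of the computation): each training example gets a weight w_i on its loss, the parameter update is kept inside the autograd graph, and the attribution scores are d(query loss)/dw.

import torch

torch.manual_seed(0)

# Made-up linear model and data: 4 training examples, 1 query example.
theta = torch.zeros(3, requires_grad=True)
X_train = torch.randn(4, 3)
y_train = torch.randn(4)
x_query = torch.randn(3)
y_query = torch.randn(())

# Per-example training weights; the attribution scores are gradients w.r.t. these.
w = torch.ones(4, requires_grad=True)

# One SGD step on the weighted training loss, kept differentiable
# (create_graph=True) so the query loss can be back-propagated through it.
lr = 0.1
train_losses = (X_train @ theta - y_train) ** 2
(grad_theta,) = torch.autograd.grad((w * train_losses).sum(), theta, create_graph=True)
theta_after = theta - lr * grad_theta

# Query loss at the post-update parameters; its gradient w.r.t. w gives one
# attribution score per training example.
query_loss = (x_query @ theta_after - y_query) ** 2
scores = torch.autograd.grad(query_loss, w)[0]
print(scores)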
