# BLOOM-1b7 Head Surgery Checkpoints

Surgical reinitialization of collapsed attention heads in BLOOM-1b7. These checkpoints accompany the paper "Surgical Repair of Collapsed Attention Heads in ALiBi Transformers" by Palmer Schallon.

Paper and code: https://github.com/Palmerschallon/bloom-head-surgery

## What This Is
ALiBi positional encoding causes 31-44% of attention heads in BLOOM models to collapse onto the BOS token. We surgically reinitialize collapsed heads (Xavier Q/K/V reinit + zeroed output + gradient masks on frozen params) and retrain, recovering 98.7% of attention head capacity.
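The surgery described above (Xavier Q/K/V reinit, zeroed output, gradient masks) can be sketched as follows. This is an illustrative numpy sketch, not the paper's implementation: the real code operates on BLOOM's fused PyTorch `query_key_value` weights (whose per-head layout differs from the simple `[Q; K; V]` block layout assumed here), and the exact gradient-masking policy is an assumption.

```python
import numpy as np

def xavier_uniform(shape, rng):
    """Xavier/Glorot uniform init: U(-a, a) with a = sqrt(6 / (fan_in + fan_out))."""
    fan_out, fan_in = shape
    a = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-a, a, size=shape)

def surgically_reinit_head(qkv_weight, out_weight, head, head_dim, rng):
    """Reinitialize one collapsed head: Xavier Q/K/V rows, zeroed output columns.

    qkv_weight: (3 * n_heads * head_dim, hidden) fused projection, assumed laid
                out as stacked [Q; K; V] blocks (illustrative layout only).
    out_weight: (hidden, n_heads * head_dim) output projection.
    Returns a boolean gradient mask over out_weight marking the trainable
    columns (an assumed masking policy: only the operated head trains here).
    """
    hidden = qkv_weight.shape[1]
    n_heads = out_weight.shape[1] // head_dim
    cols = slice(head * head_dim, (head + 1) * head_dim)
    for block in range(3):  # Q, K, V blocks of the fused projection
        offset = block * n_heads * head_dim
        rows = slice(offset + head * head_dim, offset + (head + 1) * head_dim)
        qkv_weight[rows, :] = xavier_uniform((head_dim, hidden), rng)
    # Zero the head's output columns so the reinitialized head starts as a no-op.
    out_weight[:, cols] = 0.0
    # Gradient mask: True = trainable; frozen params stay at zero gradient.
    grad_mask = np.zeros_like(out_weight, dtype=bool)
    grad_mask[:, cols] = True
    return grad_mask
```

Zeroing the output projection is what makes the operation "surgical": the freshly initialized head contributes nothing at step 0, so the model's function is unchanged until retraining gradually reopens the head.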
## Checkpoints

| Directory | Description | Healthy Heads | Training PPL |
|---|---|---|---|
| `pass2_e1/` | Final surgical model (2-pass band + outlier surgery) | 379/384 (98.7%) | 15.10 |
| `c4_baseline_e3/` | Control: same surgery, C4 corpus | 341/384 (88.8%) | 20.80 |
| `h5_step42/` | Extended surgery: band + healthy H5 column, sub-epoch best | 355+/384 | 12.70 |
| `pass1_e3/` | Intermediate: band-only surgery | 341/384 (88.8%) | 15.13 |
For comparison, stock BLOOM-1b7 baseline: 242/384 healthy heads (63.0%), training PPL 16.99.

## Key Finding
The H5 checkpoint (h5_step42/) reinitializes mostly-healthy heads alongside collapsed ones and achieves 25% lower perplexity than stock (12.70 vs 16.99). This demonstrates that pretrained attention configurations are local minima, not global optima.
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the final surgical model
model = AutoModelForCausalLM.from_pretrained("TheNexus42/bloom-1b7-head-surgery", subfolder="pass2_e1")
tokenizer = AutoTokenizer.from_pretrained("TheNexus42/bloom-1b7-head-surgery", subfolder="pass2_e1")

# Or load the H5 best-PPL model
model = AutoModelForCausalLM.from_pretrained("TheNexus42/bloom-1b7-head-surgery", subfolder="h5_step42")
tokenizer = AutoTokenizer.from_pretrained("TheNexus42/bloom-1b7-head-surgery", subfolder="h5_step42")
```
## Training Details
- Hardware: Single NVIDIA RTX 5070 Ti (16GB VRAM)
- Precision: bfloat16
- Optimizer: AdamW, LR 5e-5, no weight decay
- Sequence length: 512 tokens
- Batch: 1 with gradient accumulation over 8 steps
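The batch-of-1 with 8-step gradient accumulation can be sketched as below. This is a plain-Python stand-in: a scalar parameter and precomputed gradients replace the model, and a bare SGD update stands in for AdamW (LR 5e-5, weight decay 0, per the list above).

```python
ACCUM_STEPS = 8   # gradient accumulation steps (effective batch = 1 * 8)
LR = 5e-5         # learning rate from the training setup above

def train_microbatches(param, grads):
    """Accumulate per-microbatch gradients; take one optimizer step per window.

    param: scalar parameter (stand-in for the model weights)
    grads: list of per-microbatch gradients
    Returns (updated param, number of optimizer steps taken).
    """
    accum, steps = 0.0, 0
    for i, g in enumerate(grads, start=1):
        accum += g / ACCUM_STEPS      # average over the accumulation window
        if i % ACCUM_STEPS == 0:
            param -= LR * accum       # SGD step stands in for AdamW
            accum, steps = 0.0, steps + 1
    return param, steps
```

Dividing each gradient by `ACCUM_STEPS` before accumulating keeps the update equal to the gradient of the mean loss over the window, matching what a true batch of 8 would produce.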
## Provenance

Each checkpoint directory includes `trajectory.json` with per-epoch metrics (head counts, PPL, BOS mass per head). The `evaluation/` directory contains full diagnostic outputs and generation completions.
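A checkpoint's trajectory can be inspected with a few lines of Python. The field names below (`epoch`, `healthy_heads`, `ppl`) are assumptions about the schema; inspect a `trajectory.json` in any checkpoint directory for the actual keys.

```python
import json
from pathlib import Path

def load_trajectory(checkpoint_dir):
    """Load per-epoch metrics from a checkpoint's trajectory.json.

    NOTE: the record keys used here are assumed, not documented;
    check the file itself for the real schema.
    """
    path = Path(checkpoint_dir) / "trajectory.json"
    records = json.loads(path.read_text())
    for rec in records:
        print(f"epoch {rec['epoch']}: {rec['healthy_heads']}/384 healthy, "
              f"PPL {rec['ppl']:.2f}")
    return records
```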
## Citation

```bibtex
@misc{schallon2026surgical,
  title={Surgical Repair of Collapsed Attention Heads in ALiBi Transformers},
  author={Schallon, Palmer},
  year={2026},
  howpublished={\url{https://github.com/Palmerschallon/bloom-head-surgery}}
}
```
## Model Tree

Base model: [bigscience/bloom-1b7](https://huggingface.co/bigscience/bloom-1b7)