BLOOM-1b7 Head Surgery Checkpoints

Surgical reinitialization of collapsed attention heads in BLOOM-1b7. These checkpoints accompany the paper "Surgical Repair of Collapsed Attention Heads in ALiBi Transformers" by Palmer Schallon.

Paper and code: github.com/Palmerschallon/bloom-head-surgery

What This Is

ALiBi positional encoding causes 31-44% of attention heads in BLOOM models to collapse onto the BOS token. We surgically reinitialize collapsed heads (Xavier Q/K/V reinit + zeroed output + gradient masks on frozen params) and retrain, recovering 98.7% of attention head capacity.
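The surgical procedure can be sketched as follows. This is a minimal illustration on separate Q/K/V/output `nn.Linear` modules, not the released implementation (BLOOM actually fuses Q/K/V into a single `query_key_value` weight, and the real masking code lives in the linked repo): Xavier-reinitialize the collapsed head's projection rows, zero its output columns so it starts silent, and register gradient hooks that freeze every other head's parameters.

```python
import torch
import torch.nn as nn

def surgically_reinit_head(q, k, v, o, head, head_dim):
    """Reinitialize one collapsed head: Xavier Q/K/V reinit, zeroed output,
    gradient masks on frozen params. Illustrative sketch on separate
    Q/K/V/O Linears; the released checkpoints operate on BLOOM's fused
    QKV weights."""
    rows = slice(head * head_dim, (head + 1) * head_dim)
    with torch.no_grad():
        for proj in (q, k, v):
            nn.init.xavier_uniform_(proj.weight[rows])
        o.weight[:, rows].zero_()  # zeroed output: the repaired head starts silent

    def row_mask(grad):
        # Only the repaired head's Q/K/V rows receive gradient.
        mask = torch.zeros_like(grad)
        mask[rows] = 1.0
        return grad * mask

    for proj in (q, k, v):
        proj.weight.register_hook(row_mask)

    def col_mask(grad):
        # Only the repaired head's output columns train.
        mask = torch.zeros_like(grad)
        mask[:, rows] = 1.0
        return grad * mask

    o.weight.register_hook(col_mask)
```

Because the output columns start at zero, the repaired head contributes nothing on the first forward pass; its output projection trains first, and gradient then flows back into the reinitialized Q/K/V rows.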

Checkpoints

| Directory | Description | Healthy Heads | Training PPL |
|---|---|---|---|
| pass2_e1/ | Final surgical model (2-pass band + outlier surgery) | 379/384 (98.7%) | 15.10 |
| c4_baseline_e3/ | Control: same surgery, C4 corpus | 341/384 (88.8%) | 20.80 |
| h5_step42/ | Extended surgery: band + healthy H5 column, sub-epoch best | 355+/384 | 12.70 |
| pass1_e3/ | Intermediate: band-only surgery | 341/384 (88.8%) | 15.13 |

Stock BLOOM-1b7 baseline: 242/384 healthy heads (63.0%), training PPL 16.99.

Key Finding

The H5 checkpoint (h5_step42/) reinitializes mostly-healthy heads alongside collapsed ones and achieves 25% lower perplexity than stock (12.70 vs 16.99). This demonstrates that pretrained attention configurations are local minima, not global optima.
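Classifying a head as collapsed comes down to how much of its attention mass lands on the BOS token. A minimal diagnostic might look like the sketch below; the 0.9 threshold is an illustrative assumption, not the paper's exact criterion:

```python
import torch

def bos_mass_per_head(attn, bos_index=0):
    """attn: (num_heads, seq, seq) attention probabilities for one layer.
    Returns the mean attention mass each head places on the BOS token,
    averaged over query positions."""
    return attn[:, :, bos_index].mean(dim=-1)

def collapsed_heads(attn, threshold=0.9):
    """Flag heads whose average BOS mass exceeds `threshold` as collapsed.
    (Threshold is an illustrative assumption, not the paper's value.)"""
    return (bos_mass_per_head(attn) > threshold).nonzero(as_tuple=True)[0]
```

In practice the attention tensor would come from a forward pass with `output_attentions=True`; the per-head BOS mass logged in trajectory.json is this kind of statistic.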

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the final surgical model
model = AutoModelForCausalLM.from_pretrained("TheNexus42/bloom-1b7-head-surgery", subfolder="pass2_e1")
tokenizer = AutoTokenizer.from_pretrained("TheNexus42/bloom-1b7-head-surgery", subfolder="pass2_e1")

# Or load the H5 best-PPL model
model = AutoModelForCausalLM.from_pretrained("TheNexus42/bloom-1b7-head-surgery", subfolder="h5_step42")
tokenizer = AutoTokenizer.from_pretrained("TheNexus42/bloom-1b7-head-surgery", subfolder="h5_step42")

Training Details

  • Hardware: Single NVIDIA RTX 5070 Ti (16GB VRAM)
  • Precision: bfloat16
  • Optimizer: AdamW, LR 5e-5, no weight decay
  • Sequence length: 512 tokens
  • Batch: 1 with gradient accumulation over 8 steps
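The batch-1 + accumulation setup above amounts to one optimizer step per 8 micro-batches. A hedged sketch of that loop (the loss function and batch layout here are placeholders, not the paper's training code):

```python
import torch
import torch.nn.functional as F
from torch.optim import AdamW

ACCUM_STEPS = 8  # effective batch = 1 x 8, as in the training setup

def train_step(model, batches, optimizer, accum_steps=ACCUM_STEPS):
    """One optimizer step via gradient accumulation over `accum_steps`
    micro-batches of size 1. Illustrative: the real run uses a causal LM
    loss on 512-token sequences in bfloat16."""
    optimizer.zero_grad()
    for x, y in batches:
        # Scale each micro-batch loss so the accumulated gradient
        # matches a single batch of size accum_steps.
        loss = F.mse_loss(model(x), y) / accum_steps
        loss.backward()
    optimizer.step()
```

The optimizer matching the listed hyperparameters would be `AdamW(model.parameters(), lr=5e-5, weight_decay=0.0)`.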

Provenance

Each checkpoint directory includes trajectory.json with per-epoch metrics (head counts, PPL, BOS mass per head). The evaluation/ directory contains full diagnostic outputs and generation completions.
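Reading the per-epoch metrics back out is a one-liner with the stdlib; the field names below are assumptions based on the metrics described (head counts, PPL, BOS mass per head), so check the actual file for the real schema:

```python
import json
from pathlib import Path

def load_trajectory(checkpoint_dir):
    """Load per-epoch metrics from a checkpoint's trajectory.json.
    Field names ('epoch', 'healthy_heads', 'ppl') are assumed, not
    taken from the released files."""
    path = Path(checkpoint_dir) / "trajectory.json"
    epochs = json.loads(path.read_text())
    for e in epochs:
        print(f"epoch {e['epoch']}: healthy_heads={e['healthy_heads']}, ppl={e['ppl']:.2f}")
    return epochs
```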

Citation

@misc{schallon2026surgical,
  title={Surgical Repair of Collapsed Attention Heads in ALiBi Transformers},
  author={Schallon, Palmer},
  year={2026},
  howpublished={\url{https://github.com/Palmerschallon/bloom-head-surgery}}
}