# Structural FFN Decomposition Guides Cross-Model Compression and Quantization
Artifacts for the paper by Yeonseong Cynn (River Lab, May 2026).
## Summary
Decomposes transformer FFN layers into structural (format-preserving) and classification-relevant components across BERT and GPT-2.
Key findings:
- Early-layer FFN is 90-200x more structural than classification-relevant; late layers approach 1:1
- **Structural pruning**: head + FFN neuron removal with layer-wise retraining achieves 19.1% parameter reduction on BERT (SST-2) and 9.1% on GPT-2 with no accuracy loss
- **Neuron pruning**: removing the 8% of FFN neurons that are rarely active *improves* BERT accuracy by 0.3%
- **Mixed-precision quantization**: INT4 on structurally dominant layers (L1-L3) with straight-through estimator (STE) retraining recovers accuracy to a 2.1% loss
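
As a rough illustration of the INT4 + STE idea in the last finding, the sketch below fake-quantizes a weight tensor to 16 levels in the forward pass and passes the gradient through unchanged in the backward pass. The class name `FakeQuantInt4` and the symmetric per-tensor scaling are assumptions made for this example; the paper's exact quantization scheme is not described in this README.

```python
import torch


class FakeQuantInt4(torch.autograd.Function):
    """Symmetric per-tensor INT4 fake quantization with a straight-through gradient.

    Illustrative only: the scaling scheme here is an assumption, not the
    paper's documented method.
    """

    @staticmethod
    def forward(ctx, w):
        # Map the largest magnitude to 7 so values fit the signed 4-bit range [-8, 7].
        scale = w.abs().max().clamp(min=1e-8) / 7.0
        q = torch.clamp(torch.round(w / scale), -8, 7)
        # Dequantize so the rest of the network keeps working in floating point.
        return q * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: treat rounding as the identity function.
        return grad_output
```

During retraining, a layer would call `FakeQuantInt4.apply(weight)` in place of using `weight` directly, so the optimizer still updates the full-precision weights while the forward pass sees the quantized values.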
## Files
### Weights
- `bert_sst2_int4_ste.pt`: BERT SST-2 with L1-L3 INT4 quantization + STE retraining. Standard BERT `state_dict`, loadable directly. Accuracy: 90.1% (original FP32: 92.4%).
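
A minimal loading sketch follows, assuming the checkpoint is a plain `state_dict` of float tensors whose keys match the base model listed under Base Models. The helper names `load_bert_state_dict` and `build_model` are hypothetical, and `strict=False` is used only so that any key mismatch is reported rather than raised.

```python
import torch


def load_bert_state_dict(path):
    """Load the checkpoint onto CPU and return the stored state_dict."""
    # weights_only=True limits unpickling to tensors and plain containers.
    return torch.load(path, map_location="cpu", weights_only=True)


def build_model(state_dict, base="textattack/bert-base-uncased-SST-2"):
    """Instantiate the base architecture and copy the checkpoint weights into it."""
    # Imported lazily so the loader above works without transformers installed.
    from transformers import AutoModelForSequenceClassification

    model = AutoModelForSequenceClassification.from_pretrained(base)
    # Assumption: keys match the base model; mismatches are returned, not raised.
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    model.eval()
    return model, missing, unexpected
```

Typical use would be `model, missing, unexpected = build_model(load_bert_state_dict("bert_sst2_int4_ste.pt"))`. Both returned lists should be empty if the file matches the base architecture.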
### Results: BERT (`results/bert/`)
- `bert_structural_prune.json`: Per-layer structural pruning results (head/FFN reduction, accuracy)
- `bert_sst2_all_prune.json`: All-layer simultaneous FFN pruning results
- `bert_l8_prune_results.json`: L8 FFN correction + pruning (multi-seed)
- `bert_quantize_results.json`: INT4/INT8 post-training quantization results
- `bert_quantize_retrain.json`: INT4 STE retraining results
### Results: GPT-2 (`results/gpt2/`)
- `gpt2_structural_prune.json`: Per-layer structural pruning (head + FFN)
- `gpt2_each_layer_prune.json`: Individual layer compression results
- `gpt2_prune_validate.json`: Pruning validation (PPL, accuracy)
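
The JSON schemas are not documented here, so the sketch below makes no assumption about field names. It collects every `*.json` file under a results directory and summarises the top levels of each one. The helper names `load_results` and `describe` are hypothetical.

```python
import json
from pathlib import Path


def load_results(root="results"):
    """Return {relative_path: parsed_json} for every JSON file under root."""
    results = {}
    for path in sorted(Path(root).rglob("*.json")):
        with path.open(encoding="utf-8") as fh:
            results[path.relative_to(root).as_posix()] = json.load(fh)
    return results


def describe(obj, depth=2):
    """Summarise the top levels of a parsed JSON value without assuming a schema."""
    if isinstance(obj, dict) and depth > 0:
        return {key: describe(value, depth - 1) for key, value in obj.items()}
    if isinstance(obj, list):
        return f"list[{len(obj)}]"
    if isinstance(obj, dict):
        return f"dict[{len(obj)}]"
    return type(obj).__name__
```

Calling `describe(load_results("results"))` gives a quick overview of which keys each file contains before writing any analysis code against them.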
### Figures
- `figures/fig1_ratio.png`: FFN dual role ratio: BERT vs GPT-2 (log scale)
- `figures/fig2_compression.png`: Per-layer compression rates comparison
- `figures/fig3_pruning.png`: BERT SST-2 FFN neuron pruning curve
- `figures/fig4_quantization.png`: INT4 quantization results (PTQ vs STE)
## Base Models
- BERT: [textattack/bert-base-uncased-SST-2](https://huggingface.co/textattack/bert-base-uncased-SST-2)
- GPT-2: [gpt2](https://huggingface.co/gpt2) (124M, pre-trained)
## License
MIT