# Structural FFN Decomposition Guides Cross-Model Compression and Quantization

Artifacts for the paper by Yeonseong Cynn (River Lab, May 2026).

## Summary

This work decomposes transformer FFN layers into structural (format-preserving) and classification-relevant components across BERT and GPT-2.

Key findings:
- The structural-to-classification ratio is 90-200x in early-layer FFNs and approaches 1:1 in late layers
- **Structural pruning**: removing heads and FFN neurons with layer-wise retraining cuts parameters by 19.1% on BERT (SST-2) and 9.1% on GPT-2 with no accuracy loss
- **Neuron pruning**: removing the 8% of FFN neurons that are rarely active *improves* BERT accuracy by 0.3%
- **Mixed-precision quantization**: INT4 on structurally-dominant layers (L1-L3) with STE retraining reduces the accuracy loss to 2.1%

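
As a rough illustration of the INT4 setting in the last finding, the sketch below applies symmetric per-tensor fake quantization (quantize, then dequantize) to a short list of weights. The helper name `fake_quantize_int4`, the per-tensor scale, and the symmetric signed range are assumptions made for illustration; the scheme used in the paper may differ. In STE retraining the rounding step is treated as the identity in the backward pass, which this forward-only sketch does not show.

```python
def fake_quantize_int4(weights):
    """Quantize weights to signed 4-bit integers and map them back to floats.

    Hypothetical helper: symmetric per-tensor scaling is an assumption,
    not necessarily the scheme used for the released checkpoint.
    """
    qmin, qmax = -8, 7  # signed 4-bit integer range
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        # Nothing to scale; return the weights unchanged with a dummy scale.
        return list(weights), 1.0
    scale = max_abs / qmax  # float size of one integer step
    quantized = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return [q * scale for q in quantized], scale


weights = [0.70, -0.33, 0.10, 0.02]
dequantized, scale = fake_quantize_int4(weights)
# Rounding error per weight is bounded by half a quantization step.
max_error = max(abs(w - d) for w, d in zip(weights, dequantized))
print(dequantized, scale, max_error)
```

With only 16 levels per tensor, small weights collapse to zero, which is one reason retraining after quantization can matter.
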
## Files

### Weights
- `bert_sst2_int4_ste.pt` - BERT SST-2 model with L1-L3 quantized to INT4 and retrained with STE. Stored as a standard BERT `state_dict`, loadable directly. Accuracy: 90.1% (original FP32: 92.4%).

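
Because the checkpoint is described as a standard BERT `state_dict`, loading it could look like the sketch below. It assumes PyTorch and Hugging Face `transformers` are installed, that the file sits in the working directory, and that its keys match `BertForSequenceClassification` as instantiated from the BERT base model listed in this README. The helper name `load_int4_ste_bert` is hypothetical.

```python
BASE_MODEL = "textattack/bert-base-uncased-SST-2"  # BERT base model named in this README


def load_int4_ste_bert(checkpoint_path="bert_sst2_int4_ste.pt", base_model=BASE_MODEL):
    """Build the base architecture, then overwrite its weights from the checkpoint."""
    # Imported inside the function so the heavy dependencies are only
    # required when the model is actually loaded.
    import torch
    from transformers import BertForSequenceClassification

    model = BertForSequenceClassification.from_pretrained(base_model)
    # Assumption: the file holds a plain state_dict (parameter name -> tensor).
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    model.load_state_dict(state_dict)
    model.eval()  # disable dropout for evaluation
    return model
```

Calling `load_int4_ste_bert()` returns the model in evaluation mode. If the key names in the file differ from what the architecture expects, `load_state_dict` raises an error listing the mismatched keys.
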
### Results - BERT (`results/bert/`)
- `bert_structural_prune.json` - Per-layer structural pruning results (head/FFN reduction, accuracy)
- `bert_sst2_all_prune.json` - All-layer simultaneous FFN pruning results
- `bert_l8_prune_results.json` - L8 FFN correction + pruning (multi-seed)
- `bert_quantize_results.json` - INT4/INT8 post-training quantization results
- `bert_quantize_retrain.json` - INT4 STE retraining results

### Results - GPT-2 (`results/gpt2/`)
- `gpt2_structural_prune.json` - Per-layer structural pruning (head + FFN)
- `gpt2_each_layer_prune.json` - Individual layer compression results
- `gpt2_prune_validate.json` - Pruning validation (PPL, accuracy)

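
The JSON schemas of the result files are not documented here, so a first step is usually to look at their top-level structure. The sketch below walks a results directory and reports, for each JSON file, its top-level keys or list length. The helper name `summarize_results` and the default `results` path (taken from the directory names above) are assumptions.

```python
import json
from pathlib import Path


def summarize_results(results_dir="results"):
    """Map each JSON file under results_dir to a short description of its top level."""
    summary = {}
    for path in sorted(Path(results_dir).rglob("*.json")):
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        if isinstance(data, dict):
            summary[str(path)] = sorted(data)  # top-level keys
        elif isinstance(data, list):
            summary[str(path)] = f"list of {len(data)} items"
        else:
            summary[str(path)] = type(data).__name__
    return summary


for name, shape in summarize_results().items():
    print(name, "->", shape)
```

Run from the repository root, this prints one line per result file; from there the fields of interest can be read with ordinary dictionary access.
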
### Figures
- `figures/fig1_ratio.png` - FFN dual role ratio: BERT vs GPT-2 (log scale)
- `figures/fig2_compression.png` - Per-layer compression rates comparison
- `figures/fig3_pruning.png` - BERT SST-2 FFN neuron pruning curve
- `figures/fig4_quantization.png` - INT4 quantization results (PTQ vs STE)

## Base Models

- BERT: [textattack/bert-base-uncased-SST-2](https://huggingface.co/textattack/bert-base-uncased-SST-2)
- GPT-2: [gpt2](https://huggingface.co/gpt2) (124M, pre-trained)

## License

MIT