# Structural FFN Decomposition Guides Cross-Model Compression and Quantization
Artifacts for the paper by Yeonseong Cynn (River Lab, May 2026).
## Summary
The paper decomposes transformer FFN layers into structural (format-preserving) and classification-relevant components across BERT and GPT-2.
Key findings:
- Early-layer FFNs are 90-200x more structural than classification-relevant; late layers approach a 1:1 ratio.
- Structural pruning: attention-head and FFN-neuron removal with layer-wise retraining achieves 19.1% parameter reduction on BERT (SST-2) and 9.1% on GPT-2 with no accuracy loss.
- Neuron pruning: removing the 8% of FFN neurons that rarely activate improves BERT accuracy by 0.3% (sketched below).
- Mixed-precision quantization: INT4 on the structurally dominant layers (L1-L3) with STE retraining recovers accuracy to -2.1% relative to FP32 (sketched below).
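
The rarely-active-neuron pruning finding can be illustrated with a short sketch: count how often each FFN neuron's post-GELU activation is positive over a calibration batch, then drop the least active 8%. Everything beyond that idea is an assumption here: the layer index, the toy calibration sentences, and the activity criterion are illustrative, not the paper's protocol.

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

model = BertForSequenceClassification.from_pretrained(
    "textattack/bert-base-uncased-SST-2"
)
tok = BertTokenizer.from_pretrained("textattack/bert-base-uncased-SST-2")
layer = model.bert.encoder.layer[7]  # illustrative layer choice

# Count how often each of the 3072 FFN neurons fires (post-GELU > 0).
counts = torch.zeros(layer.intermediate.dense.out_features)

def hook(_module, _inputs, output):
    counts.add_((output > 0).float().sum(dim=(0, 1)))

handle = layer.intermediate.register_forward_hook(hook)
batch = tok(["a toy calibration sentence", "another example"],
            return_tensors="pt", padding=True)
with torch.no_grad():
    model(**batch)
handle.remove()

# Keep the 92% most frequently active neurons, drop the rest:
# row i of intermediate.dense and column i of output.dense go together.
k = int(0.92 * counts.numel())
keep = counts.topk(k).indices.sort().values
inter, out = layer.intermediate.dense, layer.output.dense
inter.weight = torch.nn.Parameter(inter.weight[keep].detach().clone())
inter.bias = torch.nn.Parameter(inter.bias[keep].detach().clone())
inter.out_features = k
out.weight = torch.nn.Parameter(out.weight[:, keep].detach().clone())
out.in_features = k
```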
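
The INT4 + STE finding, likewise, can be sketched as fake quantization in the forward pass with gradients passed straight through in the backward pass. The symmetric per-tensor quantizer below is an assumption; the paper's exact scheme may differ.

```python
import torch

class FakeQuantINT4(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        # Symmetric INT4: integer levels in [-8, 7].
        scale = w.abs().max() / 7
        q = torch.clamp(torch.round(w / scale), -8, 7)
        return q * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: pretend quantization is identity.
        return grad_output

# During retraining, quantize a weight in the forward pass:
w = torch.randn(3072, 768, requires_grad=True)
w_q = FakeQuantINT4.apply(w)
loss = w_q.sum()
loss.backward()  # w.grad flows through the rounding via the STE
```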
## Files
### Weights
- `bert_sst2_int4_ste.pt`: BERT SST-2 with L1-L3 INT4 quantization + STE retraining. Standard BERT state_dict, loadable directly (minimal loading sketch below). Accuracy: 90.1% (original FP32: 92.4%).
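
Since the checkpoint is a standard state_dict, loading should reduce to the usual two steps (a minimal sketch; the file path is relative to this repo):

```python
import torch
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "textattack/bert-base-uncased-SST-2"
)
state_dict = torch.load("bert_sst2_int4_ste.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()
```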
### Results: BERT (`results/bert/`)
- `bert_structural_prune.json`: per-layer structural pruning results (head/FFN reduction, accuracy)
- `bert_sst2_all_prune.json`: all-layer simultaneous FFN pruning results
- `bert_l8_prune_results.json`: L8 FFN correction + pruning (multi-seed)
- `bert_quantize_results.json`: INT4/INT8 post-training quantization results
- `bert_quantize_retrain.json`: INT4 STE retraining results
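
The result files are plain JSON and can be inspected directly (their internal field names are not documented here, so the snippet assumes nothing about them):

```python
import json

with open("results/bert/bert_structural_prune.json") as f:
    results = json.load(f)
print(results)  # per-layer head/FFN reduction and accuracy entries
```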
### Results: GPT-2 (`results/gpt2/`)
- `gpt2_structural_prune.json`: per-layer structural pruning (head + FFN)
- `gpt2_each_layer_prune.json`: individual layer compression results
- `gpt2_prune_validate.json`: pruning validation (PPL, accuracy)
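
For the PPL numbers in `gpt2_prune_validate.json`, here is a minimal sketch of how GPT-2 perplexity is typically computed with `transformers` (the actual evaluation data and protocol are the paper's, not shown here):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

model = GPT2LMHeadModel.from_pretrained("gpt2").eval()
tok = GPT2Tokenizer.from_pretrained("gpt2")

text = "An example sentence for perplexity evaluation."
ids = tok(text, return_tensors="pt").input_ids
with torch.no_grad():
    loss = model(ids, labels=ids).loss  # mean token cross-entropy
print(f"PPL = {loss.exp().item():.1f}")
```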
### Figures
- `figures/fig1_ratio.png`: FFN dual-role ratio, BERT vs. GPT-2 (log scale)
- `figures/fig2_compression.png`: per-layer compression rate comparison
- `figures/fig3_pruning.png`: BERT SST-2 FFN neuron pruning curve
- `figures/fig4_quantization.png`: INT4 quantization results (PTQ vs. STE)
## Base Models
- BERT: `textattack/bert-base-uncased-SST-2`
- GPT-2: `gpt2` (124M parameters, pre-trained)
## License
MIT