
Structural FFN Decomposition Guides Cross-Model Compression and Quantization

Artifacts for the paper by Yeonseong Cynn (River Lab, May 2026).

Summary

This work decomposes transformer FFN layers into structural (format-preserving) and classification-relevant components, and applies the decomposition across BERT and GPT-2.

Key findings:

  • Early-layer FFN is 90-200x more structural than classification-relevant; late layers approach a 1:1 ratio
  • Structural pruning: head + FFN neuron removal with layer-wise retraining achieves a 19.1% parameter reduction on BERT (SST-2) and 9.1% on GPT-2 with no accuracy loss
  • Neuron pruning: removing the 8% of FFN neurons that are rarely active improves BERT accuracy by 0.3% (see the sketch after this list)
  • Mixed-precision quantization: INT4 on the structurally dominant layers (L1-L3) with STE retraining recovers to a 2.1% accuracy loss
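
For readers who want to reproduce the neuron-pruning result, here is a minimal sketch of frequency-based structured pruning for one BERT encoder layer. The firing criterion (post-GELU activation > 0), the `keep_ratio` value, and the `model`/`dataloader` names are illustrative assumptions, not the paper's exact procedure.

```python
import torch
from torch import nn

@torch.no_grad()
def activation_frequency(model, dataloader, layer_idx, device="cpu"):
    """Estimate how often each intermediate FFN neuron fires (output > 0
    after GELU) over a calibration set, for one BERT encoder layer."""
    layer = model.bert.encoder.layer[layer_idx]
    counts, total = None, 0

    def hook(module, inputs, output):
        nonlocal counts, total
        act = output.flatten(0, 1)               # (tokens, intermediate_size)
        fired = (act > 0).float().sum(dim=0)
        counts = fired if counts is None else counts + fired
        total += act.shape[0]

    handle = layer.intermediate.register_forward_hook(hook)
    for batch in dataloader:
        model(**{k: v.to(device) for k, v in batch.items()})
    handle.remove()
    return counts / total                        # per-neuron firing rate

@torch.no_grad()
def prune_rare_neurons(layer, freq, keep_ratio=0.92):
    """Structurally remove the least-active intermediate neurons
    (keep_ratio=0.92 drops the 8% with the lowest firing rate)."""
    k = int(keep_ratio * freq.numel())
    keep = torch.topk(freq, k).indices.sort().values
    inter, out = layer.intermediate.dense, layer.output.dense

    new_inter = nn.Linear(inter.in_features, k).to(inter.weight.device)
    new_inter.weight.copy_(inter.weight[keep])
    new_inter.bias.copy_(inter.bias[keep])

    new_out = nn.Linear(k, out.out_features).to(out.weight.device)
    new_out.weight.copy_(out.weight[:, keep])
    new_out.bias.copy_(out.bias)

    layer.intermediate.dense, layer.output.dense = new_inter, new_out
```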

Files

Weights

  • bert_sst2_int4_ste.pt β€” BERT SST-2 with L1-L3 INT4 quantization + STE retraining. Standard BERT state_dict, loadable directly. Accuracy: 90.1% (original FP32: 92.4%).
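
Since the file is a standard `state_dict`, it should load directly into a stock `BertForSequenceClassification`. A minimal loading sketch, assuming the `bert-base-uncased` architecture, a two-label SST-2 head, and the usual index-1-is-positive label convention (none of which are stated explicitly above):

```python
import torch
from transformers import AutoTokenizer, BertForSequenceClassification

# Assumed base architecture; the card does not state the exact config.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)
state_dict = torch.load("bert_sst2_int4_ste.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("a gripping, well-acted thriller", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
# Label convention assumed: index 1 = positive (standard for SST-2).
print("positive" if logits.argmax(-1).item() == 1 else "negative")
```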

Results β€” BERT (results/bert/)

  • bert_structural_prune.json β€” Per-layer structural pruning results (head/FFN reduction, accuracy)
  • bert_sst2_all_prune.json β€” All-layer simultaneous FFN pruning results
  • bert_l8_prune_results.json β€” L8 FFN correction + pruning (multi-seed)
  • bert_quantize_results.json β€” INT4/INT8 post-training quantization results
  • bert_quantize_retrain.json β€” INT4 STE retraining results
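
As background for `bert_quantize_retrain.json`: STE retraining means weights are fake-quantized in the forward pass while gradients bypass the non-differentiable rounding. Below is a minimal sketch using symmetric per-tensor INT4 fake quantization with a straight-through estimator; this is the textbook construction, and the per-tensor symmetric scale is an assumption rather than the paper's documented scheme.

```python
import torch
from torch import nn

class FakeQuantSTE(torch.autograd.Function):
    """Symmetric per-tensor fake quantization with a straight-through
    estimator: rounding happens in forward, gradients pass through."""

    @staticmethod
    def forward(ctx, w, num_bits=4):
        qmax = 2 ** (num_bits - 1) - 1            # 7 for INT4
        scale = w.abs().max().clamp(min=1e-8) / qmax
        return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None                     # STE: identity gradient

class QuantLinear(nn.Module):
    """Wraps an nn.Linear so its weight is fake-quantized on every forward
    pass while the FP32 master weight keeps receiving gradients."""

    def __init__(self, linear, num_bits=4):
        super().__init__()
        self.linear, self.num_bits = linear, num_bits

    def forward(self, x):
        w_q = FakeQuantSTE.apply(self.linear.weight, self.num_bits)
        return nn.functional.linear(x, w_q, self.linear.bias)
```

To mirror the L1-L3 setting, one would wrap the FFN linears of the first three encoder layers (`model.bert.encoder.layer[i]` for `i` in 0..2, assuming 0-based indexing) and fine-tune as usual.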

Results β€” GPT-2 (results/gpt2/)

  • gpt2_structural_prune.json β€” Per-layer structural pruning (head + FFN)
  • gpt2_each_layer_prune.json β€” Individual layer compression results
  • gpt2_prune_validate.json β€” Pruning validation (PPL, accuracy)

Figures

  • figures/fig1_ratio.png β€” FFN dual role ratio: BERT vs GPT-2 (log scale)
  • figures/fig2_compression.png β€” Per-layer compression rate comparison
  • figures/fig3_pruning.png β€” BERT SST-2 FFN neuron pruning curve
  • figures/fig4_quantization.png β€” INT4 quantization results (PTQ vs STE)

License

MIT
