ESPR3SS0/neural-pruning-impl

Commit ca31dfe by ESPR3SS0, committed 13 days ago (parent 4d76378): Add metapruning/README.md (new file, +135 lines)

# MetaPruning: Meta Pruning via Graph Metanetworks

Implementation of **"Meta Pruning via Graph Metanetworks: A Universal Meta-Learning Framework for Network Pruning"**

**Paper:** https://arxiv.org/abs/2506.12041

## Core Idea

Unlike prior "learning to prune" methods, which need special training for every model, MetaPruning **trains once, prunes forever**. A **GNN metanetwork** takes an entire neural network as input (converted to a graph with neurons as nodes and weights as edges) and outputs a transformed network that is easier to prune. Pruning a new model then takes one metanetwork forward pass followed by standard finetuning, with no model-specific pruning training; the paper reports state-of-the-art pruning results with this pipeline.

## Architecture

### Network ↔ Graph Bijection

```
Network → Graph
  Node  = each output channel/neuron in Conv/Linear layers
  Edge  = connection between channels (conv, linear, residual skip)
  Node feature = [weight_mean, weight_std, BN_weight, BN_bias, BN_run_mean, BN_run_var]
  Edge feature = flattened conv kernel (padded to uniform size)
```
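
As a rough illustration of this mapping for a single Conv+BN layer, the sketch below computes the node and edge features listed above in plain Python. It is not the repo's `resnet_to_graph`: the function name `conv_bn_to_graph`, the nested-list weight layout `[out][in][k][k]`, the population standard deviation, and the `max_kernel=3` padding target are all assumptions, and residual-skip edges are left out.

```python
import math


def conv_bn_to_graph(weight, bn, max_kernel=3):
    """Map one Conv+BN layer to (node_features, edge_features).

    weight: nested list [out_ch][in_ch][k][k]
    bn: dict of per-output-channel lists: 'weight', 'bias', 'running_mean', 'running_var'
    """
    target_len = max_kernel * max_kernel
    nodes, edges = [], {}
    for j, filt in enumerate(weight):
        flat = [w for kernel in filt for row in kernel for w in row]
        mean = sum(flat) / len(flat)
        std = math.sqrt(sum((w - mean) ** 2 for w in flat) / len(flat))
        # Node feature = [weight_mean, weight_std, BN_weight, BN_bias, BN_run_mean, BN_run_var]
        nodes.append([mean, std, bn["weight"][j], bn["bias"][j],
                      bn["running_mean"][j], bn["running_var"][j]])
        for i, kernel in enumerate(filt):
            k_flat = [w for row in kernel for w in row]
            if len(k_flat) > target_len:
                raise ValueError("kernel larger than max_kernel x max_kernel")
            # Edge (input channel i -> output channel j): kernel zero-padded to a uniform length
            edges[(i, j)] = k_flat + [0.0] * (target_len - len(k_flat))
    return nodes, edges


# One output channel fed by two input channels through 1x1 kernels.
bn = {"weight": [1.0], "bias": [0.0], "running_mean": [0.5], "running_var": [2.0]}
nodes, edges = conv_bn_to_graph([[[[1.0]], [[3.0]]]], bn)
print(nodes)          # [[2.0, 1.0, 1.0, 0.0, 0.5, 2.0]]
print(edges[(1, 0)])  # [3.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Repeating this layer by layer, with one layer's output-channel nodes serving as the next layer's input-channel nodes, would give the whole-network graph.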

### GNN Metanetwork (Appendix B.1)

```
Node/Edge Encoder (MLP) → hidden_dim
        ↓
N × PNAConv Message Passing Layers
  • Message:  m_ij = MLP^1(v_i) ⊙ MLP^2(v_j) ⊙ e_ij
  • Message': m'_ji = MLP^1(v_j) ⊙ MLP^2(v_i) ⊙ (e_ij ⊙ EdgeInvertor)
  • Aggregation: PNA([MEAN, STD, MAX, MIN]) via MLP_Aggr
  • Update: v_i += aggr(m_ij) + aggr(m'_ji)
  • Edge: e_ij += MLP^1(v_i) ⊙ MLP^2(v_j) ⊙ e_ij + ...
        ↓
Node/Edge Decoder (MLP) → original dims
        ↓
Residual: v_out = α·v_pred + v_in,  e_out = β·e_pred + e_in
          (α = β = 0.01, learns deltas only)
```
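
The message, aggregation, and update rules above can be traced by hand with the small sketch below. It treats vectors as Python lists and takes the MLPs as arbitrary callables. It implements only the forward message m_ij aggregated at node i; the reverse message m'_ji (with `EdgeInvertor`), the edge update, and the encoders/decoders are omitted. All function names are hypothetical and do not mirror `gnn.py`.

```python
import math


def hadamard(*vecs):
    """Elementwise (⊙) product of equal-length vectors."""
    out = [1.0] * len(vecs[0])
    for v in vecs:
        out = [a * b for a, b in zip(out, v)]
    return out


def pna_aggregate(messages):
    """Concatenate per-dimension [MEAN, STD, MAX, MIN] over a node's messages."""
    cols = list(zip(*messages))  # one tuple per feature dimension
    mean = [sum(c) / len(c) for c in cols]
    std = [math.sqrt(sum((x - mu) ** 2 for x in c) / len(c)) for c, mu in zip(cols, mean)]
    return mean + std + [max(c) for c in cols] + [min(c) for c in cols]


def message_passing_step(nodes, edges, mlp1, mlp2, mlp_aggr):
    """v_i += MLP_Aggr(PNA({m_ij})) with m_ij = MLP^1(v_i) ⊙ MLP^2(v_j) ⊙ e_ij."""
    incoming = {i: [] for i in range(len(nodes))}
    for (i, j), e_ij in edges.items():
        incoming[i].append(hadamard(mlp1(nodes[i]), mlp2(nodes[j]), e_ij))
    updated = []
    for i, v in enumerate(nodes):
        if incoming[i]:  # nodes without messages keep their features
            v = [a + b for a, b in zip(v, mlp_aggr(pna_aggregate(incoming[i])))]
        updated.append(v)
    return updated


def residual(pred, original, coef=0.01):
    """v_out = α·v_pred + v_in (edges use the same form with β)."""
    return [coef * p + x for p, x in zip(pred, original)]


def identity(v):
    return v


def mean_part(z):
    """Toy stand-in for MLP_Aggr: keep only the MEAN block of the PNA output."""
    return z[: len(z) // 4]


nodes = [[1.0, 2.0], [0.5, 0.5], [2.0, 1.0]]
edges = {(0, 1): [1.0, 1.0], (0, 2): [1.0, 1.0]}
print(message_passing_step(nodes, edges, identity, identity, mean_part))
# [[2.25, 3.5], [0.5, 0.5], [2.0, 1.0]]
print(residual([2.0, -1.0], [1.0, 1.0]))
```

The small residual coefficient means a freshly initialized metanetwork returns features very close to the input network, which is the "learns deltas only" behavior noted above.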

### Meta-Training Loop

1. Select a **data model** (a pre-trained network).
2. Convert the network to a graph.
3. Feed the graph through the GNN metanetwork to obtain a transformed graph.
4. Convert the transformed graph back into a new network.
5. Compute an **accuracy loss** on a subset of the training data plus a **sparsity loss** (L1 penalty on the weights, scaled by `pruner_reg`).
6. Backpropagate through the metanetwork; only the metanetwork is updated, and the data model's parameters stay frozen.
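
A minimal sketch of the step-5 objective follows. It assumes softmax cross-entropy as the accuracy loss and an unnormalized sum of absolute weights as the L1 term; the repo's exact loss and normalization may differ, and `meta_loss` is a hypothetical name. In real meta-training these values would be PyTorch tensors so the loss can be backpropagated into the metanetwork.

```python
import math


def cross_entropy(logits, label):
    """Numerically stable softmax cross-entropy for one example."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def meta_loss(batch_logits, labels, weights, pruner_reg=10.0):
    """Accuracy loss (mean cross-entropy) + pruner_reg * L1 sparsity loss."""
    accuracy_loss = sum(cross_entropy(z, y) for z, y in zip(batch_logits, labels)) / len(labels)
    sparsity_loss = sum(abs(w) for w in weights)
    return accuracy_loss + pruner_reg * sparsity_loss


# Two-class example: uniform logits give log(2); |0.1| + |-0.2| = 0.3.
print(meta_loss([[0.0, 0.0]], [0], [0.1, -0.2]))  # ≈ 0.693 + 10 * 0.3 = 3.693
```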

### Inference Pipeline

```
Target Model → Graph → [Metanetwork] → Transformed Graph → New Model
                          ↓
                    Finetune (standard SGD, 100-200 epochs)
                          ↓
                    Prune (DepGraph / magnitude criterion)
                          ↓
                    Pruned Model (ready for inference)
```
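
The final prune step can use a magnitude criterion. The sketch below shows one common variant: rank a Conv layer's output channels by the L1 norm of their filters and keep the largest `1 - prune_sparsity` fraction. The function names and nested-list weight layout are hypothetical, and this is not necessarily the criterion in `inference.py`. It scores one layer in isolation; in a ResNet, channels tied together by residual additions must be pruned jointly, which is what DepGraph's dependency groups handle.

```python
def channel_l1_norms(weight):
    """L1 norm of each output channel of a conv weight laid out as [out][in][k][k]."""
    return [sum(abs(w) for kernel in filt for row in kernel for w in row) for filt in weight]


def select_channels(weight, sparsity):
    """Indices of output channels kept after dropping the `sparsity` fraction
    with the smallest L1 norm (at least one channel is always kept)."""
    norms = channel_l1_norms(weight)
    n_keep = max(1, round(len(norms) * (1.0 - sparsity)))
    ranked = sorted(range(len(norms)), key=lambda c: norms[c], reverse=True)
    return sorted(ranked[:n_keep])


# Four 1x1 filters over one input channel, pruned at --prune_sparsity 0.5.
weight = [[[[0.1]]], [[[-2.0]]], [[[0.5]]], [[[1.0]]]]
print(select_channels(weight, 0.5))  # [1, 3]
```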

## Files

| File | Description |
|------|-------------|
| `graph.py` | Network ↔ Graph bijection (`resnet_to_graph`, `graph_to_resnet`, `create_transformed_model`) |
| `gnn.py` | PNAConv GNN (`Metanetwork`, `PNAConvLayer`, `EdgeInvertor`) |
| `train_metanetwork.py` | Meta-training loop on CIFAR-10 |
| `inference.py` | Inference: metanetwork → finetune → prune → evaluate |

## Usage

### 1. Meta-Train the Metanetwork

```bash
python -m metapruning.train_metanetwork \
    --meta_epochs 100 \
    --hidden_dim 32 \
    --num_layers 3 \
    --alpha 0.01 \
    --beta 0.01 \
    --lr 1e-3 \
    --weight_decay 5e-4 \
    --pruner_reg 10.0 \
    --num_data_models 1 \
    --pretrain_data_models \
    --pretrain_epochs 100
```

This creates `checkpoints_metapruning/metanetwork.pt`.

### 2. Prune Any Target Model

```bash
python -m metapruning.inference \
    --metanetwork_path checkpoints_metapruning/metanetwork.pt \
    --target_model resnet56 \
    --finetune_epochs 100 \
    --prune_sparsity 0.5 \
    --lr 0.01
```

## Paper Results

| Task | Base Acc | Pruned Acc (Δ) | Pruned FLOPs (% removed) |
|------|----------|----------------|--------------------------|
| ResNet56 / CIFAR-10 | 93.51% | **93.64%** (+0.13%) | 65.6% |
| VGG19 / CIFAR-100 | 73.65% | **69.75%** (−3.90%) | 88.83% |
| ResNet50 / ImageNet | 76.14% | **76.13%** (−0.01%) | 57.2% |
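
To read the table: Δ is pruned accuracy minus base accuracy and, assuming "Pruned FLOPs" is the fraction of FLOPs removed, the compute reduction factor is 1 / (1 - pruned fraction). The snippet below recomputes both from the table values; it counts FLOPs only and says nothing about wall-clock speedup.

```python
# Values copied from the table above: (base acc %, pruned acc %, fraction of FLOPs pruned).
RESULTS = {
    "ResNet56 / CIFAR-10": (93.51, 93.64, 0.656),
    "VGG19 / CIFAR-100": (73.65, 69.75, 0.8883),
    "ResNet50 / ImageNet": (76.14, 76.13, 0.572),
}


def summarize(base_acc, pruned_acc, pruned_flops):
    """Return (accuracy delta in points, FLOPs reduction factor)."""
    delta = round(pruned_acc - base_acc, 2)
    reduction = 1.0 / (1.0 - pruned_flops)  # assumes pruned_flops = fraction removed
    return delta, reduction


for task, row in RESULTS.items():
    delta, reduction = summarize(*row)
    print(f"{task}: Δ {delta:+.2f} pts, FLOPs reduced {reduction:.2f}x")
```

This prints reduction factors of roughly 2.91x, 8.95x, and 2.34x for the three rows.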

## Key Properties

- **Transferable**: a metanetwork trained on ResNet56/CIFAR-10 can prune ResNet110, VGG, and ViT models on other datasets.
- **One-shot pruning**: a single metanetwork forward pass plus finetuning, with no iterative pruning.
- **Universal**: applies to any CNN or ViT through the network ↔ graph bijection.
- **Low cost (amortized)**: the expensive meta-training is done once; each subsequent pruning run is cheap.

## Hyperparameters

| Param | Default | Description |
|-------|---------|-------------|
| `hidden_dim` | 32 | GNN hidden dimension |
| `num_layers` | 3 | Message passing layers |
| `alpha` | 0.01 | Node feature residual coefficient |
| `beta` | 0.01 | Edge feature residual coefficient |
| `meta_epochs` | 100 | Meta-training epochs |
| `lr` | 1e-3 | Metanetwork learning rate |
| `weight_decay` | 5e-4 | AdamW weight decay |
| `pruner_reg` | 10.0 | Sparsity loss weight |
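
When driving the training script from Python, these defaults can be kept in one place. The `MetaPruningConfig` dataclass below is a hypothetical helper, not part of the repo; it mirrors the table and renders the values as the `train_metanetwork` flags shown under Usage.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class MetaPruningConfig:
    """Hypothetical container for the defaults in the hyperparameter table."""

    hidden_dim: int = 32         # GNN hidden dimension
    num_layers: int = 3          # message passing layers
    alpha: float = 0.01          # node feature residual coefficient
    beta: float = 0.01           # edge feature residual coefficient
    meta_epochs: int = 100       # meta-training epochs
    lr: float = 1e-3             # metanetwork learning rate
    weight_decay: float = 5e-4   # AdamW weight decay
    pruner_reg: float = 10.0     # sparsity loss weight

    def to_cli_args(self):
        """Render as ['--flag', 'value', ...] for metapruning.train_metanetwork."""
        args = []
        for name, value in asdict(self).items():
            args += [f"--{name}", str(value)]
        return args


wider = replace(MetaPruningConfig(), hidden_dim=64)
print(" ".join(wider.to_cli_args()))
```

Making the dataclass frozen and deriving variants with `dataclasses.replace` keeps each experiment's settings immutable once created.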

## Notes

- **Graph→Model differentiability**: The current `graph_to_resnet` modifies weights in place (`module.weight.data += delta`). Writing through `.data` bypasses autograd, so no gradient from the accuracy loss reaches the metanetwork along this path. For fully differentiable meta-training, the GNN outputs should instead enter the forward pass as ordinary (non-leaf) tensors, for example by calling the network functionally with the transformed weights. Wrapping the outputs in new `nn.Parameter` objects would not help, because a `Parameter` is always a detached leaf tensor. The current implementation demonstrates the architecture only.
- **For proper pruning**: Use `torch_pruning` (DepGraph) for structural pruning with dependency groups. The inference script includes a simple magnitude-based channel pruner as a placeholder.
- **Full paper**: Appendix B.1 contains the complete GNN architecture equations, and Appendix D contains per-task hyperparameters.
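
The differentiability fix can be sketched with `torch.func.functional_call` (PyTorch 2.x): the metanetwork's output enters the forward pass as an ordinary tensor, so gradients flow back to it while the data model stays frozen. Here `delta` is a hypothetical stand-in for a GNN decoder output and a single `nn.Linear` stands in for the data model; this is not how `graph_to_resnet` is currently written.

```python
import torch
import torch.nn as nn
from torch.func import functional_call

torch.manual_seed(0)

# Stand-in data model, frozen as in meta-training.
model = nn.Linear(4, 2)
for p in model.parameters():
    p.requires_grad_(False)

# Stand-in for a metanetwork output (a trainable tensor here).
delta = nn.Parameter(torch.zeros(2, 4))
beta = 0.01

# Non-differentiable: `model.weight.data += beta * delta` would never send a gradient to `delta`.
# Differentiable: build new weights as ordinary tensors and run the model functionally.
new_params = {"weight": model.weight + beta * delta, "bias": model.bias}
x = torch.randn(8, 4)
out = functional_call(model, new_params, (x,))
loss = out.pow(2).mean()
loss.backward()
print(delta.grad.shape)  # torch.Size([2, 4])
```

The original module's parameters are never mutated, so the same frozen data model can be reused across meta-training steps.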