phanerozoic committed on
Commit b7adf0a · verified · 1 Parent(s): dffdd0f

Add per-parameter pruning rows + joint pruning result. R1 remains leading Pareto point.

Files changed (1)
  1. README.md +19 -3
README.md CHANGED
@@ -170,8 +170,14 @@ Cofiber Threshold variants trained on full COCO 2017 train (117,266 images). Fro
  | box32 pruned R2 | 768→32→4 | 91,640 | ~62,000 | **5.9** | 20.4 | **1.5** |
  | box32 pruned R3 | 768→32→4 | 91,640 | ~47,000 | 5.1 | 17.1 | 1.4 |
  | **dim20** | **768→20→80 cls, 20→16→4 reg** | **22,076** | **22,076** | **3.9** | **14.8** | 0.9 |
- | **dim20 pruned R1** | **768→20→80 cls (project 25.7% sparse)** | **22,076** | **18,121** | **3.9** | **14.6** | 0.8 |
- | **dim20 pruned R2** | **768→20→80 cls (project 26.6% sparse)** | **22,076** | **17,988** | **3.8** | **14.5** | 0.7 |
+ | **dim20 R1** (project 25.7% sparse) | 768→20→80 cls | 22,076 | 18,121 | **3.9** | 14.6 | 0.8 |
+ | **dim20 R2** (project 26.6% sparse) | 768→20→80 cls | 22,076 | 17,988 | 3.8 | 14.5 | 0.7 |
+ | dim20 cls_weight pruned (37%) | 596 of 1600 cls weights zeroed | 22,076 | 21,480 | 3.8 | 14.4 | 0.7 |
+ | dim20 reg_hidden pruned (17%) | 55 of 320 reg weights zeroed | 22,076 | 22,021 | 3.8 | 14.5 | 0.7 |
+ | dim20 reg_out pruned (12%) | 8 of 64 reg weights zeroed | 22,076 | 22,068 | 3.8 | 14.5 | 0.7 |
+ | dim20 ctr_weight pruned (90%) | 18 of 20 ctr weights zeroed | 22,076 | 22,058 | 3.7 | 14.2 | 0.7 |
+ | dim20 R1 + cls greedy | project 25.7% + cls 45% sparse | 22,076 | 17,406 | 3.5 | 13.4 | 0.6 |
+ | dim20 joint (from R1) | whole-head magnitude pruning | 22,076 | 17,129 | 3.6 | 13.7 | 0.6 |
  | **dim15** | **768→15→80 cls, 15→16→4 reg** | **17,751** | **17,751** | **3.0** | **11.5** | 0.7 |
  | **dim10** | **768→10→80 cls, 10→16→4 reg** | **13,426** | **13,426** | **1.5** | **5.6** | 0.4 |
  | **dim5** | **768→5→80 cls, 5→16→4 reg** | **9,101** | **9,101** | **0.3** | **1.3** | 0.1 |
@@ -180,7 +186,17 @@ Pruning improved mAP from 5.7 to 5.9 at R2 (~62K nonzero) by removing noisy prot

  The dim15, dim10, and dim5 variants push the bottleneck further with the same SVD-initialization recipe applied to the top 15, top 10, and top 5 directions of the pruned R2 prototype matrix. Dim15 (17,751 parameters, 67% SVD energy retention) reaches 3.0 mAP. Dim10 (13,426 parameters, 61% energy retention) reaches 1.5 mAP — the smallest 80-class COCO detector to clear the 1.0 mAP threshold. Dim5 (9,101 parameters, 53% energy retention) drops to 0.3 mAP. The mAP scaling across dim20 → dim15 → dim10 is roughly geometric (3.9 → 3.0 → 1.5), but reverses sharply between 10 and 5 dimensions where the curve falls off a cliff. Five directions sit below the intrinsic capacity needed for 80-class separation; the floor lies between 5 and 10 bottleneck dimensions, and finer probes at dim7/dim8 would localize the exact cliff.

- Dim20 was then itself put through magnitude pruning. The rigorous mAP-driven pruner bisected over the 15,360 weights of the projection layer using full pycocotools mAP as the retention metric and a corrective per-pass rollback. It found that 4,088 of those weights (26.6%) can be zeroed before mAP@[0.5:0.95] drops below the 95% retention floor. The Pareto front has two points: R1 zeros 3,955 weights (25.7%) for **3.9 mAP at 18,121 nonzero head parameters**, and R2 zeros 4,088 weights (26.6%) for **3.8 mAP at 17,988 nonzero head parameters**. R1 reaches 2.15 mAP per 10K parameters — the most parameter-efficient detector in the table, beating both unpruned dim20 (1.77) and NanoDet-m-0.5x (1.44).
+ Dim20 was then itself put through magnitude pruning. The mAP-driven pruner bisects over the magnitude-sorted weight list of a chosen parameter, uses full pycocotools mAP@[0.5:0.95] as the retention metric (1000 val images), and rolls back any pass that fails the 95% retention floor on full verification. It was run separately on each learned parameter of dim20, plus a joint-magnitude variant that ranks every weight in the head against every other.
+
+ The leading Pareto point is **R1 (project layer 25.7% sparse, 18,121 nonzero, 3.9 mAP)** — the same mAP as unpruned dim20 with 18% fewer effective parameters, and the highest mAP-per-10K-parameter ratio in the table at 2.15. R2 pushes to 26.6% project sparsity (17,988 nonzero) at a small mAP cost (3.8). Per-parameter slack measurements:
+
+ - `project.weight`: 26.6% prunable (the sparsity that produced R1/R2)
+ - `cls_weight`: 37% prunable in isolation, 3.8 mAP at 21,480 nonzero
+ - `reg_hidden.weight`: 17% prunable, 3.8 mAP at 22,021 nonzero
+ - `reg_out.weight`: 12% prunable, 3.8 mAP at 22,068 nonzero
+ - `ctr_weight`: **90% prunable** (only 2 of 20 centerness weights are load-bearing), 3.7 mAP at 22,058 nonzero
+
+ Greedy stacking of cls_weight pruning on top of R1 reaches 17,406 nonzero but drops to 3.5 mAP — evidence of interaction between the parameters: the cls_weight slack measured against unpruned dim20 partly comes from compensating for the surviving project subspace, so removing it after pruning project costs more mAP than the per-parameter measurement suggested. Joint magnitude pruning across all 22K head weights (starting from R1) finds 17,129 nonzero at 3.6 mAP, which is the smallest dim20 found but does not Pareto-dominate R1 — the bisection's 1000-image mAP proxy was systematically optimistic relative to the full 5000-image eval, so the 95% retention floor measured during pruning admitted more aggressive cuts than the full eval would have accepted. R1 remains the leading point of the dim20 pruning Pareto front.

  #### Training recipe
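The bisection pruner described in the added paragraphs can be sketched roughly as follows. This is a minimal sketch, not the repo's implementation: `evaluate_map` is a hypothetical stand-in for the pycocotools evaluation, the toy metric in the usage example replaces real mAP, and the sketch assumes the metric degrades monotonically as more weights are zeroed (which is what makes bisection over the magnitude-sorted list valid).

```python
import numpy as np

def prune_by_bisection(weights, evaluate_map, retention=0.95):
    """Zero the smallest-magnitude weights of one parameter tensor,
    bisecting on the largest count of zeroed weights that still keeps
    the evaluation metric at >= `retention` of the unpruned baseline."""
    flat = weights.flatten()
    order = np.argsort(np.abs(flat))          # smallest magnitudes first
    baseline = evaluate_map(flat.reshape(weights.shape))
    floor = retention * baseline              # e.g. the 95% retention floor

    lo, hi = 0, flat.size                     # lo = largest known-safe count
    while lo < hi:
        mid = (lo + hi + 1) // 2              # candidate: zero `mid` weights
        trial = flat.copy()
        trial[order[:mid]] = 0.0
        if evaluate_map(trial.reshape(weights.shape)) >= floor:
            lo = mid                          # passes the floor: keep the cut
        else:
            hi = mid - 1                      # fails: roll back, search below
    pruned = flat.copy()
    pruned[order[:lo]] = 0.0
    return pruned.reshape(weights.shape), lo
```

With a toy L1-energy metric in place of mAP, `prune_by_bisection(np.array([0.1, 0.2, 0.3, 10.0]), lambda w: np.abs(w).sum())` zeros the two smallest weights before the 95% floor bites. The README's caveat about the 1000-image proxy maps directly onto `evaluate_map` here: a noisy or optimistic proxy metric shifts `lo` past where the full eval would stop.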
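The "SVD-initialization recipe applied to the top k directions of the pruned prototype matrix" can likewise be sketched. The factorization below (input projection from the right singular vectors, per-class weights from the scaled left singular vectors) and the function name `svd_init` are my assumptions about the recipe, not the repo's code; the energy-retention numbers quoted for dim15/dim10/dim5 correspond to the `energy` fraction computed here.

```python
import numpy as np

def svd_init(prototypes, k):
    """Initialize a rank-k bottleneck head from a class-prototype matrix
    (assumed shape (num_classes, feat_dim), e.g. (80, 768)).
    Returns (project, classify, energy):
      project  -- (k, feat_dim) input projection onto the top-k directions
      classify -- (num_classes, k) per-class weights in that subspace
      energy   -- fraction of squared singular-value energy retained."""
    U, S, Vt = np.linalg.svd(prototypes, full_matrices=False)
    energy = (S[:k] ** 2).sum() / (S ** 2).sum()
    project = Vt[:k]               # top-k right singular directions
    classify = U[:, :k] * S[:k]    # left singular vectors scaled by S
    return project, classify, energy
```

At full rank the factors reconstruct the prototype matrix exactly (`classify @ project == prototypes`, energy 1.0); truncating to k < rank gives the lossy initialization the README's dim15/dim10/dim5 variants then fine-tune from.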