Create geolip_loss_profiler_output.txt
geolip_loss_profiler_output.txt
ADDED
@@ -0,0 +1,86 @@
+Building profile model...
+Device: cuda
+Batch: 256, Dim: 256, Anchors: 256, Comp: 8×64
+Parameters: 4,334,244
+
+================================================================================
+SECTION 1: FORWARD PASS COMPONENTS
+================================================================================
+
+
+================================================================================
+SECTION 2: INDIVIDUAL LOSS TERMS (forward only)
+================================================================================
+
+
+================================================================================
+SECTION 3: CV LOSS - OLD SEQUENTIAL vs BATCHED
+================================================================================
+
+
+================================================================================
+SECTION 4: BACKWARD COSTS (forward + backward)
+================================================================================
+
+
+
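The log does not show how each entry was timed. The following is a minimal sketch of one plausible harness, assuming a few warm-up runs followed by averaged wall-clock samples. `time_ms` and its parameters are hypothetical names, not taken from the profiler. On a CUDA device the `sync` hook would need to be something like `torch.cuda.synchronize`, because kernel launches are asynchronous and the clock would otherwise stop before the work finishes.

```python
import time
from statistics import mean


def time_ms(fn, warmup=3, iters=10, sync=None):
    """Return the mean wall-clock time of fn() in milliseconds.

    sync is an optional callable run before each clock read. It is a
    placeholder for a device barrier (for example torch.cuda.synchronize);
    the default of None is only appropriate for CPU work.
    """
    for _ in range(warmup):          # warm-up runs are not measured
        fn()
    samples = []
    for _ in range(iters):
        if sync is not None:
            sync()                   # drain pending work before starting
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()                   # wait for fn's work before stopping
        samples.append((time.perf_counter() - start) * 1e3)
    return mean(samples)


# Toy CPU workload standing in for a loss term.
timings = {"toy workload": time_ms(lambda: sum(i * i for i in range(10_000)))}
print(f"{timings['toy workload']:.3f}ms")
```

Collecting results in a name-to-milliseconds dictionary keeps the measurement step separate from whatever report is printed afterwards.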
+================================================================================
+FULL TIMING REPORT (sorted by cost)
+================================================================================
+
+CV metric OLD n=200         117.339ms  17.3%
+CV OLD n=200                117.218ms  17.3%
+CV OLD n=128                 75.116ms  11.1%
+fwd+bwd NCE_pw               71.622ms  10.5%
+fwd+bwd NCE_emb              70.474ms  10.4%
+fwd+bwd CV old               48.412ms   7.1%
+CV OLD n=64                  36.944ms   5.4%
+fwd+bwd CE                   35.398ms   5.2%
+fwd+bwd Bridge               35.372ms   5.2%
+FULL forward (both views)    22.355ms   3.3%
+CV OLD n=32                  20.031ms   2.9%
+fwd+bwd CV batch             11.265ms   1.7%
+encoder(v1)                  10.891ms   1.6%
+patchwork(tri)                1.022ms   0.2%
+CV BATCH n=128                0.847ms   0.1%
+CV metric BATCH n=200         0.830ms   0.1%
+CV BATCH n=32                 0.818ms   0.1%
+CV BATCH n=200                0.814ms   0.1%
+CV BATCH n=64                 0.811ms   0.1%
+Spread (A×A + relu)           0.365ms   0.1%
+NCE_pw (norm + B×B + CE)      0.245ms   0.0%
+task_head(feat)               0.243ms   0.0%
+NCE_tri (norm + B×B + CE)     0.240ms   0.0%
+kNN (B×B + argmax)            0.170ms   0.0%
+Assign BCE                    0.139ms   0.0%
+NCE_assign (B×B + CE)         0.102ms   0.0%
+NCE_emb (B×B + CE)            0.099ms   0.0%
+Bridge (soft CE)              0.082ms   0.0%
+Attraction (max + mean)       0.068ms   0.0%
+bridge(pw)                    0.058ms   0.0%
+triangulation (emb@A.T)       0.040ms   0.0%
+soft_assign (softmax)         0.037ms   0.0%
+CE (cross_entropy)            0.033ms   0.0%
+────────────────────────────────────────────────────────────────────────────────
+SUM                         679.501ms
+
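A report of this shape (rows sorted by cost, each with its share of the total, then a SUM line) could be assembled roughly as follows. This is a sketch under assumptions: `format_report` is a hypothetical helper, and the sample uses only three rows copied from the report, so its percentages differ from the full run. Note that the percentages are shares of the sum over all measured entries, including every benchmark variant, so the SUM is not a per-step cost.

```python
def format_report(timings):
    """Format name -> milliseconds as lines sorted by descending cost."""
    total = sum(timings.values())
    ordered = sorted(timings.items(), key=lambda item: item[1], reverse=True)
    lines = []
    for name, ms in ordered:
        share = 100.0 * ms / total   # share of the summed measurements
        lines.append(f"{name:<28}{ms:>7.3f}ms  {share:>4.1f}%")
    lines.append(f"{'SUM':<28}{total:>7.3f}ms")
    return lines


# Three rows copied from the report above, for illustration only.
sample = {"fwd+bwd CE": 35.398, "encoder(v1)": 10.891, "CV OLD n=64": 36.944}
for line in format_report(sample):
    print(line)
```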
+================================================================================
+CV SPEEDUP SUMMARY
+================================================================================
+n= 128: 75.12ms → 0.85ms (88.6x speedup)
+n= 200: 117.34ms → 0.83ms (141.3x speedup)
+n= 32: 20.03ms → 0.82ms (24.5x speedup)
+n= 64: 36.94ms → 0.81ms (45.6x speedup)
+
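The speedup figures are the old time divided by the batched time. The sketch below recomputes them from the three-decimal values in the timing report and also divides the old time by n, which shows the old path costing a roughly constant amount per unit of n while the batched path stays near 0.8 ms throughout. `speedup_rows` is a hypothetical helper; because the inputs are already rounded, the last digit of a ratio can differ slightly from the log.

```python
# Timings in milliseconds, copied from the report. For n=200 these are the
# "CV metric" rows, which are the values the summary appears to quote.
OLD_MS = {32: 20.031, 64: 36.944, 128: 75.116, 200: 117.339}
BATCHED_MS = {32: 0.818, 64: 0.811, 128: 0.847, 200: 0.830}


def speedup_rows(old, batched):
    """Return (n, speedup, old_ms_per_unit_n) tuples in ascending n."""
    rows = []
    for n in sorted(old):
        rows.append((n, old[n] / batched[n], old[n] / n))
    return rows


for n, speedup, per_unit in speedup_rows(OLD_MS, BATCHED_MS):
    print(f"n={n:>4}: {speedup:6.1f}x speedup, old path {per_unit:.3f} ms per unit of n")
```

Sorting the keys numerically also lists the rows in ascending n, unlike the string ordering in the log.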
+================================================================================
+PER-STEP ESTIMATE
+================================================================================
+Forward (both views): 22.35ms
+fwd+bwd CE: 35.40ms
+fwd+bwd CV (old): 48.41ms
+fwd+bwd CV (batched): 11.26ms
+CV savings per step: 37.15ms (77%)
+
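The savings line follows directly from the two fwd+bwd CV entries: the absolute saving is their difference, and the percentage is that difference relative to the old cost. A short check, with `cv_savings` as a hypothetical helper name:

```python
def cv_savings(old_ms, batched_ms):
    """Return (milliseconds saved, percent saved relative to old_ms)."""
    saved = old_ms - batched_ms
    return saved, 100.0 * saved / old_ms


# Values from the per-step estimate above.
saved, pct = cv_savings(48.41, 11.26)
print(f"CV savings per step: {saved:.2f}ms ({pct:.0f}%)")
```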
+================================================================================
+PROFILING COMPLETE
+================================================================================