Commit cf9a64f (verified), committed by AbstractPhil
1 Parent(s): 63c4951

Create run_2_train_vit_with_soup_output.txt

  1. run_2_train_vit_with_soup_output.txt +194 -0
run_2_train_vit_with_soup_output.txt ADDED
@@ -0,0 +1,194 @@
+ =================================================================
+ GEOLIP VISION ENCODER — FROM SCRATCH
+ ViT: 6L/384d/6h, patch16
+ 196 patches + CLS → 128-d output
+ Device: cuda
+ =================================================================
+
+ Loading soup...
+ Soup: mAP=0.837 CV_target=0.2731
+ train: loaded cached targets (118,287)
+ val: loaded cached targets (5,000)
+ Caching train images (118,287)...
+ Resolving data files: 100%
+  39/39 [00:00<00:00, 5057.75it/s]
+ Downloading data: 100%
+  39/39 [04:55<00:00,  7.45s/files]
+ default/train/0002.parquet: 100%
+  509M/509M [00:09<00:00, 69.4MB/s]
+ default/train/0003.parquet: 100%
+  502M/502M [00:03<00:00, 298MB/s]
+ default/train/0004.parquet: 100%
+  507M/507M [00:10<00:00, 88.0MB/s]
+ default/train/0005.parquet: 100%
+  499M/499M [00:04<00:00, 95.4MB/s]
+ default/train/0006.parquet: 100%
+  510M/510M [00:09<00:00, 73.4MB/s]
+ default/train/0007.parquet: 100%
+  502M/502M [00:06<00:00, 47.9MB/s]
+ default/train/0008.parquet: 100%
+  514M/514M [00:09<00:00, 90.8MB/s]
+ default/train/0009.parquet: 100%
+  509M/509M [00:06<00:00, 111MB/s]
+ default/train/0010.parquet: 100%
+  509M/509M [00:07<00:00, 89.7MB/s]
+ default/train/0011.parquet: 100%
+  505M/505M [00:05<00:00, 70.6MB/s]
+ default/train/0012.parquet: 100%
+  507M/507M [00:06<00:00, 87.5MB/s]
+ default/train/0013.parquet: 100%
+  502M/502M [00:09<00:00, 59.5MB/s]
+ default/train/0014.parquet: 100%
+  504M/504M [00:09<00:00, 70.8MB/s]
+ default/train/0015.parquet: 100%
+  514M/514M [00:07<00:00, 122MB/s]
+ default/train/0016.parquet: 100%
+  507M/507M [00:07<00:00, 95.1MB/s]
+ default/train/0017.parquet: 100%
+  509M/509M [00:09<00:00, 89.6MB/s]
+ default/train/0018.parquet: 100%
+  504M/504M [00:06<00:00, 63.2MB/s]
+ default/train/0019.parquet: 100%
+  511M/511M [00:10<00:00, 83.7MB/s]
+ default/train/0020.parquet: 100%
+  510M/510M [00:10<00:00, 72.5MB/s]
+ default/train/0021.parquet: 100%
+  504M/504M [00:09<00:00, 77.3MB/s]
+ default/train/0022.parquet: 100%
+  507M/507M [00:10<00:00, 89.6MB/s]
+ default/train/0023.parquet: 100%
+  511M/511M [00:10<00:00, 65.3MB/s]
+ default/train/0024.parquet: 100%
+  505M/505M [00:09<00:00, 78.0MB/s]
+ default/train/0025.parquet: 100%
+  503M/503M [00:04<00:00, 196MB/s]
+ default/train/0026.parquet: 100%
+  508M/508M [00:05<00:00, 121MB/s]
+ default/train/0027.parquet: 100%
+  508M/508M [00:06<00:00, 93.1MB/s]
+ default/train/0028.parquet: 100%
+  507M/507M [00:05<00:00, 122MB/s]
+ default/train/0029.parquet: 100%
+  510M/510M [00:07<00:00, 75.8MB/s]
+ default/train/0030.parquet: 100%
+  505M/505M [00:08<00:00, 71.4MB/s]
+ default/train/0031.parquet: 100%
+  502M/502M [00:04<00:00, 168MB/s]
+ default/train/0032.parquet: 100%
+  502M/502M [00:02<00:00, 321MB/s]
+ default/train/0033.parquet: 100%
+  508M/508M [00:07<00:00, 86.3MB/s]
+ default/train/0034.parquet: 100%
+  504M/504M [00:07<00:00, 78.1MB/s]
+ default/train/0035.parquet: 100%
+  499M/499M [00:16<00:00, 101MB/s]
+ default/train/0036.parquet: 100%
+  507M/507M [00:10<00:00, 78.6MB/s]
+ default/train/0037.parquet: 100%
+  501M/501M [00:09<00:00, 106MB/s]
+ default/train/0038.parquet: 100%
+  79.2M/79.2M [00:01<00:00, 173MB/s]
+ default/val/0000.parquet: 100%
+  504M/504M [00:04<00:00, 128MB/s]
+ default/val/0001.parquet: 100%
+  311M/311M [00:03<00:00, 165MB/s]
+ Generating train split: 
+  118287/0 [01:49<00:00, 1378.35 examples/s]
+ Generating validation split: 
+  5000/0 [00:05<00:00, 617.41 examples/s]
+ Loading dataset shards: 100%
+  39/39 [00:05<00:00,  8.83it/s]
+ Caching train: 100%|██████████| 118287/118287 [13:03<00:00, 151.05it/s]
+ Cached 118287/118287 images
+ Saved: cached_train_images.pt (35611 MB)
+ Caching val images (5,000)...
+ Resolving data files: 100%
+  39/39 [00:00<00:00, 4857.40it/s]
+ Caching val: 100%|██████████| 5000/5000 [00:33<00:00, 148.88it/s]
+ Cached 5000/5000 images
+ Saved: cached_val_images.pt (1505 MB)
+
+ =================================================================
+ BUILD ENCODER
+ =================================================================
+ Architecture: 6L/384d/6h, patch16
+ Input: 224×224 → 196 patches
+ Output: 128-d (on hypersphere)
+ Parameters: 11,216,768
+
+ =================================================================
+ TRAINING
+ 20 epochs, lr=0.0003, batch=48
+ Losses: InfoNCE + MSE + CV + BCE + Procrustes alignment
+ CV target: 0.2731
+ Images: train=118,287 val=5,000 (cached as tensors)
+ =================================================================
+ E 1/20 train: 100%|██████████| 2465/2465 [02:44<00:00, 14.97batch/s, cos=0.258, loss=2.6911, nce_acc=0.339, ordered=1]
+ E1 train: 165s loss=2.6891 nce=2.2529 mse=0.0120 bce=0.1963 nce_acc=0.340
+ E1 val: mAP=0.151 F1=0.162 R@1=0.032 cos=0.325 cv=0.2663 anchors=95/256 seen=5000/5000 ★
+ E 2/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.32batch/s, cos=0.368, loss=1.7954, nce_acc=0.553, ordered=1]
+ E2 train: 161s loss=1.7948 nce=1.4297 mse=0.0099 bce=0.1473 nce_acc=0.553
+ E2 val: mAP=0.206 F1=0.197 R@1=0.062 cos=0.390 cv=0.2552 anchors=99/256 seen=5000/5000 ★
+ E 3/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.37batch/s, cos=0.416, loss=1.4860, nce_acc=0.641, ordered=1]
+ E3 train: 160s loss=1.4854 nce=1.1484 mse=0.0092 bce=0.1338 nce_acc=0.641
+ E3 val: mAP=0.246 F1=0.244 R@1=0.091 cos=0.427 cv=0.2234 anchors=98/256 seen=5000/5000 ★
+ E 4/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.40batch/s, cos=0.448, loss=1.2913, nce_acc=0.695, ordered=1]
+ E4 train: 160s loss=1.2910 nce=0.9727 mse=0.0087 bce=0.1265 nce_acc=0.695
+ E4 val: mAP=0.272 F1=0.266 R@1=0.113 cos=0.453 cv=0.2078 anchors=99/256 seen=5000/5000 ★
+ E 5/20 train: 100%|██████████| 2465/2465 [02:40<00:00, 15.40batch/s, cos=0.475, loss=1.1334, nce_acc=0.743, ordered=1]
+ E5 train: 160s loss=1.1331 nce=0.8303 mse=0.0083 bce=0.1205 nce_acc=0.743
+ E5 val: mAP=0.296 F1=0.292 R@1=0.139 cos=0.473 cv=0.2133 anchors=98/256 seen=5000/5000 ★
+ E 6/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.63batch/s, cos=0.499, loss=1.0005, nce_acc=0.784, ordered=1]
+ E6 train: 158s loss=1.0003 nce=0.7111 mse=0.0079 bce=0.1158 nce_acc=0.784
+ E6 val: mAP=0.317 F1=0.311 R@1=0.164 cos=0.495 cv=0.1835 anchors=98/256 seen=5000/5000 ★
+ E 7/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.60batch/s, cos=0.520, loss=0.8947, nce_acc=0.815, ordered=1]
+ E7 train: 158s loss=0.8943 nce=0.6172 mse=0.0075 bce=0.1115 nce_acc=0.815
+ E7 val: mAP=0.337 F1=0.335 R@1=0.190 cos=0.513 cv=0.1809 anchors=96/256 seen=5000/5000 ★
+ E 8/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.59batch/s, cos=0.539, loss=0.8030, nce_acc=0.842, ordered=1]
+ E8 train: 158s loss=0.8028 nce=0.5365 mse=0.0072 bce=0.1076 nce_acc=0.843
+ E8 val: mAP=0.344 F1=0.331 R@1=0.207 cos=0.523 cv=0.1779 anchors=95/256 seen=5000/5000 ★
+ E 9/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.58batch/s, cos=0.557, loss=0.7229, nce_acc=0.866, ordered=1]
+ E9 train: 158s loss=0.7228 nce=0.4665 mse=0.0070 bce=0.1041 nce_acc=0.866
+ E9 val: mAP=0.361 F1=0.349 R@1=0.218 cos=0.537 cv=0.1764 anchors=95/256 seen=5000/5000 ★
+ E10/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.51batch/s, cos=0.574, loss=0.6538, nce_acc=0.887, ordered=1]
+ E10 train: 159s loss=0.6538 nce=0.4070 mse=0.0067 bce=0.1009 nce_acc=0.887
+ E10 val: mAP=0.380 F1=0.361 R@1=0.254 cos=0.557 cv=0.1699 anchors=96/256 seen=5000/5000 ★
+ E11/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.54batch/s, cos=0.589, loss=0.5929, nce_acc=0.905, ordered=1]
+ E11 train: 159s loss=0.5928 nce=0.3545 mse=0.0065 bce=0.0978 nce_acc=0.905
+ E11 val: mAP=0.387 F1=0.377 R@1=0.265 cos=0.564 cv=0.1497 anchors=95/256 seen=5000/5000 ★
+ E12/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.55batch/s, cos=0.604, loss=0.5372, nce_acc=0.920, ordered=1]
+ E12 train: 158s loss=0.5372 nce=0.3073 mse=0.0062 bce=0.0948 nce_acc=0.920
+ E12 val: mAP=0.400 F1=0.382 R@1=0.276 cos=0.573 cv=0.1639 anchors=95/256 seen=5000/5000 ★
+ E13/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.60batch/s, cos=0.617, loss=0.4917, nce_acc=0.933, ordered=1]
+ E13 train: 158s loss=0.4917 nce=0.2693 mse=0.0060 bce=0.0920 nce_acc=0.933
+ E13 val: mAP=0.408 F1=0.392 R@1=0.291 cos=0.582 cv=0.1615 anchors=95/256 seen=5000/5000 ★
+ E14/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.61batch/s, cos=0.629, loss=0.4502, nce_acc=0.945, ordered=1]
+ E14 train: 158s loss=0.4501 nce=0.2347 mse=0.0058 bce=0.0895 nce_acc=0.945
+ E14 val: mAP=0.413 F1=0.403 R@1=0.304 cos=0.586 cv=0.1594 anchors=95/256 seen=5000/5000 ★
+ E15/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.63batch/s, cos=0.640, loss=0.4169, nce_acc=0.954, ordered=1]
+ E15 train: 158s loss=0.4168 nce=0.2075 mse=0.0057 bce=0.0873 nce_acc=0.954
+ E15 val: mAP=0.418 F1=0.403 R@1=0.307 cos=0.591 cv=0.1607 anchors=94/256 seen=5000/5000 ★
+ E16/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.62batch/s, cos=0.649, loss=0.3909, nce_acc=0.961, ordered=1]
+ E16 train: 158s loss=0.3908 nce=0.1866 mse=0.0055 bce=0.0854 nce_acc=0.961
+ E16 val: mAP=0.422 F1=0.411 R@1=0.321 cos=0.595 cv=0.1495 anchors=95/256 seen=5000/5000 ★
+ E17/20 train: 100%|██████████| 2465/2465 [02:37<00:00, 15.61batch/s, cos=0.656, loss=0.3717, nce_acc=0.966, ordered=1]
+ E17 train: 158s loss=0.3716 nce=0.1715 mse=0.0054 bce=0.0838 nce_acc=0.966
+ E17 val: mAP=0.426 F1=0.417 R@1=0.321 cos=0.597 cv=0.1420 anchors=94/256 seen=5000/5000 ★
+ E18/20 train: 100%|██████████| 2465/2465 [02:39<00:00, 15.43batch/s, cos=0.661, loss=0.3579, nce_acc=0.969, ordered=1]
+ E18 train: 160s loss=0.3579 nce=0.1607 mse=0.0053 bce=0.0826 nce_acc=0.969
+ E18 val: mAP=0.429 F1=0.416 R@1=0.325 cos=0.599 cv=0.1375 anchors=94/256 seen=5000/5000 ★
+ E19/20 train: 100%|██████████| 2465/2465 [02:38<00:00, 15.59batch/s, cos=0.664, loss=0.3494, nce_acc=0.971, ordered=1]
+ E19 train: 158s loss=0.3494 nce=0.1539 mse=0.0053 bce=0.0820 nce_acc=0.971
+ E19 val: mAP=0.429 F1=0.420 R@1=0.325 cos=0.600 cv=0.1426 anchors=94/256 seen=5000/5000 ★
+ E20/20 train: 100%|██████████| 2465/2465 [02:36<00:00, 15.77batch/s, cos=0.665, loss=0.3456, nce_acc=0.972, ordered=1]
+ E20 train: 156s loss=0.3455 nce=0.1510 mse=0.0052 bce=0.0816 nce_acc=0.972
+ E20 val: mAP=0.429 F1=0.418 R@1=0.323 cos=0.599 cv=0.1570 anchors=94/256 seen=5000/5000
+
+ Best mAP: 0.429
+ Encoder: 11,216,768 params (from scratch)
+ Checkpoints saved every epoch in checkpoints/
+ Tensorboard: runs/geolip_vit_encoder
+
+ =================================================================
+ DONE
+ =================================================================