Create full_with_bank_1m_samples_output.txt
training_metrics/full_with_bank_1m_samples_output.txt
ADDED
@@ -0,0 +1,99 @@
+=================================================================
+CO-TRAIN: Student + Alignment Bank (unfrozen)
+=================================================================
+Device: cuda
+LR encoder: 0.0001 LR bank: 0.0005
+Bank weight: 0.2
+
+=================================================================
+PHASE 0: LOAD CACHED EMBEDDINGS
+=================================================================
+bert: torch.Size([500000, 768])
+modern: torch.Size([500000, 768])
+roberta: torch.Size([500000, 768])
+albert: torch.Size([500000, 768])
+distil: torch.Size([500000, 768])
+Captions: 500,000, using 500,000
+
+=================================================================
+PHASE 1: GPA ALIGNMENT
+=================================================================
+GPA iter 1: delta=1.99174462
+GPA iter 5: delta=0.00009400
+GPA iter 10: delta=0.00001988
+GPA iter 15: delta=0.00000849
+cos(consensus, bert): 0.9880
+cos(consensus, modern): 0.9831
+cos(consensus, roberta): 0.9885
+cos(consensus, albert): 0.9864
+cos(consensus, distil): 0.9909
+Consensus CV: 0.2543
+
+=================================================================
+PHASE 2: LOAD MODEL (unfrozen)
+=================================================================
+Loading weights: 100%
+112/112 [00:00<00:00, 3856.51it/s, Materializing param=token_emb.weight]
+Encoder: 25,958,016 params
+Bank: 6,466,944 params (present)
+Total: 32,424,960 params (ALL unfrozen)
+
+=================================================================
+PHASE 3: TOKENIZE
+=================================================================
+Tokenizing 500,000 captions...
+Train: 495,000 Val: 5,000
+
+=================================================================
+PHASE 4: CO-TRAIN (encoder + bank)
+=================================================================
+Tensorboard: runs/cotrain_20260313_072033
+E 1/2: 100%|██████████| 3868/3868 [11:28<00:00, 5.62batch/s, bank=0.2956, cos=0.901, loss=0.0720]
+
+E 1: 688s step=3868
+Student: v_cos=0.8939±0.0407 v_acc=0.999 v_cv=0.2198 eff_dim=74.1
+Losses: nce=0.0086 mse=0.0003 bank=0.2956
+Bank: agr=0.000000 ortho=0.000002 entropy=2.8501 emb_cv=0.2118
+exp_cos=0.535±0.001 disagree=0.000000 spread=0.01467
+Context: geo_eff_dim=16.6 geo_cv=0.4483
+★ New best: v_cos=0.8939
+E 2/2: 100%|██████████| 3868/3868 [11:29<00:00, 5.61batch/s, bank=0.3114, cos=0.895, loss=0.0817]
+
+E 2: 689s step=7736
+Student: v_cos=0.8917±0.0400 v_acc=0.999 v_cv=0.2086 eff_dim=73.7
+Losses: nce=0.0118 mse=0.0003 bank=0.3114
+Bank: agr=0.000000 ortho=0.000002 entropy=2.7060 emb_cv=0.1957
+exp_cos=0.558±0.001 disagree=0.000000 spread=0.01583
+Context: geo_eff_dim=15.8 geo_cv=0.5315
+
+=================================================================
+PHASE 5: VERIFICATION
+=================================================================
+Enriched: torch.Size([10, 896])
+Geo: {'expert_cos_mean': 0.5349530577659607, 'expert_cos_std': 0.001167053822427988, 'cross_expert_cos': 0.045003507286310196, 'cross_expert_cos_std': 0.03178434446454048, 'anchor_max_cos': 0.7455679774284363, 'anchor_mean_cos': -0.04277874901890755, 'disagreement_ratio': 0.0006186707178130746, 'norm_ratio_spread': 0.4806589186191559}
+
+Pairwise cosines:
+[0]↔[1]: 0.788 (A cat sitting on a windowsill ↔ A dog playing in the park)
+[0]↔[2]: 0.622 (A cat sitting on a windowsill ↔ A still life painting with flo)
+[0]↔[3]: 0.741 (A cat sitting on a windowsill ↔ A child riding a bicycle)
+[1]↔[2]: 0.582 (A dog playing in the park ↔ A still life painting with flo)
+[1]↔[3]: 0.851 (A dog playing in the park ↔ A child riding a bicycle)
+[2]↔[3]: 0.639 (A still life painting with flo ↔ A child riding a bicycle)
+
+=================================================================
+SUMMARY
+=================================================================
+Best v_cos: 0.8939
+Final v_cv: 0.2029
+Consensus CV: 0.2543
+Val R@1: 0.999
+Encoder LR: 0.0001
+Bank LR: 0.0005
+Bank weight: 0.2
+
+Saved: cotrain_best.pt, cotrain_final.pt
+Tensorboard: runs/cotrain_20260313_072033
+
+=================================================================
+DONE
+=================================================================