AbstractPhil committed 2c80736 (verified; 1 parent: 86f39e0)

Create full_with_bank_1m_samples_output.txt

training_metrics/full_with_bank_1m_samples_output.txt ADDED
@@ -0,0 +1,99 @@
+ =================================================================
+ CO-TRAIN: Student + Alignment Bank (unfrozen)
+ =================================================================
+ Device: cuda
+ LR encoder: 0.0001 LR bank: 0.0005
+ Bank weight: 0.2
+
+ =================================================================
+ PHASE 0: LOAD CACHED EMBEDDINGS
+ =================================================================
+ bert: torch.Size([500000, 768])
+ modern: torch.Size([500000, 768])
+ roberta: torch.Size([500000, 768])
+ albert: torch.Size([500000, 768])
+ distil: torch.Size([500000, 768])
+ Captions: 500,000, using 500,000
+
+ =================================================================
+ PHASE 1: GPA ALIGNMENT
+ =================================================================
+ GPA iter 1: delta=1.99174462
+ GPA iter 5: delta=0.00009400
+ GPA iter 10: delta=0.00001988
+ GPA iter 15: delta=0.00000849
+ cos(consensus, bert): 0.9880
+ cos(consensus, modern): 0.9831
+ cos(consensus, roberta): 0.9885
+ cos(consensus, albert): 0.9864
+ cos(consensus, distil): 0.9909
+ Consensus CV: 0.2543
+
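The log shows PHASE 1 running generalized Procrustes analysis (GPA) with a shrinking `delta`, then reporting the cosine between the consensus and each model's embeddings, but it does not show how these are computed. The following is a minimal NumPy sketch under several assumptions:

- Each model's embedding matrix is centred and scaled to unit norm, then rotated onto a running consensus with orthogonal Procrustes.
- `delta` is the Frobenius-norm change in the consensus between iterations.
- `cos(consensus, model)` is the mean row-wise cosine.

The function names `gpa`, `orthogonal_procrustes` and `mean_row_cosine` are hypothetical. Consensus CV is not reproduced, because its definition is not given in the log.

```python
import numpy as np


def orthogonal_procrustes(source, target):
    """Orthogonal R minimising ||source @ R - target||_F (SVD solution)."""
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


def gpa(embeddings, n_iter=15, tol=1e-10):
    """Hypothetical GPA: align (N, D) matrices to a shared consensus.

    Assumption: each matrix is centred and scaled to unit Frobenius norm,
    the first matrix seeds the consensus, and each iteration re-rotates
    every matrix onto the current consensus before re-averaging.
    """
    aligned = []
    for x in embeddings:
        x = x - x.mean(axis=0)
        aligned.append(x / np.linalg.norm(x))
    consensus = aligned[0].copy()
    deltas = []
    for _ in range(n_iter):
        aligned = [x @ orthogonal_procrustes(x, consensus) for x in aligned]
        new_consensus = np.mean(aligned, axis=0)
        # Assumed meaning of the logged "delta": how far the consensus moved.
        delta = float(np.linalg.norm(new_consensus - consensus))
        deltas.append(delta)
        consensus = new_consensus
        if delta < tol:
            break
    return consensus, aligned, deltas


def mean_row_cosine(a, b, eps=1e-12):
    """Mean cosine similarity between corresponding rows of a and b."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + eps
    return float(np.mean(num / den))


# Usage on synthetic data: three random rotations of one matrix plus noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 16))
models = []
for _ in range(3):
    q, _ = np.linalg.qr(rng.normal(size=(16, 16)))
    models.append(base @ q + 0.01 * rng.normal(size=base.shape))
consensus, aligned, deltas = gpa(models)
cosines = [mean_row_cosine(consensus, x) for x in aligned]
```

Procrustes only allows a rotation or reflection per model, so the consensus can absorb orientation differences but not changes in geometry. That is one plausible reading of why the logged cosines sit near, but below, 1.0.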
+ =================================================================
+ PHASE 2: LOAD MODEL (unfrozen)
+ =================================================================
+ Loading weights: 100%
+  112/112 [00:00<00:00, 3856.51it/s, Materializing param=token_emb.weight]
+ Encoder: 25,958,016 params
+ Bank: 6,466,944 params (present)
+ Total: 32,424,960 params (ALL unfrozen)
+
+ =================================================================
+ PHASE 3: TOKENIZE
+ =================================================================
+ Tokenizing 500,000 captions...
+ Train: 495,000 Val: 5,000
+
+ =================================================================
+ PHASE 4: CO-TRAIN (encoder + bank)
+ =================================================================
+ Tensorboard: runs/cotrain_20260313_072033
+ E 1/2: 100%|██████████| 3868/3868 [11:28<00:00, 5.62batch/s, bank=0.2956, cos=0.901, loss=0.0720]
+
+ E 1: 688s step=3868
+ Student: v_cos=0.8939±0.0407 v_acc=0.999 v_cv=0.2198 eff_dim=74.1
+ Losses: nce=0.0086 mse=0.0003 bank=0.2956
+ Bank: agr=0.000000 ortho=0.000002 entropy=2.8501 emb_cv=0.2118
+ exp_cos=0.535±0.001 disagree=0.000000 spread=0.01467
+ Context: geo_eff_dim=16.6 geo_cv=0.4483
+ ★ New best: v_cos=0.8939
+ E 2/2: 100%|██████████| 3868/3868 [11:29<00:00, 5.61batch/s, bank=0.3114, cos=0.895, loss=0.0817]
+
+ E 2: 689s step=7736
+ Student: v_cos=0.8917±0.0400 v_acc=0.999 v_cv=0.2086 eff_dim=73.7
+ Losses: nce=0.0118 mse=0.0003 bank=0.3114
+ Bank: agr=0.000000 ortho=0.000002 entropy=2.7060 emb_cv=0.1957
+ exp_cos=0.558±0.001 disagree=0.000000 spread=0.01583
+ Context: geo_eff_dim=15.8 geo_cv=0.5315
+
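Each epoch reports student validation metrics (`v_cos`, `v_acc`, `eff_dim`), but the log does not print their definitions. The sketch below rests on three assumptions:

- `v_cos` is the mean ± std of the row-wise cosine between student outputs and consensus targets.
- `v_acc`, which appears to match the summary's Val R@1 of 0.999, is the fraction of validation rows whose most similar target is their own.
- `eff_dim` is the participation ratio of the covariance spectrum, one common definition of effective dimensionality that the script may or may not use.

All function names are hypothetical. `v_cv` and the bank diagnostics are not reproduced.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def cosine_stats(student, target):
    """Mean and std of row-wise cosine (assumed meaning of v_cos=mean±std)."""
    cos = np.sum(l2_normalize(student) * l2_normalize(target), axis=1)
    return float(cos.mean()), float(cos.std())


def recall_at_1(student, target):
    """Fraction of rows whose nearest target (by cosine) is their own."""
    sim = l2_normalize(student) @ l2_normalize(target).T
    return float(np.mean(sim.argmax(axis=1) == np.arange(len(student))))


def participation_ratio(x):
    """Effective dimension: (sum of eigenvalues)^2 / sum of squared eigenvalues."""
    eig = np.linalg.eigvalsh(np.cov(x, rowvar=False))  # np.cov centres the data
    eig = np.clip(eig, 0.0, None)  # drop tiny negative round-off
    return float(eig.sum() ** 2 / np.sum(eig ** 2))


# Usage: a noisy copy of the targets should score high on both metrics.
rng = np.random.default_rng(0)
target = rng.normal(size=(500, 64))
student = target + 0.3 * rng.normal(size=target.shape)
mean_cos, std_cos = cosine_stats(student, target)
r1 = recall_at_1(student, target)
```

Under these assumptions, a high R@1 with a mean cosine near 0.89 is not contradictory. Retrieval only needs each student vector to be closer to its own target than to the other 4,999, not to match it exactly.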
+ =================================================================
+ PHASE 5: VERIFICATION
+ =================================================================
+ Enriched: torch.Size([10, 896])
+ Geo: {'expert_cos_mean': 0.5349530577659607, 'expert_cos_std': 0.001167053822427988, 'cross_expert_cos': 0.045003507286310196, 'cross_expert_cos_std': 0.03178434446454048, 'anchor_max_cos': 0.7455679774284363, 'anchor_mean_cos': -0.04277874901890755, 'disagreement_ratio': 0.0006186707178130746, 'norm_ratio_spread': 0.4806589186191559}
+
+ Pairwise cosines:
+ [0]↔[1]: 0.788 (A cat sitting on a windowsill ↔ A dog playing in the park)
+ [0]↔[2]: 0.622 (A cat sitting on a windowsill ↔ A still life painting with flo)
+ [0]↔[3]: 0.741 (A cat sitting on a windowsill ↔ A child riding a bicycle)
+ [1]↔[2]: 0.582 (A dog playing in the park ↔ A still life painting with flo)
+ [1]↔[3]: 0.851 (A dog playing in the park ↔ A child riding a bicycle)
+ [2]↔[3]: 0.639 (A still life painting with flo ↔ A child riding a bicycle)
+
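PHASE 5 prints a cosine similarity for every unordered pair among enriched rows [0] to [3]. The captions appear to be cut to 30 characters, an inference from "A still life painting with flo". The sketch below shows one way to produce such a table from an (N, D) embedding matrix. `pairwise_cosines`, `format_pairs` and the `width=30` default are hypothetical, and the script's actual printing code is not shown in the log.

```python
import itertools

import numpy as np


def pairwise_cosines(emb):
    """Cosine-similarity matrix for the rows of an (N, D) array."""
    unit = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    return unit @ unit.T


def format_pairs(emb, captions, width=30):
    """One line per unordered pair in the log's '[i]↔[j]: cos (a ↔ b)' layout.

    Only the first len(captions) rows of emb are compared; width=30 is an
    assumed truncation length inferred from the log.
    """
    sims = pairwise_cosines(emb)
    lines = []
    for i, j in itertools.combinations(range(len(captions)), 2):
        a, b = captions[i][:width], captions[j][:width]
        lines.append(f"[{i}]↔[{j}]: {sims[i, j]:.3f} ({a} ↔ {b})")
    return lines


# Usage with toy 2-D vectors standing in for the enriched embeddings.
emb = np.array([[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
lines = format_pairs(emb, ["first", "second", "third"])
for line in lines:
    print(line)
```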
+ =================================================================
+ SUMMARY
+ =================================================================
+ Best v_cos: 0.8939
+ Final v_cv: 0.2029
+ Consensus CV: 0.2543
+ Val R@1: 0.999
+ Encoder LR: 0.0001
+ Bank LR: 0.0005
+ Bank weight: 0.2
+
+ Saved: cotrain_best.pt, cotrain_final.pt
+ Tensorboard: runs/cotrain_20260313_072033
+
+ =================================================================
+ DONE
+ =================================================================