PuxAI committed on
Commit 72ad412 · verified · 1 Parent(s): 9694891

Upload ablation summaries

mbert_rtx6000_metrics/docs/ablation_results.csv ADDED
@@ -0,0 +1,86 @@
+ source,task,model,strategy,seed,metric,score,eval_loss,train_loss,epoch,eval_samples,path
+ result_ablation_mbert,cola,mBERT,hf_sequence_classifier,44,eval_matthews_correlation,0.7800284888427224,0.2951465845108032,0.23531772815093427,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/hf_sequence_classifier/seed_44/all_results.json
+ result_ablation_mbert,cola,mBERT,hf_sequence_classifier,42,eval_matthews_correlation,0.7848625278330646,0.2610536515712738,0.27871723547994093,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/hf_sequence_classifier/seed_42/all_results.json
+ result_ablation_mbert,cola,mBERT,hf_sequence_classifier,43,eval_matthews_correlation,0.7734460694651355,0.31756097078323364,0.24823517306556914,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/hf_sequence_classifier/seed_43/all_results.json
+ result_ablation_mbert,cola,mBERT,cls,43,eval_matthews_correlation,0.7612536086874112,0.3098720908164978,0.247887350549467,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/cls/seed_43/all_results.json
+ result_ablation_mbert,cola,mBERT,cls,42,eval_matthews_correlation,0.7493368300485673,0.3190879821777344,0.23319570001499168,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/cls/seed_42/all_results.json
+ result_ablation_mbert,cola,mBERT,cls,44,eval_matthews_correlation,0.7800284946221248,0.32316330075263977,0.2231286038233581,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/cls/seed_44/all_results.json
+ result_ablation_mbert,cola,mBERT,mean,42,eval_matthews_correlation,0.7635290646756514,0.2967136800289154,0.2256952981487111,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/mean/seed_42/all_results.json
+ result_ablation_mbert,cola,mBERT,mean,43,eval_matthews_correlation,0.7664668619596934,0.33258184790611267,0.23003221512728786,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/mean/seed_43/all_results.json
+ result_ablation_mbert,cola,mBERT,mean,44,eval_matthews_correlation,0.7776844819969452,0.32358258962631226,0.21521307658438799,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/mean/seed_44/all_results.json
+ result_ablation_mbert,cola,mBERT,max,42,eval_matthews_correlation,0.7823569443931843,0.28716450929641724,0.227474433083774,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/max/seed_42/all_results.json
+ result_ablation_mbert,cola,mBERT,max,43,eval_matthews_correlation,0.7493680462422366,0.32399871945381165,0.23861054157633577,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/max/seed_43/all_results.json
+ result_ablation_mbert,cola,mBERT,max,44,eval_matthews_correlation,0.7731698517570625,0.32878631353378296,0.22196905217801172,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/max/seed_44/all_results.json
+ result_ablation_mbert,cola,mBERT,attention,42,eval_matthews_correlation,0.7706218489153343,0.30340734124183655,0.2244898515928613,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/attention/seed_42/all_results.json
+ result_ablation_mbert,cola,mBERT,attention,43,eval_matthews_correlation,0.7546498935382564,0.344061017036438,0.23256953179947268,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/attention/seed_43/all_results.json
+ result_ablation_mbert,cola,mBERT,attention,44,eval_matthews_correlation,0.777661541691299,0.3155398666858673,0.22533691994970736,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/attention/seed_44/all_results.json
+ result_ablation_mbert,cola,mBERT,mha_attention,42,eval_matthews_correlation,0.7589772289071167,0.30496320128440857,0.24128270149230957,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/mha_attention/seed_42/all_results.json
+ result_ablation_mbert,cola,mBERT,mha_attention,43,eval_matthews_correlation,0.768437400905162,0.30815455317497253,0.23670564372668276,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/mha_attention/seed_43/all_results.json
+ result_ablation_mbert,cola,mBERT,mha_attention,44,eval_matthews_correlation,0.7941238477427384,0.2813240885734558,0.22841538486090007,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/mha_attention/seed_44/all_results.json
+ result_ablation_mbert,cola,mBERT,multi_branch_average,42,eval_matthews_correlation,0.789889696708505,0.2748015820980072,0.22175285980466136,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/multi_branch_average/seed_42/all_results.json
+ result_ablation_mbert,cola,mBERT,multi_branch_average,43,eval_matthews_correlation,0.7445812636597251,0.3285018801689148,0.22625035480414024,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/multi_branch_average/seed_43/all_results.json
+ result_ablation_mbert,cola,mBERT,multi_branch_average,44,eval_matthews_correlation,0.7729583162617174,0.2956400215625763,0.22685242720378399,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/multi_branch_average/seed_44/all_results.json
+ result_ablation_mbert,cola,mBERT,gated_multi_branch,42,eval_matthews_correlation,0.7613241357624281,0.3221026062965393,0.2219125554104313,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/gated_multi_branch/seed_42/all_results.json
+ result_ablation_mbert,cola,mBERT,gated_multi_branch,43,eval_matthews_correlation,0.7660150387812079,0.3194178640842438,0.22317801773881113,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/gated_multi_branch/seed_43/all_results.json
+ result_ablation_mbert,cola,mBERT,gated_multi_branch,44,eval_matthews_correlation,0.780066626120282,0.30894502997398376,0.22649634483806247,3.0,1043,/workspace/result_ablation_mbert/cola/mBERT/gated_multi_branch/seed_44/all_results.json
+ result_ablation_mbert,mrpc,mBERT,hf_sequence_classifier,42,eval_combined_score,0.8382558330594807,0.37590235471725464,0.44336734757278906,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/hf_sequence_classifier/seed_42/all_results.json
+ result_ablation_mbert,mrpc,mBERT,hf_sequence_classifier,43,eval_combined_score,0.859977862112587,0.3466010093688965,0.41218561940378956,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/hf_sequence_classifier/seed_43/all_results.json
+ result_ablation_mbert,mrpc,mBERT,hf_sequence_classifier,44,eval_combined_score,0.8580065359477125,0.35651159286499023,0.4211170962362578,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/hf_sequence_classifier/seed_44/all_results.json
+ result_ablation_mbert,mrpc,mBERT,cls,42,eval_combined_score,0.8501131221719458,0.3741132616996765,0.3877318664030595,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/cls/seed_42/all_results.json
+ result_ablation_mbert,mrpc,mBERT,cls,43,eval_combined_score,0.8658320923306031,0.3552929759025574,0.38404927728496074,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/cls/seed_43/all_results.json
+ result_ablation_mbert,mrpc,mBERT,cls,44,eval_combined_score,0.8676559714795009,0.337443470954895,0.3921152353286743,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/cls/seed_44/all_results.json
+ result_ablation_mbert,mrpc,mBERT,mean,42,eval_combined_score,0.8617845786963434,0.3512035310268402,0.38461016886162036,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/mean/seed_42/all_results.json
+ result_ablation_mbert,mrpc,mBERT,mean,43,eval_combined_score,0.8723189636865576,0.35500794649124146,0.38332160159106893,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/mean/seed_43/all_results.json
+ result_ablation_mbert,mrpc,mBERT,mean,44,eval_combined_score,0.862203602371181,0.33879777789115906,0.3880394872648891,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/mean/seed_44/all_results.json
+ result_ablation_mbert,mrpc,mBERT,max,42,eval_combined_score,0.8570936087683713,0.3859576880931854,0.37803170020446114,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/max/seed_42/all_results.json
+ result_ablation_mbert,mrpc,mBERT,max,43,eval_combined_score,0.8717260626878489,0.3556552827358246,0.3737406565513446,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/max/seed_43/all_results.json
+ result_ablation_mbert,mrpc,mBERT,max,44,eval_combined_score,0.8811933375500738,0.3449813723564148,0.3843384521864193,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/max/seed_44/all_results.json
+ result_ablation_mbert,mrpc,mBERT,attention,42,eval_combined_score,0.8452440457399417,0.39012983441352844,0.38344859509241014,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/attention/seed_42/all_results.json
+ result_ablation_mbert,mrpc,mBERT,attention,43,eval_combined_score,0.8543956043956045,0.33410194516181946,0.39260981506083437,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/attention/seed_43/all_results.json
+ result_ablation_mbert,mrpc,mBERT,attention,44,eval_combined_score,0.8773201451399064,0.3298549950122833,0.373855785890059,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/attention/seed_44/all_results.json
+ result_ablation_mbert,mrpc,mBERT,mha_attention,42,eval_combined_score,0.8650315249774821,0.36945641040802,0.3834533773975455,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/mha_attention/seed_42/all_results.json
+ result_ablation_mbert,mrpc,mBERT,mha_attention,43,eval_combined_score,0.8674502647774438,0.3478466868400574,0.37287821604575944,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/mha_attention/seed_43/all_results.json
+ result_ablation_mbert,mrpc,mBERT,mha_attention,44,eval_combined_score,0.8767632952461559,0.34328851103782654,0.38678704608570447,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/mha_attention/seed_44/all_results.json
+ result_ablation_mbert,mrpc,mBERT,multi_branch_average,42,eval_combined_score,0.8714532109139952,0.3461981415748596,0.38643745581309,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/multi_branch_average/seed_42/all_results.json
+ result_ablation_mbert,mrpc,mBERT,multi_branch_average,43,eval_combined_score,0.8670419052576783,0.35154372453689575,0.4033502381601375,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/multi_branch_average/seed_43/all_results.json
+ result_ablation_mbert,mrpc,mBERT,multi_branch_average,44,eval_combined_score,0.878422920892495,0.33914050459861755,0.39075296472161364,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/multi_branch_average/seed_44/all_results.json
+ result_ablation_mbert,mrpc,mBERT,gated_multi_branch,42,eval_combined_score,0.859977862112587,0.3540915548801422,0.3987341035496105,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/gated_multi_branch/seed_42/all_results.json
+ result_ablation_mbert,mrpc,mBERT,gated_multi_branch,43,eval_combined_score,0.8565959952885749,0.3669602572917938,0.3874885788211575,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/gated_multi_branch/seed_43/all_results.json
+ result_ablation_mbert,mrpc,mBERT,gated_multi_branch,44,eval_combined_score,0.8787957074721782,0.34184572100639343,0.385329819344855,3.0,408,/workspace/result_ablation_mbert/mrpc/mBERT/gated_multi_branch/seed_44/all_results.json
+ result_ablation_mbert,sst2,mBERT,hf_sequence_classifier,42,eval_accuracy,0.8830275229357798,0.40195930004119873,0.2333266153747653,3.0,872,/workspace/result_ablation_mbert/sst2/mBERT/hf_sequence_classifier/seed_42/all_results.json
+ result_ablation_mbert,sst2,mBERT,hf_sequence_classifier,43,eval_accuracy,0.8864678899082569,0.4080387055873871,0.2339267143723179,3.0,872,/workspace/result_ablation_mbert/sst2/mBERT/hf_sequence_classifier/seed_43/all_results.json
+ result_ablation_mbert,sst2,mBERT,hf_sequence_classifier,44,eval_accuracy,0.8692660550458715,0.42636725306510925,0.23417754752909922,3.0,872,/workspace/result_ablation_mbert/sst2/mBERT/hf_sequence_classifier/seed_44/all_results.json
+ result_ablation_mbert,sst2,mBERT,cls,42,eval_accuracy,0.8761467889908257,0.4319080710411072,0.23663618221015786,3.0,872,/workspace/result_ablation_mbert/sst2/mBERT/cls/seed_42/all_results.json
+ result_ablation_mbert,sst2,mBERT,cls,43,eval_accuracy,0.8830275229357798,0.4112645387649536,0.23413483488593687,3.0,872,/workspace/result_ablation_mbert/sst2/mBERT/cls/seed_43/all_results.json
+ result_ablation_mbert,sst2,mBERT,cls,44,eval_accuracy,0.8727064220183486,0.42197221517562866,0.23428635476095158,3.0,872,/workspace/result_ablation_mbert/sst2/mBERT/cls/seed_44/all_results.json
+ result_ablation_mbert,sst2,mBERT,mean,42,eval_accuracy,0.8692660550458715,0.4306594431400299,0.23150781453618072,3.0,872,/workspace/result_ablation_mbert/sst2/mBERT/mean/seed_42/all_results.json
+ result_ablation_mbert,sst2,mBERT,mean,43,eval_accuracy,0.8772935779816514,0.4267817735671997,0.23140086427486634,3.0,872,/workspace/result_ablation_mbert/sst2/mBERT/mean/seed_43/all_results.json
+ result_ablation_mbert,sst2,mBERT,mean,44,eval_accuracy,0.8727064220183486,0.4250437319278717,0.23116512564059563,3.0,872,/workspace/result_ablation_mbert/sst2/mBERT/mean/seed_44/all_results.json
+ result_ablation_mbert,sst2,mBERT,max,42,eval_accuracy,0.8715596330275229,0.44055378437042236,0.23420973027307196,3.0,872,/workspace/result_ablation_mbert/sst2/mBERT/max/seed_42/all_results.json
+ result_ablation_mbert,sst2,mBERT,max,43,eval_accuracy,0.8784403669724771,0.45231905579566956,0.2331274027837987,3.0,872,/workspace/result_ablation_mbert/sst2/mBERT/max/seed_43/all_results.json
+ result_ablation_mbert,sst2,mBERT,max,44,eval_accuracy,0.8784403669724771,0.42756688594818115,0.23267899721096724,3.0,872,/workspace/result_ablation_mbert/sst2/mBERT/max/seed_44/all_results.json
+ result_ablation_mbert,sst2,mBERT,attention,42,eval_accuracy,0.8738532110091743,0.42198362946510315,0.2329871662718165,3.0,872,/workspace/result_ablation_mbert/sst2/mBERT/attention/seed_42/all_results.json
+ result_ablation_mbert,sst2,mBERT,attention,43,eval_accuracy,0.8876146788990825,0.37215080857276917,0.2262992448503362,3.0,872,/workspace/result_ablation_mbert/sst2/mBERT/attention/seed_43/all_results.json
+ result_ablation_mbert,sst2,mBERT,attention,44,eval_accuracy,0.8681192660550459,0.4196762442588806,0.22888659196927902,3.0,872,/workspace/result_ablation_mbert/sst2/mBERT/attention/seed_44/all_results.json
+ result_ablation_mbert,sst2,mBERT,mha_attention,42,eval_accuracy,0.8727064220183486,0.4217231869697571,0.23271927806386922,3.0,872,/workspace/result_ablation_mbert/sst2/mBERT/mha_attention/seed_42/all_results.json
+ result_ablation_mbert,sst2,mBERT,mha_attention,43,eval_accuracy,0.8818807339449541,0.4068981409072876,0.23306594122508992,3.0,872,/workspace/result_ablation_mbert/sst2/mBERT/mha_attention/seed_43/all_results.json
+ result_ablation_mbert,sst2,mBERT,mha_attention,44,eval_accuracy,0.8853211009174312,0.4112522304058075,0.2333670342296843,3.0,872,/workspace/result_ablation_mbert/sst2/mBERT/mha_attention/seed_44/all_results.json
+ result_ablation_mbert,sst2,mBERT,multi_branch_average,42,eval_accuracy,0.8853211009174312,0.4263913333415985,0.2324217217996476,3.0,872,/workspace/result_ablation_mbert/sst2/mBERT/multi_branch_average/seed_42/all_results.json
+ result_ablation_mbert,sst2,mBERT,multi_branch_average,43,eval_accuracy,0.8784403669724771,0.41880160570144653,0.23121720937355744,3.0,872,/workspace/result_ablation_mbert/sst2/mBERT/multi_branch_average/seed_43/all_results.json
+ result_ablation_mbert,sst2,mBERT,gated_multi_branch,43,eval_accuracy,0.8784403669724771,0.43847355246543884,0.23293366598041423,3.0,872,/workspace/result_ablation_mbert/sst2/mBERT/gated_multi_branch/seed_43/all_results.json
+ result_ablation_mbert,sst2,mBERT,gated_multi_branch,44,eval_accuracy,0.8876146788990825,0.41072866320610046,0.2325162212280353,3.0,872,/workspace/result_ablation_mbert/sst2/mBERT/gated_multi_branch/seed_44/all_results.json
+ result_ablation_mbert,vsfc,mBERT,hf_sequence_classifier,42,eval_accuracy,0.9393556538218573,0.22358696162700653,0.20489713470639875,3.0,1583,/workspace/result_ablation_mbert/vsfc/mBERT/hf_sequence_classifier/seed_42/all_results.json
+ result_ablation_mbert,vsfc,mBERT,hf_sequence_classifier,43,eval_accuracy,0.9374605180037903,0.22951717674732208,0.21280724117779168,3.0,1583,/workspace/result_ablation_mbert/vsfc/mBERT/hf_sequence_classifier/seed_43/all_results.json
+ result_ablation_mbert,vsfc,mBERT,hf_sequence_classifier,44,eval_accuracy,0.932406822488945,0.23924019932746887,0.21428075062846205,3.0,1583,/workspace/result_ablation_mbert/vsfc/mBERT/hf_sequence_classifier/seed_44/all_results.json
+ result_ablation_mbert,vsfc,mBERT,cls,42,eval_accuracy,0.9412507896399241,0.21872149407863617,0.2038555185166363,3.0,1583,/workspace/result_ablation_mbert/vsfc/mBERT/cls/seed_42/all_results.json
+ result_ablation_mbert,vsfc,mBERT,cls,43,eval_accuracy,0.9336702463676564,0.24253493547439575,0.2038297117837137,3.0,1583,/workspace/result_ablation_mbert/vsfc/mBERT/cls/seed_43/all_results.json
+ result_ablation_mbert,vsfc,mBERT,cls,44,eval_accuracy,0.9368288060644346,0.22460007667541504,0.20489413286020233,3.0,1583,/workspace/result_ablation_mbert/vsfc/mBERT/cls/seed_44/all_results.json
+ result_ablation_mbert,vsfc,mBERT,mean,44,eval_accuracy,0.936197094125079,0.22665680944919586,0.20191558268612234,3.0,1583,/workspace/result_ablation_mbert/vsfc/mBERT/mean/seed_44/all_results.json
+ result_ablation_mbert,vsfc,mBERT,max,43,eval_accuracy,0.9317751105495894,0.23970364034175873,0.20938005839764157,3.0,1583,/workspace/result_ablation_mbert/vsfc/mBERT/max/seed_43/all_results.json
+ result_ablation_mbert,vsfc,mBERT,max,44,eval_accuracy,0.9330385344283006,0.2280229926109314,0.20747514532533484,3.0,1583,/workspace/result_ablation_mbert/vsfc/mBERT/max/seed_44/all_results.json
+ result_ablation_mbert,vsfc,mBERT,attention,44,eval_accuracy,0.9330385344283006,0.2232467234134674,0.2006188094034022,3.0,1583,/workspace/result_ablation_mbert/vsfc/mBERT/attention/seed_44/all_results.json
+ result_ablation_mbert,vsfc,mBERT,mha_attention,43,eval_accuracy,0.9387239418825016,0.2242736667394638,0.20716787132757977,3.0,1583,/workspace/result_ablation_mbert/vsfc/mBERT/mha_attention/seed_43/all_results.json
+ result_ablation_mbert,vsfc,mBERT,multi_branch_average,42,eval_accuracy,0.9412507896399241,0.21253333985805511,0.20452363604115975,3.0,1583,/workspace/result_ablation_mbert/vsfc/mBERT/multi_branch_average/seed_42/all_results.json
+ result_ablation_mbert,vsfc,mBERT,multi_branch_average,43,eval_accuracy,0.9330385344283006,0.22735266387462616,0.20476110576752007,3.0,1583,/workspace/result_ablation_mbert/vsfc/mBERT/multi_branch_average/seed_43/all_results.json
+ result_ablation_mbert,vsfc,mBERT,gated_multi_branch,42,eval_accuracy,0.9380922299431459,0.2133793830871582,0.2020011867605659,3.0,1583,/workspace/result_ablation_mbert/vsfc/mBERT/gated_multi_branch/seed_42/all_results.json
+ result_ablation_mbert,vsfc,mBERT,gated_multi_branch,44,eval_accuracy,0.9387239418825016,0.22616708278656006,0.20627780971500498,3.0,1583,/workspace/result_ablation_mbert/vsfc/mBERT/gated_multi_branch/seed_44/all_results.json
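Several task/strategy groups in the rows above have fewer than three seeds; for example, sst2 with gated_multi_branch lists only seeds 43 and 44, and vsfc with attention lists only seed 44. A minimal Python sketch for listing such gaps follows. It assumes the intended seed set is 42, 43, 44 (inferred from the complete groups, not stated in the commit), and `missing_seeds` is a hypothetical helper name.

```python
from collections import defaultdict

# Assumption: every (task, strategy) pair was meant to run with these seeds.
EXPECTED_SEEDS = frozenset({42, 43, 44})


def missing_seeds(rows, expected=EXPECTED_SEEDS):
    """Map each (task, strategy) pair to the sorted list of seeds it lacks."""
    seen = defaultdict(set)
    for row in rows:
        seen[(row["task"], row["strategy"])].add(int(row["seed"]))
    return {
        key: sorted(expected - seeds)
        for key, seeds in seen.items()
        if expected - seeds
    }


# Seeds copied from the CSV rows above; other columns are omitted for brevity.
rows = [
    {"task": "sst2", "strategy": "gated_multi_branch", "seed": "43"},
    {"task": "sst2", "strategy": "gated_multi_branch", "seed": "44"},
    {"task": "vsfc", "strategy": "attention", "seed": "44"},
    {"task": "cola", "strategy": "cls", "seed": "42"},
    {"task": "cola", "strategy": "cls", "seed": "43"},
    {"task": "cola", "strategy": "cls", "seed": "44"},
]
gaps = missing_seeds(rows)
```

A pair with no rows at all would not show up in this report, since the sketch only inspects pairs that occur at least once.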
mbert_rtx6000_metrics/docs/ablation_results_aggregate.csv ADDED
@@ -0,0 +1,33 @@
+ source,task,model,strategy,metric,n,mean,std,min,max
+ result_ablation_mbert,cola,mBERT,attention,eval_matthews_correlation,3,0.7676444280482966,0.011791215541430011,0.7546498935382564,0.777661541691299
+ result_ablation_mbert,cola,mBERT,cls,eval_matthews_correlation,3,0.7635396444527011,0.015473009992049978,0.7493368300485673,0.7800284946221248
+ result_ablation_mbert,cola,mBERT,gated_multi_branch,eval_matthews_correlation,3,0.7691352668879726,0.009753056125248396,0.7613241357624281,0.780066626120282
+ result_ablation_mbert,cola,mBERT,hf_sequence_classifier,eval_matthews_correlation,3,0.7794456953803075,0.0057304988073936455,0.7734460694651355,0.7848625278330646
+ result_ablation_mbert,cola,mBERT,max,eval_matthews_correlation,3,0.7682982807974945,0.017025451624376346,0.7493680462422366,0.7823569443931843
+ result_ablation_mbert,cola,mBERT,mean,eval_matthews_correlation,3,0.76922680287743,0.00747040261880466,0.7635290646756514,0.7776844819969452
+ result_ablation_mbert,cola,mBERT,mha_attention,eval_matthews_correlation,3,0.7738461591850057,0.01818686898538206,0.7589772289071167,0.7941238477427384
+ result_ablation_mbert,cola,mBERT,multi_branch_average,eval_matthews_correlation,3,0.7691430922099824,0.02289389606986203,0.7445812636597251,0.789889696708505
+ result_ablation_mbert,mrpc,mBERT,attention,eval_combined_score,3,0.8589865984251509,0.01652352740227104,0.8452440457399417,0.8773201451399064
+ result_ablation_mbert,mrpc,mBERT,cls,eval_combined_score,3,0.8612003953273499,0.00964506885949266,0.8501131221719458,0.8676559714795009
+ result_ablation_mbert,mrpc,mBERT,gated_multi_branch,eval_combined_score,3,0.8651231882911133,0.011960877533495506,0.8565959952885749,0.8787957074721782
+ result_ablation_mbert,mrpc,mBERT,hf_sequence_classifier,eval_combined_score,3,0.8520800770399267,0.012012652618603983,0.8382558330594807,0.859977862112587
+ result_ablation_mbert,mrpc,mBERT,max,eval_combined_score,3,0.8700043363354313,0.012141766266172404,0.8570936087683713,0.8811933375500738
+ result_ablation_mbert,mrpc,mBERT,mean,eval_combined_score,3,0.8654357149180273,0.005964748981903905,0.8617845786963434,0.8723189636865576
+ result_ablation_mbert,mrpc,mBERT,mha_attention,eval_combined_score,3,0.8697483616670273,0.00619431557112402,0.8650315249774821,0.8767632952461559
+ result_ablation_mbert,mrpc,mBERT,multi_branch_average,eval_combined_score,3,0.8723060123547228,0.005738234218203849,0.8670419052576783,0.878422920892495
+ result_ablation_mbert,sst2,mBERT,attention,eval_accuracy,3,0.8765290519877675,0.010019374940428994,0.8681192660550459,0.8876146788990825
+ result_ablation_mbert,sst2,mBERT,cls,eval_accuracy,3,0.8772935779816513,0.005255247356600745,0.8727064220183486,0.8830275229357798
+ result_ablation_mbert,sst2,mBERT,gated_multi_branch,eval_accuracy,2,0.8830275229357798,0.0064872181760233325,0.8784403669724771,0.8876146788990825
+ result_ablation_mbert,sst2,mBERT,hf_sequence_classifier,eval_accuracy,3,0.8795871559633027,0.009102355427974529,0.8692660550458715,0.8864678899082569
+ result_ablation_mbert,sst2,mBERT,max,eval_accuracy,3,0.8761467889908257,0.00397259359534147,0.8715596330275229,0.8784403669724771
+ result_ablation_mbert,sst2,mBERT,mean,eval_accuracy,3,0.8730886850152905,0.004027390578307669,0.8692660550458715,0.8772935779816514
+ result_ablation_mbert,sst2,mBERT,mha_attention,eval_accuracy,3,0.8799694189602446,0.006520918237474033,0.8727064220183486,0.8853211009174312
+ result_ablation_mbert,sst2,mBERT,multi_branch_average,eval_accuracy,2,0.8818807339449541,0.004865413632017539,0.8784403669724771,0.8853211009174312
+ result_ablation_mbert,vsfc,mBERT,attention,eval_accuracy,1,0.9330385344283006,0.0,0.9330385344283006,0.9330385344283006
+ result_ablation_mbert,vsfc,mBERT,cls,eval_accuracy,3,0.9372499473573384,0.0038077787576384315,0.9336702463676564,0.9412507896399241
+ result_ablation_mbert,vsfc,mBERT,gated_multi_branch,eval_accuracy,2,0.9384080859128238,0.0004466877960749482,0.9380922299431459,0.9387239418825016
+ result_ablation_mbert,vsfc,mBERT,hf_sequence_classifier,eval_accuracy,3,0.9364076647715308,0.0035920661421840576,0.932406822488945,0.9393556538218573
+ result_ablation_mbert,vsfc,mBERT,max,eval_accuracy,2,0.932406822488945,0.0008933755921497394,0.9317751105495894,0.9330385344283006
+ result_ablation_mbert,vsfc,mBERT,mean,eval_accuracy,1,0.936197094125079,0.0,0.936197094125079,0.936197094125079
+ result_ablation_mbert,vsfc,mBERT,mha_attention,eval_accuracy,1,0.9387239418825016,0.0,0.9387239418825016,0.9387239418825016
+ result_ablation_mbert,vsfc,mBERT,multi_branch_average,eval_accuracy,2,0.9371446620341124,0.005806941348973541,0.9330385344283006,0.9412507896399241
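The aggregate rows above are consistent with a per-(task, strategy) summary of the `score` column that uses the sample standard deviation (n - 1 denominator) and reports 0.0 when only one run exists. The Python sketch below illustrates that reading; it is an assumption about how such a file could be produced, not the script used in this commit, and `aggregate_runs` is a hypothetical name.

```python
import statistics
from collections import defaultdict


def aggregate_runs(rows):
    """Summarize per-seed scores for each (task, strategy) pair."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["task"], row["strategy"])].append(float(row["score"]))

    summary = {}
    for key, scores in groups.items():
        # Sample standard deviation; a single run is reported as 0.0
        # (assumption matching the n=1 rows in the aggregate file).
        std = statistics.stdev(scores) if len(scores) > 1 else 0.0
        summary[key] = {
            "n": len(scores),
            "mean": statistics.fmean(scores),
            "std": std,
            "min": min(scores),
            "max": max(scores),
        }
    return summary


# Scores copied from ablation_results.csv; other columns omitted for brevity.
rows = [
    {"task": "sst2", "strategy": "gated_multi_branch", "score": "0.8784403669724771"},
    {"task": "sst2", "strategy": "gated_multi_branch", "score": "0.8876146788990825"},
    {"task": "vsfc", "strategy": "mean", "score": "0.936197094125079"},
]
summary = aggregate_runs(rows)
```

With n of 3 or fewer per group, the standard deviations are rough indicators at best, so small differences between strategies should be read with caution.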
mbert_rtx6000_metrics/docs/ablation_summary.md ADDED
@@ -0,0 +1,148 @@
+ # Ablation Result Summary
+
+ The main metric is selected per task: CoLA uses Matthews correlation; MRPC/QQP/STSB use the combined GLUE score when available; all other classification tasks use accuracy.
+
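The per-task metric rule stated above can be sketched in Python as follows. The metric keys (`eval_matthews_correlation`, `eval_combined_score`, `eval_accuracy`) are the ones that appear in the result tables; the function name `select_main_metric` and the fallback to accuracy when no combined score is present are assumptions made for illustration.

```python
def select_main_metric(task, results):
    """Pick the main metric name and value from one run's results dict."""
    task = task.lower()
    if task == "cola":
        preferred = ["eval_matthews_correlation"]
    elif task in {"mrpc", "qqp", "stsb"}:
        # Assumption: fall back to accuracy if the combined score is absent.
        preferred = ["eval_combined_score", "eval_accuracy"]
    else:
        preferred = ["eval_accuracy"]

    for name in preferred:
        if name in results:
            return name, results[name]
    raise KeyError(f"no main metric found for task {task!r}")


# Hypothetical results dicts shaped like an all_results.json payload.
cola_choice = select_main_metric(
    "cola", {"eval_matthews_correlation": 0.78, "eval_loss": 0.295}
)
mrpc_choice = select_main_metric(
    "mrpc", {"eval_accuracy": 0.84, "eval_combined_score": 0.86}
)
sst2_choice = select_main_metric("sst2", {"eval_accuracy": 0.88})
```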
+ ## Aggregated Results
+ | source | task | model | strategy | metric | n | mean | std | min | max |
+ | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+ | result_ablation_mbert | cola | mBERT | attention | eval_matthews_correlation | 3 | 0.7676 | 0.0118 | 0.7546 | 0.7777 |
+ | result_ablation_mbert | cola | mBERT | cls | eval_matthews_correlation | 3 | 0.7635 | 0.0155 | 0.7493 | 0.7800 |
+ | result_ablation_mbert | cola | mBERT | gated_multi_branch | eval_matthews_correlation | 3 | 0.7691 | 0.0098 | 0.7613 | 0.7801 |
+ | result_ablation_mbert | cola | mBERT | hf_sequence_classifier | eval_matthews_correlation | 3 | 0.7794 | 0.0057 | 0.7734 | 0.7849 |
+ | result_ablation_mbert | cola | mBERT | max | eval_matthews_correlation | 3 | 0.7683 | 0.0170 | 0.7494 | 0.7824 |
+ | result_ablation_mbert | cola | mBERT | mean | eval_matthews_correlation | 3 | 0.7692 | 0.0075 | 0.7635 | 0.7777 |
+ | result_ablation_mbert | cola | mBERT | mha_attention | eval_matthews_correlation | 3 | 0.7738 | 0.0182 | 0.7590 | 0.7941 |
+ | result_ablation_mbert | cola | mBERT | multi_branch_average | eval_matthews_correlation | 3 | 0.7691 | 0.0229 | 0.7446 | 0.7899 |
+ | result_ablation_mbert | mrpc | mBERT | attention | eval_combined_score | 3 | 0.8590 | 0.0165 | 0.8452 | 0.8773 |
+ | result_ablation_mbert | mrpc | mBERT | cls | eval_combined_score | 3 | 0.8612 | 0.0096 | 0.8501 | 0.8677 |
+ | result_ablation_mbert | mrpc | mBERT | gated_multi_branch | eval_combined_score | 3 | 0.8651 | 0.0120 | 0.8566 | 0.8788 |
+ | result_ablation_mbert | mrpc | mBERT | hf_sequence_classifier | eval_combined_score | 3 | 0.8521 | 0.0120 | 0.8383 | 0.8600 |
+ | result_ablation_mbert | mrpc | mBERT | max | eval_combined_score | 3 | 0.8700 | 0.0121 | 0.8571 | 0.8812 |
+ | result_ablation_mbert | mrpc | mBERT | mean | eval_combined_score | 3 | 0.8654 | 0.0060 | 0.8618 | 0.8723 |
+ | result_ablation_mbert | mrpc | mBERT | mha_attention | eval_combined_score | 3 | 0.8697 | 0.0062 | 0.8650 | 0.8768 |
+ | result_ablation_mbert | mrpc | mBERT | multi_branch_average | eval_combined_score | 3 | 0.8723 | 0.0057 | 0.8670 | 0.8784 |
+ | result_ablation_mbert | sst2 | mBERT | attention | eval_accuracy | 3 | 0.8765 | 0.0100 | 0.8681 | 0.8876 |
+ | result_ablation_mbert | sst2 | mBERT | cls | eval_accuracy | 3 | 0.8773 | 0.0053 | 0.8727 | 0.8830 |
+ | result_ablation_mbert | sst2 | mBERT | gated_multi_branch | eval_accuracy | 2 | 0.8830 | 0.0065 | 0.8784 | 0.8876 |
+ | result_ablation_mbert | sst2 | mBERT | hf_sequence_classifier | eval_accuracy | 3 | 0.8796 | 0.0091 | 0.8693 | 0.8865 |
+ | result_ablation_mbert | sst2 | mBERT | max | eval_accuracy | 3 | 0.8761 | 0.0040 | 0.8716 | 0.8784 |
+ | result_ablation_mbert | sst2 | mBERT | mean | eval_accuracy | 3 | 0.8731 | 0.0040 | 0.8693 | 0.8773 |
+ | result_ablation_mbert | sst2 | mBERT | mha_attention | eval_accuracy | 3 | 0.8800 | 0.0065 | 0.8727 | 0.8853 |
+ | result_ablation_mbert | sst2 | mBERT | multi_branch_average | eval_accuracy | 2 | 0.8819 | 0.0049 | 0.8784 | 0.8853 |
+ | result_ablation_mbert | vsfc | mBERT | attention | eval_accuracy | 1 | 0.9330 | 0.0000 | 0.9330 | 0.9330 |
+ | result_ablation_mbert | vsfc | mBERT | cls | eval_accuracy | 3 | 0.9372 | 0.0038 | 0.9337 | 0.9413 |
+ | result_ablation_mbert | vsfc | mBERT | gated_multi_branch | eval_accuracy | 2 | 0.9384 | 0.0004 | 0.9381 | 0.9387 |
+ | result_ablation_mbert | vsfc | mBERT | hf_sequence_classifier | eval_accuracy | 3 | 0.9364 | 0.0036 | 0.9324 | 0.9394 |
+ | result_ablation_mbert | vsfc | mBERT | max | eval_accuracy | 2 | 0.9324 | 0.0009 | 0.9318 | 0.9330 |
+ | result_ablation_mbert | vsfc | mBERT | mean | eval_accuracy | 1 | 0.9362 | 0.0000 | 0.9362 | 0.9362 |
+ | result_ablation_mbert | vsfc | mBERT | mha_attention | eval_accuracy | 1 | 0.9387 | 0.0000 | 0.9387 | 0.9387 |
+ | result_ablation_mbert | vsfc | mBERT | multi_branch_average | eval_accuracy | 2 | 0.9371 | 0.0058 | 0.9330 | 0.9413 |
+
+ ## Gated Multi-Branch Deltas
+ | source | task | model | baseline | gated_mean | baseline_mean | delta |
+ | --- | --- | --- | --- | --- | --- | --- |
+ | result_ablation_mbert | cola | mBERT | attention | 0.7691 | 0.7676 | 0.0015 |
+ | result_ablation_mbert | cola | mBERT | mha_attention | 0.7691 | 0.7738 | -0.0047 |
+ | result_ablation_mbert | cola | mBERT | multi_branch_average | 0.7691 | 0.7691 | -0.0000 |
+ | result_ablation_mbert | cola | mBERT | hf_sequence_classifier | 0.7691 | 0.7794 | -0.0103 |
+ | result_ablation_mbert | mrpc | mBERT | attention | 0.8651 | 0.8590 | 0.0061 |
+ | result_ablation_mbert | mrpc | mBERT | mha_attention | 0.8651 | 0.8697 | -0.0046 |
+ | result_ablation_mbert | mrpc | mBERT | multi_branch_average | 0.8651 | 0.8723 | -0.0072 |
+ | result_ablation_mbert | mrpc | mBERT | hf_sequence_classifier | 0.8651 | 0.8521 | 0.0130 |
+ | result_ablation_mbert | sst2 | mBERT | attention | 0.8830 | 0.8765 | 0.0065 |
+ | result_ablation_mbert | sst2 | mBERT | mha_attention | 0.8830 | 0.8800 | 0.0031 |
+ | result_ablation_mbert | sst2 | mBERT | multi_branch_average | 0.8830 | 0.8819 | 0.0011 |
+ | result_ablation_mbert | sst2 | mBERT | hf_sequence_classifier | 0.8830 | 0.8796 | 0.0034 |
+ | result_ablation_mbert | vsfc | mBERT | attention | 0.9384 | 0.9330 | 0.0054 |
+ | result_ablation_mbert | vsfc | mBERT | mha_attention | 0.9384 | 0.9387 | -0.0003 |
+ | result_ablation_mbert | vsfc | mBERT | multi_branch_average | 0.9384 | 0.9371 | 0.0013 |
+ | result_ablation_mbert | vsfc | mBERT | hf_sequence_classifier | 0.9384 | 0.9364 | 0.0020 |
+
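The delta column above is consistent with `gated_mean - baseline_mean` computed on the unrounded means from the aggregate CSV: for sst2 against mha_attention, the rounded means would give 0.0030, whereas the table lists 0.0031. A minimal Python sketch of that computation follows; `gated_deltas` and `BASELINES` are hypothetical names, and the four baselines are simply the ones the table reports.

```python
# Baselines listed in the delta table (cls, mean, and max are not compared there).
BASELINES = (
    "attention",
    "mha_attention",
    "multi_branch_average",
    "hf_sequence_classifier",
)


def gated_deltas(means, gated="gated_multi_branch", baselines=BASELINES):
    """Return {(task, baseline): gated_mean - baseline_mean}."""
    deltas = {}
    for (task, strategy), baseline_mean in means.items():
        if strategy not in baselines or (task, gated) not in means:
            continue
        deltas[(task, strategy)] = means[(task, gated)] - baseline_mean
    return deltas


# Unrounded means copied from ablation_results_aggregate.csv.
means = {
    ("sst2", "gated_multi_branch"): 0.8830275229357798,
    ("sst2", "mha_attention"): 0.8799694189602446,
    ("cola", "gated_multi_branch"): 0.7691352668879726,
    ("cola", "hf_sequence_classifier"): 0.7794456953803075,
}
deltas = gated_deltas(means)
```

A positive delta means the gated multi-branch head scored higher than the baseline on that task. Most deltas in the table are smaller than the per-strategy standard deviations, and some groups have only one or two seeds.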
+ ## Raw Runs
+ | source | task | model | strategy | seed | metric | score | eval_loss | train_loss | epoch | eval_samples | path |
+ | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
+ | result_ablation_mbert | cola | mBERT | hf_sequence_classifier | 44.0000 | eval_matthews_correlation | 0.7800 | 0.2951 | 0.2353 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/hf_sequence_classifier/seed_44/all_results.json |
+ | result_ablation_mbert | cola | mBERT | hf_sequence_classifier | 42.0000 | eval_matthews_correlation | 0.7849 | 0.2611 | 0.2787 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/hf_sequence_classifier/seed_42/all_results.json |
+ | result_ablation_mbert | cola | mBERT | hf_sequence_classifier | 43.0000 | eval_matthews_correlation | 0.7734 | 0.3176 | 0.2482 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/hf_sequence_classifier/seed_43/all_results.json |
+ | result_ablation_mbert | cola | mBERT | cls | 43.0000 | eval_matthews_correlation | 0.7613 | 0.3099 | 0.2479 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/cls/seed_43/all_results.json |
+ | result_ablation_mbert | cola | mBERT | cls | 42.0000 | eval_matthews_correlation | 0.7493 | 0.3191 | 0.2332 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/cls/seed_42/all_results.json |
+ | result_ablation_mbert | cola | mBERT | cls | 44.0000 | eval_matthews_correlation | 0.7800 | 0.3232 | 0.2231 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/cls/seed_44/all_results.json |
+ | result_ablation_mbert | cola | mBERT | mean | 42.0000 | eval_matthews_correlation | 0.7635 | 0.2967 | 0.2257 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/mean/seed_42/all_results.json |
71
+ | result_ablation_mbert | cola | mBERT | mean | 43.0000 | eval_matthews_correlation | 0.7665 | 0.3326 | 0.2300 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/mean/seed_43/all_results.json |
72
+ | result_ablation_mbert | cola | mBERT | mean | 44.0000 | eval_matthews_correlation | 0.7777 | 0.3236 | 0.2152 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/mean/seed_44/all_results.json |
73
+ | result_ablation_mbert | cola | mBERT | max | 42.0000 | eval_matthews_correlation | 0.7824 | 0.2872 | 0.2275 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/max/seed_42/all_results.json |
74
+ | result_ablation_mbert | cola | mBERT | max | 43.0000 | eval_matthews_correlation | 0.7494 | 0.3240 | 0.2386 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/max/seed_43/all_results.json |
75
+ | result_ablation_mbert | cola | mBERT | max | 44.0000 | eval_matthews_correlation | 0.7732 | 0.3288 | 0.2220 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/max/seed_44/all_results.json |
76
+ | result_ablation_mbert | cola | mBERT | attention | 42.0000 | eval_matthews_correlation | 0.7706 | 0.3034 | 0.2245 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/attention/seed_42/all_results.json |
77
+ | result_ablation_mbert | cola | mBERT | attention | 43.0000 | eval_matthews_correlation | 0.7546 | 0.3441 | 0.2326 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/attention/seed_43/all_results.json |
78
+ | result_ablation_mbert | cola | mBERT | attention | 44.0000 | eval_matthews_correlation | 0.7777 | 0.3155 | 0.2253 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/attention/seed_44/all_results.json |
79
+ | result_ablation_mbert | cola | mBERT | mha_attention | 42.0000 | eval_matthews_correlation | 0.7590 | 0.3050 | 0.2413 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/mha_attention/seed_42/all_results.json |
80
+ | result_ablation_mbert | cola | mBERT | mha_attention | 43.0000 | eval_matthews_correlation | 0.7684 | 0.3082 | 0.2367 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/mha_attention/seed_43/all_results.json |
81
+ | result_ablation_mbert | cola | mBERT | mha_attention | 44.0000 | eval_matthews_correlation | 0.7941 | 0.2813 | 0.2284 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/mha_attention/seed_44/all_results.json |
82
+ | result_ablation_mbert | cola | mBERT | multi_branch_average | 42.0000 | eval_matthews_correlation | 0.7899 | 0.2748 | 0.2218 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/multi_branch_average/seed_42/all_results.json |
83
+ | result_ablation_mbert | cola | mBERT | multi_branch_average | 43.0000 | eval_matthews_correlation | 0.7446 | 0.3285 | 0.2263 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/multi_branch_average/seed_43/all_results.json |
84
+ | result_ablation_mbert | cola | mBERT | multi_branch_average | 44.0000 | eval_matthews_correlation | 0.7730 | 0.2956 | 0.2269 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/multi_branch_average/seed_44/all_results.json |
85
+ | result_ablation_mbert | cola | mBERT | gated_multi_branch | 42.0000 | eval_matthews_correlation | 0.7613 | 0.3221 | 0.2219 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/gated_multi_branch/seed_42/all_results.json |
86
+ | result_ablation_mbert | cola | mBERT | gated_multi_branch | 43.0000 | eval_matthews_correlation | 0.7660 | 0.3194 | 0.2232 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/gated_multi_branch/seed_43/all_results.json |
87
+ | result_ablation_mbert | cola | mBERT | gated_multi_branch | 44.0000 | eval_matthews_correlation | 0.7801 | 0.3089 | 0.2265 | 3.0000 | 1043 | /workspace/result_ablation_mbert/cola/mBERT/gated_multi_branch/seed_44/all_results.json |
88
+ | result_ablation_mbert | mrpc | mBERT | hf_sequence_classifier | 42.0000 | eval_combined_score | 0.8383 | 0.3759 | 0.4434 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/hf_sequence_classifier/seed_42/all_results.json |
89
+ | result_ablation_mbert | mrpc | mBERT | hf_sequence_classifier | 43.0000 | eval_combined_score | 0.8600 | 0.3466 | 0.4122 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/hf_sequence_classifier/seed_43/all_results.json |
90
+ | result_ablation_mbert | mrpc | mBERT | hf_sequence_classifier | 44.0000 | eval_combined_score | 0.8580 | 0.3565 | 0.4211 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/hf_sequence_classifier/seed_44/all_results.json |
91
+ | result_ablation_mbert | mrpc | mBERT | cls | 42.0000 | eval_combined_score | 0.8501 | 0.3741 | 0.3877 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/cls/seed_42/all_results.json |
92
+ | result_ablation_mbert | mrpc | mBERT | cls | 43.0000 | eval_combined_score | 0.8658 | 0.3553 | 0.3840 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/cls/seed_43/all_results.json |
93
+ | result_ablation_mbert | mrpc | mBERT | cls | 44.0000 | eval_combined_score | 0.8677 | 0.3374 | 0.3921 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/cls/seed_44/all_results.json |
94
+ | result_ablation_mbert | mrpc | mBERT | mean | 42.0000 | eval_combined_score | 0.8618 | 0.3512 | 0.3846 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/mean/seed_42/all_results.json |
95
+ | result_ablation_mbert | mrpc | mBERT | mean | 43.0000 | eval_combined_score | 0.8723 | 0.3550 | 0.3833 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/mean/seed_43/all_results.json |
96
+ | result_ablation_mbert | mrpc | mBERT | mean | 44.0000 | eval_combined_score | 0.8622 | 0.3388 | 0.3880 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/mean/seed_44/all_results.json |
97
+ | result_ablation_mbert | mrpc | mBERT | max | 42.0000 | eval_combined_score | 0.8571 | 0.3860 | 0.3780 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/max/seed_42/all_results.json |
98
+ | result_ablation_mbert | mrpc | mBERT | max | 43.0000 | eval_combined_score | 0.8717 | 0.3557 | 0.3737 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/max/seed_43/all_results.json |
99
+ | result_ablation_mbert | mrpc | mBERT | max | 44.0000 | eval_combined_score | 0.8812 | 0.3450 | 0.3843 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/max/seed_44/all_results.json |
100
+ | result_ablation_mbert | mrpc | mBERT | attention | 42.0000 | eval_combined_score | 0.8452 | 0.3901 | 0.3834 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/attention/seed_42/all_results.json |
101
+ | result_ablation_mbert | mrpc | mBERT | attention | 43.0000 | eval_combined_score | 0.8544 | 0.3341 | 0.3926 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/attention/seed_43/all_results.json |
102
+ | result_ablation_mbert | mrpc | mBERT | attention | 44.0000 | eval_combined_score | 0.8773 | 0.3299 | 0.3739 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/attention/seed_44/all_results.json |
103
+ | result_ablation_mbert | mrpc | mBERT | mha_attention | 42.0000 | eval_combined_score | 0.8650 | 0.3695 | 0.3835 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/mha_attention/seed_42/all_results.json |
104
+ | result_ablation_mbert | mrpc | mBERT | mha_attention | 43.0000 | eval_combined_score | 0.8675 | 0.3478 | 0.3729 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/mha_attention/seed_43/all_results.json |
105
+ | result_ablation_mbert | mrpc | mBERT | mha_attention | 44.0000 | eval_combined_score | 0.8768 | 0.3433 | 0.3868 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/mha_attention/seed_44/all_results.json |
106
+ | result_ablation_mbert | mrpc | mBERT | multi_branch_average | 42.0000 | eval_combined_score | 0.8715 | 0.3462 | 0.3864 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/multi_branch_average/seed_42/all_results.json |
107
+ | result_ablation_mbert | mrpc | mBERT | multi_branch_average | 43.0000 | eval_combined_score | 0.8670 | 0.3515 | 0.4034 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/multi_branch_average/seed_43/all_results.json |
108
+ | result_ablation_mbert | mrpc | mBERT | multi_branch_average | 44.0000 | eval_combined_score | 0.8784 | 0.3391 | 0.3908 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/multi_branch_average/seed_44/all_results.json |
109
+ | result_ablation_mbert | mrpc | mBERT | gated_multi_branch | 42.0000 | eval_combined_score | 0.8600 | 0.3541 | 0.3987 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/gated_multi_branch/seed_42/all_results.json |
110
+ | result_ablation_mbert | mrpc | mBERT | gated_multi_branch | 43.0000 | eval_combined_score | 0.8566 | 0.3670 | 0.3875 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/gated_multi_branch/seed_43/all_results.json |
111
+ | result_ablation_mbert | mrpc | mBERT | gated_multi_branch | 44.0000 | eval_combined_score | 0.8788 | 0.3418 | 0.3853 | 3.0000 | 408 | /workspace/result_ablation_mbert/mrpc/mBERT/gated_multi_branch/seed_44/all_results.json |
112
+ | result_ablation_mbert | sst2 | mBERT | hf_sequence_classifier | 42.0000 | eval_accuracy | 0.8830 | 0.4020 | 0.2333 | 3.0000 | 872 | /workspace/result_ablation_mbert/sst2/mBERT/hf_sequence_classifier/seed_42/all_results.json |
113
+ | result_ablation_mbert | sst2 | mBERT | hf_sequence_classifier | 43.0000 | eval_accuracy | 0.8865 | 0.4080 | 0.2339 | 3.0000 | 872 | /workspace/result_ablation_mbert/sst2/mBERT/hf_sequence_classifier/seed_43/all_results.json |
114
+ | result_ablation_mbert | sst2 | mBERT | hf_sequence_classifier | 44.0000 | eval_accuracy | 0.8693 | 0.4264 | 0.2342 | 3.0000 | 872 | /workspace/result_ablation_mbert/sst2/mBERT/hf_sequence_classifier/seed_44/all_results.json |
115
+ | result_ablation_mbert | sst2 | mBERT | cls | 42.0000 | eval_accuracy | 0.8761 | 0.4319 | 0.2366 | 3.0000 | 872 | /workspace/result_ablation_mbert/sst2/mBERT/cls/seed_42/all_results.json |
116
+ | result_ablation_mbert | sst2 | mBERT | cls | 43.0000 | eval_accuracy | 0.8830 | 0.4113 | 0.2341 | 3.0000 | 872 | /workspace/result_ablation_mbert/sst2/mBERT/cls/seed_43/all_results.json |
117
+ | result_ablation_mbert | sst2 | mBERT | cls | 44.0000 | eval_accuracy | 0.8727 | 0.4220 | 0.2343 | 3.0000 | 872 | /workspace/result_ablation_mbert/sst2/mBERT/cls/seed_44/all_results.json |
118
+ | result_ablation_mbert | sst2 | mBERT | mean | 42.0000 | eval_accuracy | 0.8693 | 0.4307 | 0.2315 | 3.0000 | 872 | /workspace/result_ablation_mbert/sst2/mBERT/mean/seed_42/all_results.json |
119
+ | result_ablation_mbert | sst2 | mBERT | mean | 43.0000 | eval_accuracy | 0.8773 | 0.4268 | 0.2314 | 3.0000 | 872 | /workspace/result_ablation_mbert/sst2/mBERT/mean/seed_43/all_results.json |
120
+ | result_ablation_mbert | sst2 | mBERT | mean | 44.0000 | eval_accuracy | 0.8727 | 0.4250 | 0.2312 | 3.0000 | 872 | /workspace/result_ablation_mbert/sst2/mBERT/mean/seed_44/all_results.json |
121
+ | result_ablation_mbert | sst2 | mBERT | max | 42.0000 | eval_accuracy | 0.8716 | 0.4406 | 0.2342 | 3.0000 | 872 | /workspace/result_ablation_mbert/sst2/mBERT/max/seed_42/all_results.json |
122
+ | result_ablation_mbert | sst2 | mBERT | max | 43.0000 | eval_accuracy | 0.8784 | 0.4523 | 0.2331 | 3.0000 | 872 | /workspace/result_ablation_mbert/sst2/mBERT/max/seed_43/all_results.json |
123
+ | result_ablation_mbert | sst2 | mBERT | max | 44.0000 | eval_accuracy | 0.8784 | 0.4276 | 0.2327 | 3.0000 | 872 | /workspace/result_ablation_mbert/sst2/mBERT/max/seed_44/all_results.json |
124
+ | result_ablation_mbert | sst2 | mBERT | attention | 42.0000 | eval_accuracy | 0.8739 | 0.4220 | 0.2330 | 3.0000 | 872 | /workspace/result_ablation_mbert/sst2/mBERT/attention/seed_42/all_results.json |
125
+ | result_ablation_mbert | sst2 | mBERT | attention | 43.0000 | eval_accuracy | 0.8876 | 0.3722 | 0.2263 | 3.0000 | 872 | /workspace/result_ablation_mbert/sst2/mBERT/attention/seed_43/all_results.json |
126
+ | result_ablation_mbert | sst2 | mBERT | attention | 44.0000 | eval_accuracy | 0.8681 | 0.4197 | 0.2289 | 3.0000 | 872 | /workspace/result_ablation_mbert/sst2/mBERT/attention/seed_44/all_results.json |
127
+ | result_ablation_mbert | sst2 | mBERT | mha_attention | 42.0000 | eval_accuracy | 0.8727 | 0.4217 | 0.2327 | 3.0000 | 872 | /workspace/result_ablation_mbert/sst2/mBERT/mha_attention/seed_42/all_results.json |
128
+ | result_ablation_mbert | sst2 | mBERT | mha_attention | 43.0000 | eval_accuracy | 0.8819 | 0.4069 | 0.2331 | 3.0000 | 872 | /workspace/result_ablation_mbert/sst2/mBERT/mha_attention/seed_43/all_results.json |
129
+ | result_ablation_mbert | sst2 | mBERT | mha_attention | 44.0000 | eval_accuracy | 0.8853 | 0.4113 | 0.2334 | 3.0000 | 872 | /workspace/result_ablation_mbert/sst2/mBERT/mha_attention/seed_44/all_results.json |
130
+ | result_ablation_mbert | sst2 | mBERT | multi_branch_average | 42.0000 | eval_accuracy | 0.8853 | 0.4264 | 0.2324 | 3.0000 | 872 | /workspace/result_ablation_mbert/sst2/mBERT/multi_branch_average/seed_42/all_results.json |
131
+ | result_ablation_mbert | sst2 | mBERT | multi_branch_average | 43.0000 | eval_accuracy | 0.8784 | 0.4188 | 0.2312 | 3.0000 | 872 | /workspace/result_ablation_mbert/sst2/mBERT/multi_branch_average/seed_43/all_results.json |
132
+ | result_ablation_mbert | sst2 | mBERT | gated_multi_branch | 43.0000 | eval_accuracy | 0.8784 | 0.4385 | 0.2329 | 3.0000 | 872 | /workspace/result_ablation_mbert/sst2/mBERT/gated_multi_branch/seed_43/all_results.json |
133
+ | result_ablation_mbert | sst2 | mBERT | gated_multi_branch | 44.0000 | eval_accuracy | 0.8876 | 0.4107 | 0.2325 | 3.0000 | 872 | /workspace/result_ablation_mbert/sst2/mBERT/gated_multi_branch/seed_44/all_results.json |
134
+ | result_ablation_mbert | vsfc | mBERT | hf_sequence_classifier | 42.0000 | eval_accuracy | 0.9394 | 0.2236 | 0.2049 | 3.0000 | 1583 | /workspace/result_ablation_mbert/vsfc/mBERT/hf_sequence_classifier/seed_42/all_results.json |
135
+ | result_ablation_mbert | vsfc | mBERT | hf_sequence_classifier | 43.0000 | eval_accuracy | 0.9375 | 0.2295 | 0.2128 | 3.0000 | 1583 | /workspace/result_ablation_mbert/vsfc/mBERT/hf_sequence_classifier/seed_43/all_results.json |
136
+ | result_ablation_mbert | vsfc | mBERT | hf_sequence_classifier | 44.0000 | eval_accuracy | 0.9324 | 0.2392 | 0.2143 | 3.0000 | 1583 | /workspace/result_ablation_mbert/vsfc/mBERT/hf_sequence_classifier/seed_44/all_results.json |
137
+ | result_ablation_mbert | vsfc | mBERT | cls | 42.0000 | eval_accuracy | 0.9413 | 0.2187 | 0.2039 | 3.0000 | 1583 | /workspace/result_ablation_mbert/vsfc/mBERT/cls/seed_42/all_results.json |
138
+ | result_ablation_mbert | vsfc | mBERT | cls | 43.0000 | eval_accuracy | 0.9337 | 0.2425 | 0.2038 | 3.0000 | 1583 | /workspace/result_ablation_mbert/vsfc/mBERT/cls/seed_43/all_results.json |
139
+ | result_ablation_mbert | vsfc | mBERT | cls | 44.0000 | eval_accuracy | 0.9368 | 0.2246 | 0.2049 | 3.0000 | 1583 | /workspace/result_ablation_mbert/vsfc/mBERT/cls/seed_44/all_results.json |
140
+ | result_ablation_mbert | vsfc | mBERT | mean | 44.0000 | eval_accuracy | 0.9362 | 0.2267 | 0.2019 | 3.0000 | 1583 | /workspace/result_ablation_mbert/vsfc/mBERT/mean/seed_44/all_results.json |
141
+ | result_ablation_mbert | vsfc | mBERT | max | 43.0000 | eval_accuracy | 0.9318 | 0.2397 | 0.2094 | 3.0000 | 1583 | /workspace/result_ablation_mbert/vsfc/mBERT/max/seed_43/all_results.json |
142
+ | result_ablation_mbert | vsfc | mBERT | max | 44.0000 | eval_accuracy | 0.9330 | 0.2280 | 0.2075 | 3.0000 | 1583 | /workspace/result_ablation_mbert/vsfc/mBERT/max/seed_44/all_results.json |
143
+ | result_ablation_mbert | vsfc | mBERT | attention | 44.0000 | eval_accuracy | 0.9330 | 0.2232 | 0.2006 | 3.0000 | 1583 | /workspace/result_ablation_mbert/vsfc/mBERT/attention/seed_44/all_results.json |
144
+ | result_ablation_mbert | vsfc | mBERT | mha_attention | 43.0000 | eval_accuracy | 0.9387 | 0.2243 | 0.2072 | 3.0000 | 1583 | /workspace/result_ablation_mbert/vsfc/mBERT/mha_attention/seed_43/all_results.json |
145
+ | result_ablation_mbert | vsfc | mBERT | multi_branch_average | 42.0000 | eval_accuracy | 0.9413 | 0.2125 | 0.2045 | 3.0000 | 1583 | /workspace/result_ablation_mbert/vsfc/mBERT/multi_branch_average/seed_42/all_results.json |
146
+ | result_ablation_mbert | vsfc | mBERT | multi_branch_average | 43.0000 | eval_accuracy | 0.9330 | 0.2274 | 0.2048 | 3.0000 | 1583 | /workspace/result_ablation_mbert/vsfc/mBERT/multi_branch_average/seed_43/all_results.json |
147
+ | result_ablation_mbert | vsfc | mBERT | gated_multi_branch | 42.0000 | eval_accuracy | 0.9381 | 0.2134 | 0.2020 | 3.0000 | 1583 | /workspace/result_ablation_mbert/vsfc/mBERT/gated_multi_branch/seed_42/all_results.json |
148
+ | result_ablation_mbert | vsfc | mBERT | gated_multi_branch | 44.0000 | eval_accuracy | 0.9387 | 0.2262 | 0.2063 | 3.0000 | 1583 | /workspace/result_ablation_mbert/vsfc/mBERT/gated_multi_branch/seed_44/all_results.json |
mbert_rtx6000_metrics/docs/reviewer_experiment_plan.md ADDED
@@ -0,0 +1,110 @@
1
+ # Reviewer-Focused Additional Experiments
2
+
3
+ The goal of these additional experiments is to directly address three comments that recur across the reviews:
4
+
5
+ 1. Clarify that the proposed model is a pooling/classification head placed on top of the PLM, not a replacement for the MHA blocks inside the Transformer.
6
+ 2. Show that the benefit comes from the gate, not merely from adding attention pooling or a more complex head.
7
+ 3. Report stability using multiple seeds and the standard deviation across them.
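As a minimal sketch of the stability report in point 3, the snippet below computes the mean and sample standard deviation over seeds 42, 43 and 44. The scores are the rounded CoLA/mBERT Matthews correlations from the Raw Runs table; the `summarize` helper is illustrative and is not part of `scripts/summarize_results.py`.

```python
from statistics import mean, stdev

# Rounded CoLA / mBERT scores for seeds 42, 43, 44, copied from the Raw Runs table.
scores = {
    "attention": [0.7706, 0.7546, 0.7777],
    "multi_branch_average": [0.7899, 0.7446, 0.7730],
    "gated_multi_branch": [0.7613, 0.7660, 0.7801],
}


def summarize(values):
    """Return (mean, sample standard deviation) for one strategy."""
    return mean(values), stdev(values)


summary = {name: summarize(values) for name, values in scores.items()}

for name, (avg, std) in summary.items():
    print(f"{name}: {avg:.4f} +/- {std:.4f}")
```

`statistics.stdev` uses the sample (n - 1) estimator, which is the more conservative choice with only three seeds; `statistics.pstdev` would give the population value instead.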
8
+
9
+ ## Ablations to Run
10
+
11
+ Run every variant with the same backbone, the same split, and the same hyper-parameters; the only difference is `--pooling_strategy`.
12
+
13
+ | Strategy | Role in the paper |
14
+ | --- | --- |
15
+ | `hf_sequence_classifier` | Standard HuggingFace/PLM fine-tuning baseline. |
16
+ | `cls` | Baseline that pools the first-token representation, with the same MLP classifier. |
17
+ | `mean` | Masked mean pooling baseline. |
18
+ | `max` | Masked max pooling baseline. |
19
+ | `attention` | Standard attention pooling without a gate. This is the baseline the reviewers requested most explicitly. |
20
+ | `mha_attention` | One MHA layer + attention pooling, with no multi-branch and no gate. |
21
+ | `multi_branch_average` | The same 3 MHA branches as the proposed method but averaged uniformly; used to separate the benefit of the gate from the benefit of the extra parameters/branches. |
22
+ | `gated_multi_branch` | The proposed method. |
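To make the `mean` and `max` rows concrete, here is a minimal pure-Python sketch of masked pooling over token vectors. It assumes `hidden` is a list of per-token vectors and `mask` marks real tokens with 1 and padding with 0; the function names are illustrative and are not taken from `run_glue_MHA_gated.py`.

```python
def _kept_tokens(hidden, mask):
    """Keep only the token vectors whose mask entry is non-zero."""
    kept = [vec for vec, m in zip(hidden, mask) if m]
    if not kept:
        raise ValueError("mask removes every token")
    return kept


def masked_mean(hidden, mask):
    """Average each feature over the unmasked tokens only."""
    kept = _kept_tokens(hidden, mask)
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def masked_max(hidden, mask):
    """Take the per-feature maximum over the unmasked tokens only."""
    kept = _kept_tokens(hidden, mask)
    dim = len(kept[0])
    return [max(vec[i] for vec in kept) for i in range(dim)]


# Two real tokens followed by one padding token with large values.
hidden = [[1.0, 4.0], [3.0, 2.0], [100.0, 100.0]]
mask = [1, 1, 0]

print(masked_mean(hidden, mask))
print(masked_max(hidden, mask))
```

Without the mask, the padding vector would dominate both results, which is why the table lists the masked variants as the baselines.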
23
+
24
+ ## Recommended Commands on m-gpux
25
+
26
+ The current m-gpux/Modal environment may use Python 3.9, so `requirements.txt` has been kept minimal and compatible with Python 3.9. Install the dependencies with:
27
+
28
+ ```bash
29
+ python -m pip install -r requirements.txt
30
+ ```
31
+
32
+ Smoke-test the pipeline before the real run:
33
+
34
+ ```bash
35
+ python scripts/run_ablation_grid.py \
36
+ --models PhoBERT \
37
+ --tasks cola \
38
+ --strategies hf_sequence_classifier attention gated_multi_branch \
39
+ --seeds 42 \
40
+ --limit 32 \
41
+ --max_runs 3
42
+ ```
43
+
44
+ Do not run `run_glue.py` directly for this ablation set, because the new runner script calls `run_glue_MHA_gated.py` itself with the full set of arguments for each baseline.
45
+
46
+ If the m-gpux UI only lets you pick a file and then calls `python run_glue_MHA_gated.py` with no arguments, this file has a no-arg launcher mode. By default it runs the `core` preset. It can be controlled with environment variables:
47
+
48
+ ```bash
49
+ ABLATION_PRESET=smoke ABLATION_DRY_RUN=1 python run_glue_MHA_gated.py
50
+ ABLATION_PRESET=smoke python run_glue_MHA_gated.py
51
+ ABLATION_PRESET=core python run_glue_MHA_gated.py
52
+ ABLATION_PRESET=full python run_glue_MHA_gated.py
53
+ ```
54
+
55
+ Commonly used variables: `ABLATION_LIMIT`, `ABLATION_MAX_RUNS`, `ABLATION_MODELS`, `ABLATION_TASKS`, `ABLATION_STRATEGIES`, `ABLATION_SEEDS`.
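The sketch below illustrates one way a no-arg launcher could read these variables. It is hypothetical: the real parsing in `run_glue_MHA_gated.py` is not shown here, and the list separator (commas or whitespace), the treatment of `ABLATION_DRY_RUN=1` as a boolean, and the `read_ablation_env` name are all assumptions. Only the variable names and the `core` default come from the text above.

```python
import os
import re

# Variable names come from the plan; how they are parsed is an assumption.
LIST_VARS = ("ABLATION_MODELS", "ABLATION_TASKS", "ABLATION_STRATEGIES", "ABLATION_SEEDS")
INT_VARS = ("ABLATION_LIMIT", "ABLATION_MAX_RUNS")
PREFIX = "ABLATION_"


def read_ablation_env(environ=None):
    """Collect launcher overrides from environment variables (hypothetical helper)."""
    env = os.environ if environ is None else environ
    config = {
        "preset": env.get("ABLATION_PRESET", "core"),    # `core` is the documented default
        "dry_run": env.get("ABLATION_DRY_RUN", "") == "1",
    }
    for name in LIST_VARS:
        raw = env.get(name, "").strip()
        if raw:
            # Assumed format: items separated by commas and/or whitespace.
            config[name[len(PREFIX):].lower()] = re.split(r"[,\s]+", raw)
    for name in INT_VARS:
        raw = env.get(name, "").strip()
        if raw:
            config[name[len(PREFIX):].lower()] = int(raw)
    return config


example = read_ablation_env({
    "ABLATION_PRESET": "smoke",
    "ABLATION_DRY_RUN": "1",
    "ABLATION_SEEDS": "42 43",
    "ABLATION_LIMIT": "32",
})
print(example)
```

Unset variables are simply left out of the returned dictionary, so the preset can supply its own defaults.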
56
+
57
+ Full ablation command for the reviewers; `bf16` and `tf32` are enabled by default:
58
+
59
+ ```bash
60
+ python scripts/run_ablation_grid.py \
61
+ --models all \
62
+ --tasks cola mrpc sst2 vnrte vsfc vsmec vtoc qqp \
63
+ --strategies hf_sequence_classifier cls mean max attention mha_attention multi_branch_average gated_multi_branch \
64
+ --seeds 42 43 44 \
65
+ --output_root result_ablation \
66
+ --epochs 3 \
67
+ --train_batch_size 32 \
68
+ --eval_batch_size 64
69
+ ```
70
+
71
+ For a more compact run that still answers the reviewers, run only the core baselines:
72
+
73
+ ```bash
74
+ python scripts/run_ablation_grid.py \
75
+ --models PhoBERT mDeBERTaV3 XLMR_base \
76
+ --tasks cola mrpc sst2 vnrte vsfc \
77
+ --strategies hf_sequence_classifier attention multi_branch_average gated_multi_branch \
78
+ --seeds 42 43 44
79
+ ```
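To estimate the cost of the core command, the sketch below enumerates its grid as the Cartesian product of the four argument lists. The order in which `scripts/run_ablation_grid.py` actually schedules the runs is not documented here, so the iteration order is an assumption.

```python
from itertools import product

# Argument values copied from the core command above.
models = ["PhoBERT", "mDeBERTaV3", "XLMR_base"]
tasks = ["cola", "mrpc", "sst2", "vnrte", "vsfc"]
strategies = ["hf_sequence_classifier", "attention", "multi_branch_average", "gated_multi_branch"]
seeds = [42, 43, 44]

# One training run per (model, task, strategy, seed) combination.
runs = list(product(models, tasks, strategies, seeds))

print(len(runs))   # 3 * 5 * 4 * 3 = 180 runs
print(runs[0])
```

By the same arithmetic, the full command covers 8 tasks x 8 strategies x 3 seeds = 192 runs per model.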
80
+
81
+ After the runs finish, generate the summary tables:
82
+
83
+ ```bash
84
+ python scripts/summarize_results.py --roots result_MHA result_ablation
85
+ ```
86
+
87
+ Files to take the numbers from:
88
+
89
+ - `docs/ablation_summary.md`
90
+ - `docs/ablation_results.csv`
91
+ - `docs/ablation_results_aggregate.csv`
92
+
93
+ ## How to Incorporate into the Paper
94
+
95
+ The method should be renamed `Gated Multi-Branch Attention Pooling` or `GMAP`, rather than described as if the MHA blocks inside the PLM had been replaced. Architecture description:
96
+
97
+ > We keep the pretrained Transformer backbone unchanged and attach a lightweight gated multi-branch attention pooling head on top of the final hidden states. The proposed gate dynamically combines representations produced by multiple attention branches with different head granularities.
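The difference between `multi_branch_average` and `gated_multi_branch` can be sketched as follows. This is an illustration under assumptions, not the paper's implementation: the gate is modelled as a softmax over one scalar logit per branch, whereas the actual gate parameterization is not specified in this plan. The function names are hypothetical.

```python
import math


def average_branches(branches):
    """Ungated baseline: uniform average of the branch vectors."""
    dim = len(branches[0])
    return [sum(b[i] for b in branches) / len(branches) for i in range(dim)]


def gated_branches(branches, gate_logits):
    """Gated combination: softmax weights over branches (assumed gate form)."""
    peak = max(gate_logits)                       # subtract the max for numerical stability
    exps = [math.exp(g - peak) for g in gate_logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(branches[0])
    combined = [sum(w * b[i] for w, b in zip(weights, branches)) for i in range(dim)]
    return combined, weights


# Three pooled branch representations of dimension 2.
branches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

uniform = average_branches(branches)
same_as_uniform, _ = gated_branches(branches, [0.0, 0.0, 0.0])
skewed, weights = gated_branches(branches, [5.0, 0.0, 0.0])

print(uniform, same_as_uniform, skewed, weights)
```

With equal logits the gate reduces to the uniform average, which is why `multi_branch_average` isolates the effect of the gate while keeping the same branches.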
98
+
99
+ The ablation table should have these columns:
100
+
101
+ | Task | Backbone | CLS | Mean | Max | Attn pooling | MHA+Attn | Multi-branch avg | Ours |
102
+ | --- | --- | --- | --- | --- | --- | --- | --- | --- |
103
+
104
+ In the rebuttal response, if the results are supportive, write concisely:
105
+
106
+ > To isolate the contribution of the gate, we added standard attention pooling and ungated multi-branch pooling baselines under the same PLM backbone and hyper-parameters. The gated variant consistently improves over attention pooling and over the ungated multi-branch average on tasks where our method previously showed the largest gains, indicating that the improvement is not merely due to adding a larger pooling head.
107
+
108
+ If some tasks regress, say so plainly:
109
+
110
+ > The ablation also confirms that the gate is less robust on NLI-style reasoning tasks, where excessive suppression of branch-specific signals can hurt entailment decisions. We now discuss this as a limitation and avoid overclaiming language-specific universality.