Upload ablation summaries
mbert_paper_metrics/docs/ablation_results.csv
ADDED
@@ -0,0 +1,86 @@
source,task,model,strategy,seed,metric,score,eval_loss,train_loss,epoch,eval_samples,path
result_ablation_mbert_paper,cola,mBERT,hf_sequence_classifier,43,eval_matthews_correlation,0.770612811219967,0.29299455881118774,0.2762503274876009,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/hf_sequence_classifier/seed_43/all_results.json
result_ablation_mbert_paper,cola,mBERT,hf_sequence_classifier,44,eval_matthews_correlation,0.7729513152802419,0.2829234004020691,0.2657259335027677,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/hf_sequence_classifier/seed_44/all_results.json
result_ablation_mbert_paper,cola,mBERT,hf_sequence_classifier,42,eval_matthews_correlation,0.7493368300485673,0.29159754514694214,0.2649622825075904,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/hf_sequence_classifier/seed_42/all_results.json
result_ablation_mbert_paper,cola,mBERT,cls,42,eval_matthews_correlation,0.7376962218509274,0.3118951916694641,0.26396365477659994,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/cls/seed_42/all_results.json
result_ablation_mbert_paper,cola,mBERT,cls,43,eval_matthews_correlation,0.7682410888691849,0.2848812937736511,0.26999439925790947,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/cls/seed_43/all_results.json
result_ablation_mbert_paper,cola,mBERT,cls,44,eval_matthews_correlation,0.7682418720619295,0.29706358909606934,0.25450210036518417,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/cls/seed_44/all_results.json
result_ablation_mbert_paper,cola,mBERT,mean,42,eval_matthews_correlation,0.7326843807709864,0.3039434850215912,0.253393465856154,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/mean/seed_42/all_results.json
result_ablation_mbert_paper,cola,mBERT,mean,43,eval_matthews_correlation,0.754077957159406,0.27793699502944946,0.25863299785744737,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/mean/seed_43/all_results.json
result_ablation_mbert_paper,cola,mBERT,mean,44,eval_matthews_correlation,0.780066626120282,0.28495022654533386,0.248950488099428,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/mean/seed_44/all_results.json
result_ablation_mbert_paper,cola,mBERT,max,42,eval_matthews_correlation,0.7354753979999464,0.3104800581932068,0.26581977386712285,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/max/seed_42/all_results.json
result_ablation_mbert_paper,cola,mBERT,max,43,eval_matthews_correlation,0.7638553055158107,0.2829602062702179,0.26830103241394615,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/max/seed_43/all_results.json
result_ablation_mbert_paper,cola,mBERT,max,44,eval_matthews_correlation,0.7660475389427814,0.2871670424938202,0.2616100898041532,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/max/seed_44/all_results.json
result_ablation_mbert_paper,cola,mBERT,attention,42,eval_matthews_correlation,0.7469782739691797,0.3018324375152588,0.24882495366152946,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/attention/seed_42/all_results.json
result_ablation_mbert_paper,cola,mBERT,attention,43,eval_matthews_correlation,0.7658812546257014,0.28296852111816406,0.26194217346167636,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/attention/seed_43/all_results.json
result_ablation_mbert_paper,cola,mBERT,attention,44,eval_matthews_correlation,0.7776576303274508,0.28675714135169983,0.25152042053198886,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/attention/seed_44/all_results.json
result_ablation_mbert_paper,cola,mBERT,mha_attention,42,eval_matthews_correlation,0.7567278866591004,0.29563766717910767,0.2555409435913942,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/mha_attention/seed_42/all_results.json
result_ablation_mbert_paper,cola,mBERT,mha_attention,43,eval_matthews_correlation,0.7658953499343086,0.28152528405189514,0.2595851941272106,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/mha_attention/seed_43/all_results.json
result_ablation_mbert_paper,cola,mBERT,mha_attention,44,eval_matthews_correlation,0.7825325139082561,0.29194650053977966,0.2563083238690813,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/mha_attention/seed_44/all_results.json
result_ablation_mbert_paper,cola,mBERT,multi_branch_average,42,eval_matthews_correlation,0.7659095653684762,0.3065093457698822,0.24797400135860265,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/multi_branch_average/seed_42/all_results.json
result_ablation_mbert_paper,cola,mBERT,multi_branch_average,43,eval_matthews_correlation,0.7755378989688012,0.29216188192367554,0.2530268902347838,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/multi_branch_average/seed_43/all_results.json
result_ablation_mbert_paper,cola,mBERT,multi_branch_average,44,eval_matthews_correlation,0.7637565208987979,0.3023347556591034,0.24633375506534755,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/multi_branch_average/seed_44/all_results.json
result_ablation_mbert_paper,cola,mBERT,gated_multi_branch,42,eval_matthews_correlation,0.7641385489211314,0.30131927132606506,0.2518068837970959,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/gated_multi_branch/seed_42/all_results.json
result_ablation_mbert_paper,cola,mBERT,gated_multi_branch,43,eval_matthews_correlation,0.7733151048821603,0.29004186391830444,0.2568637455735251,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/gated_multi_branch/seed_43/all_results.json
result_ablation_mbert_paper,cola,mBERT,gated_multi_branch,44,eval_matthews_correlation,0.7612027683763856,0.2971993386745453,0.24624314412149684,3.0,1043,/workspace/result_ablation_mbert_paper/cola/mBERT/gated_multi_branch/seed_44/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,hf_sequence_classifier,42,eval_combined_score,0.8253864685806063,0.4254560172557831,0.49463975602301996,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/hf_sequence_classifier/seed_42/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,hf_sequence_classifier,43,eval_combined_score,0.8455882352941176,0.3725244998931885,0.46222704044286755,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/hf_sequence_classifier/seed_43/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,hf_sequence_classifier,44,eval_combined_score,0.8302672780138671,0.3804396092891693,0.483004673667576,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/hf_sequence_classifier/seed_44/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,cls,42,eval_combined_score,0.8410480428333695,0.3644275963306427,0.45598942300547723,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/cls/seed_42/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,cls,43,eval_combined_score,0.8470165044435041,0.36292579770088196,0.46971310739931854,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/cls/seed_43/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,cls,44,eval_combined_score,0.8299842837898519,0.3964694142341614,0.4667554800061212,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/cls/seed_44/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,mean,42,eval_combined_score,0.8496400405180522,0.3612998425960541,0.45593016389487445,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/mean/seed_42/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,mean,43,eval_combined_score,0.8611445944498017,0.35825616121292114,0.45004705760789954,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/mean/seed_43/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,mean,44,eval_combined_score,0.8521589486858574,0.3471723198890686,0.471782013989877,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/mean/seed_44/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,max,42,eval_combined_score,0.8575504828797191,0.32597723603248596,0.4347189405690069,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/max/seed_42/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,max,43,eval_combined_score,0.8480907445245008,0.3480868637561798,0.4460654051407524,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/max/seed_43/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,max,44,eval_combined_score,0.8534696406443618,0.3394975960254669,0.450698631397192,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/max/seed_44/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,attention,42,eval_combined_score,0.8390056022408964,0.3504122495651245,0.45668997971907904,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/attention/seed_42/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,attention,43,eval_combined_score,0.8505793226381462,0.3571226894855499,0.4533920979154283,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/attention/seed_43/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,attention,44,eval_combined_score,0.8534696406443618,0.3526938855648041,0.43968486094820325,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/attention/seed_44/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,mha_attention,42,eval_combined_score,0.8476126638500704,0.3495355248451233,0.4543619294097458,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/mha_attention/seed_42/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,mha_attention,43,eval_combined_score,0.8642209572000843,0.35361242294311523,0.45127414620440937,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/mha_attention/seed_43/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,mha_attention,44,eval_combined_score,0.8602692001014824,0.34483397006988525,0.4315704953843269,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/mha_attention/seed_44/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,multi_branch_average,42,eval_combined_score,0.8541666666666667,0.3541111648082733,0.4266196886698405,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/multi_branch_average/seed_42/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,multi_branch_average,43,eval_combined_score,0.8521384241770102,0.3447897434234619,0.44704633519269416,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/multi_branch_average/seed_43/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,multi_branch_average,44,eval_combined_score,0.8455882352941176,0.36839744448661804,0.45383731178615405,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/multi_branch_average/seed_44/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,gated_multi_branch,42,eval_combined_score,0.8460712752254187,0.35193678736686707,0.43137245592863666,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/gated_multi_branch/seed_42/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,gated_multi_branch,43,eval_combined_score,0.8387623866751002,0.3761675953865051,0.47398912733879645,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/gated_multi_branch/seed_43/all_results.json
result_ablation_mbert_paper,mrpc,mBERT,gated_multi_branch,44,eval_combined_score,0.8575317965023848,0.3612346351146698,0.4625883102416992,3.0,408,/workspace/result_ablation_mbert_paper/mrpc/mBERT/gated_multi_branch/seed_44/all_results.json
result_ablation_mbert_paper,sst2,mBERT,hf_sequence_classifier,42,eval_accuracy,0.8692660550458715,0.41583138704299927,0.25817311102685153,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/hf_sequence_classifier/seed_42/all_results.json
result_ablation_mbert_paper,sst2,mBERT,hf_sequence_classifier,43,eval_accuracy,0.8784403669724771,0.3952513635158539,0.2572364091684397,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/hf_sequence_classifier/seed_43/all_results.json
result_ablation_mbert_paper,sst2,mBERT,hf_sequence_classifier,44,eval_accuracy,0.8887614678899083,0.3691374659538269,0.2579068086115217,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/hf_sequence_classifier/seed_44/all_results.json
result_ablation_mbert_paper,sst2,mBERT,cls,42,eval_accuracy,0.8761467889908257,0.3886052072048187,0.258284489502533,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/cls/seed_42/all_results.json
result_ablation_mbert_paper,sst2,mBERT,cls,43,eval_accuracy,0.8795871559633027,0.4069834053516388,0.2556771510948006,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/cls/seed_43/all_results.json
result_ablation_mbert_paper,sst2,mBERT,cls,44,eval_accuracy,0.8772935779816514,0.3817909359931946,0.258519544737356,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/cls/seed_44/all_results.json
result_ablation_mbert_paper,sst2,mBERT,mean,42,eval_accuracy,0.8727064220183486,0.4059946835041046,0.2541694050258054,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/mean/seed_42/all_results.json
result_ablation_mbert_paper,sst2,mBERT,mean,43,eval_accuracy,0.8899082568807339,0.36042511463165283,0.25073873817401565,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/mean/seed_43/all_results.json
result_ablation_mbert_paper,sst2,mBERT,mean,44,eval_accuracy,0.8795871559633027,0.36055734753608704,0.25276644231776635,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/mean/seed_44/all_results.json
result_ablation_mbert_paper,sst2,mBERT,max,42,eval_accuracy,0.8704128440366973,0.39564406871795654,0.25985605001260814,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/max/seed_42/all_results.json
result_ablation_mbert_paper,sst2,mBERT,max,43,eval_accuracy,0.8807339449541285,0.3865836262702942,0.2557056057764629,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/max/seed_43/all_results.json
result_ablation_mbert_paper,sst2,mBERT,max,44,eval_accuracy,0.8727064220183486,0.3852188289165497,0.25833009861804906,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/max/seed_44/all_results.json
result_ablation_mbert_paper,sst2,mBERT,attention,42,eval_accuracy,0.8646788990825688,0.3976602256298065,0.25434095785906646,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/attention/seed_42/all_results.json
result_ablation_mbert_paper,sst2,mBERT,attention,43,eval_accuracy,0.8876146788990825,0.364219605922699,0.25101400063515467,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/attention/seed_43/all_results.json
result_ablation_mbert_paper,sst2,mBERT,attention,44,eval_accuracy,0.8922018348623854,0.36168256402015686,0.2530433903208632,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/attention/seed_44/all_results.json
result_ablation_mbert_paper,sst2,mBERT,mha_attention,42,eval_accuracy,0.8738532110091743,0.3983316421508789,0.255206602108639,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/mha_attention/seed_42/all_results.json
result_ablation_mbert_paper,sst2,mBERT,mha_attention,43,eval_accuracy,0.8772935779816514,0.3741566240787506,0.25444065910997793,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/mha_attention/seed_43/all_results.json
result_ablation_mbert_paper,sst2,mBERT,mha_attention,44,eval_accuracy,0.8772935779816514,0.37144818902015686,0.2535441892069668,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/mha_attention/seed_44/all_results.json
result_ablation_mbert_paper,sst2,mBERT,multi_branch_average,42,eval_accuracy,0.8704128440366973,0.3955666124820709,0.2517735588106954,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/multi_branch_average/seed_42/all_results.json
result_ablation_mbert_paper,sst2,mBERT,multi_branch_average,43,eval_accuracy,0.8956422018348624,0.3477514088153839,0.25490886166467613,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/multi_branch_average/seed_43/all_results.json
result_ablation_mbert_paper,sst2,mBERT,multi_branch_average,44,eval_accuracy,0.8784403669724771,0.37326204776763916,0.2546568879605472,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/multi_branch_average/seed_44/all_results.json
result_ablation_mbert_paper,sst2,mBERT,gated_multi_branch,42,eval_accuracy,0.8727064220183486,0.3894171714782715,0.25268540642890114,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/gated_multi_branch/seed_42/all_results.json
result_ablation_mbert_paper,sst2,mBERT,gated_multi_branch,43,eval_accuracy,0.8795871559633027,0.3644150495529175,0.25354172120656837,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/gated_multi_branch/seed_43/all_results.json
result_ablation_mbert_paper,sst2,mBERT,gated_multi_branch,44,eval_accuracy,0.875,0.3715299665927887,0.2534543433457448,3.0,872,/workspace/result_ablation_mbert_paper/sst2/mBERT/gated_multi_branch/seed_44/all_results.json
result_ablation_mbert_paper,vsfc,mBERT,hf_sequence_classifier,42,eval_accuracy,0.932406822488945,0.22124937176704407,0.2343874844637784,3.0,1583,/workspace/result_ablation_mbert_paper/vsfc/mBERT/hf_sequence_classifier/seed_42/all_results.json
result_ablation_mbert_paper,vsfc,mBERT,hf_sequence_classifier,43,eval_accuracy,0.9317751105495894,0.2234826385974884,0.2347770626450474,3.0,1583,/workspace/result_ablation_mbert_paper/vsfc/mBERT/hf_sequence_classifier/seed_43/all_results.json
result_ablation_mbert_paper,vsfc,mBERT,cls,42,eval_accuracy,0.932406822488945,0.21786975860595703,0.22804758987782442,3.0,1583,/workspace/result_ablation_mbert_paper/vsfc/mBERT/cls/seed_42/all_results.json
result_ablation_mbert_paper,vsfc,mBERT,cls,44,eval_accuracy,0.934301958307012,0.21943630278110504,0.232700464608786,3.0,1583,/workspace/result_ablation_mbert_paper/vsfc/mBERT/cls/seed_44/all_results.json
result_ablation_mbert_paper,vsfc,mBERT,mean,42,eval_accuracy,0.9330385344283006,0.2111007124185562,0.2191808607194807,3.0,1583,/workspace/result_ablation_mbert_paper/vsfc/mBERT/mean/seed_42/all_results.json
result_ablation_mbert_paper,vsfc,mBERT,max,42,eval_accuracy,0.9336702463676564,0.2267763763666153,0.23415064422678558,3.0,1583,/workspace/result_ablation_mbert_paper/vsfc/mBERT/max/seed_42/all_results.json
result_ablation_mbert_paper,vsfc,mBERT,max,43,eval_accuracy,0.9349336702463676,0.22572965919971466,0.23497871141055804,3.0,1583,/workspace/result_ablation_mbert_paper/vsfc/mBERT/max/seed_43/all_results.json
result_ablation_mbert_paper,vsfc,mBERT,attention,44,eval_accuracy,0.9368288060644346,0.20899568498134613,0.22297021336766668,3.0,1583,/workspace/result_ablation_mbert_paper/vsfc/mBERT/attention/seed_44/all_results.json
result_ablation_mbert_paper,vsfc,mBERT,mha_attention,42,eval_accuracy,0.9330385344283006,0.21529895067214966,0.22224652350365698,3.0,1583,/workspace/result_ablation_mbert_paper/vsfc/mBERT/mha_attention/seed_42/all_results.json
result_ablation_mbert_paper,vsfc,mBERT,mha_attention,43,eval_accuracy,0.9330385344283006,0.21815043687820435,0.22466682554124953,3.0,1583,/workspace/result_ablation_mbert_paper/vsfc/mBERT/mha_attention/seed_43/all_results.json
result_ablation_mbert_paper,vsfc,mBERT,mha_attention,44,eval_accuracy,0.934301958307012,0.21530689299106598,0.22445858664168067,3.0,1583,/workspace/result_ablation_mbert_paper/vsfc/mBERT/mha_attention/seed_44/all_results.json
result_ablation_mbert_paper,vsfc,mBERT,multi_branch_average,42,eval_accuracy,0.9355653821857233,0.2141973078250885,0.21831525011218234,3.0,1583,/workspace/result_ablation_mbert_paper/vsfc/mBERT/multi_branch_average/seed_42/all_results.json
result_ablation_mbert_paper,vsfc,mBERT,gated_multi_branch,44,eval_accuracy,0.9279848389134555,0.21946457028388977,0.2245002004094335,3.0,1583,/workspace/result_ablation_mbert_paper/vsfc/mBERT/gated_multi_branch/seed_44/all_results.json
mbert_paper_metrics/docs/ablation_results_aggregate.csv
ADDED
@@ -0,0 +1,33 @@
source,task,model,strategy,metric,n,mean,std,min,max
result_ablation_mbert_paper,cola,mBERT,attention,eval_matthews_correlation,3,0.7635057196407773,0.01547701849478434,0.7469782739691797,0.7776576303274508
result_ablation_mbert_paper,cola,mBERT,cls,eval_matthews_correlation,3,0.758059727594014,0.01763531328797097,0.7376962218509274,0.7682418720619295
result_ablation_mbert_paper,cola,mBERT,gated_multi_branch,eval_matthews_correlation,3,0.7662188073932258,0.00631844762503582,0.7612027683763856,0.7733151048821603
result_ablation_mbert_paper,cola,mBERT,hf_sequence_classifier,eval_matthews_correlation,3,0.764300318849592,0.013011404541162163,0.7493368300485673,0.7729513152802419
result_ablation_mbert_paper,cola,mBERT,max,eval_matthews_correlation,3,0.7551260808195128,0.017053254038628063,0.7354753979999464,0.7660475389427814
result_ablation_mbert_paper,cola,mBERT,mean,eval_matthews_correlation,3,0.7556096546835581,0.023728229317931226,0.7326843807709864,0.780066626120282
result_ablation_mbert_paper,cola,mBERT,mha_attention,eval_matthews_correlation,3,0.7683852501672217,0.013081261378183762,0.7567278866591004,0.7825325139082561
result_ablation_mbert_paper,cola,mBERT,multi_branch_average,eval_matthews_correlation,3,0.7684013284120251,0.006273506165294328,0.7637565208987979,0.7755378989688012
result_ablation_mbert_paper,mrpc,mBERT,attention,eval_combined_score,3,0.8476848551744681,0.0076541203386116755,0.8390056022408964,0.8534696406443618
result_ablation_mbert_paper,mrpc,mBERT,cls,eval_combined_score,3,0.8393496103555752,0.008642201094622479,0.8299842837898519,0.8470165044435041
result_ablation_mbert_paper,mrpc,mBERT,gated_multi_branch,eval_combined_score,3,0.8474551528009678,0.009460920894618174,0.8387623866751002,0.8575317965023848
result_ablation_mbert_paper,mrpc,mBERT,hf_sequence_classifier,eval_combined_score,3,0.833747327296197,0.010540915607401833,0.8253864685806063,0.8455882352941176
result_ablation_mbert_paper,mrpc,mBERT,max,eval_combined_score,3,0.8530369560161939,0.0047446890759971425,0.8480907445245008,0.8575504828797191
result_ablation_mbert_paper,mrpc,mBERT,mean,eval_combined_score,3,0.8543145278845704,0.006047609573507266,0.8496400405180522,0.8611445944498017
result_ablation_mbert_paper,mrpc,mBERT,mha_attention,eval_combined_score,3,0.8573676070505457,0.008676017731365136,0.8476126638500704,0.8642209572000843
result_ablation_mbert_paper,mrpc,mBERT,multi_branch_average,eval_combined_score,3,0.8506311087125982,0.004483455267461182,0.8455882352941176,0.8541666666666667
result_ablation_mbert_paper,sst2,mBERT,attention,eval_accuracy,3,0.8814984709480123,0.014745643365432668,0.8646788990825688,0.8922018348623854
result_ablation_mbert_paper,sst2,mBERT,cls,eval_accuracy,3,0.8776758409785933,0.001751749118866907,0.8761467889908257,0.8795871559633027
result_ablation_mbert_paper,sst2,mBERT,gated_multi_branch,eval_accuracy,3,0.8757645259938838,0.003503498237733826,0.8727064220183486,0.8795871559633027
result_ablation_mbert_paper,sst2,mBERT,hf_sequence_classifier,eval_accuracy,3,0.878822629969419,0.009753326316646119,0.8692660550458715,0.8887614678899083
result_ablation_mbert_paper,sst2,mBERT,max,eval_accuracy,3,0.8746177370030581,0.00541951333285852,0.8704128440366973,0.8807339449541285
result_ablation_mbert_paper,sst2,mBERT,mean,eval_accuracy,3,0.8807339449541284,0.00865806701292519,0.8727064220183486,0.8899082568807339
result_ablation_mbert_paper,sst2,mBERT,mha_attention,eval_accuracy,3,0.8761467889908258,0.001986296797670735,0.8738532110091743,0.8772935779816514
result_ablation_mbert_paper,sst2,mBERT,multi_branch_average,eval_accuracy,3,0.8814984709480123,0.01288969059639708,0.8704128440366973,0.8956422018348624
result_ablation_mbert_paper,vsfc,mBERT,attention,eval_accuracy,1,0.9368288060644346,0.0,0.9368288060644346,0.9368288060644346
result_ablation_mbert_paper,vsfc,mBERT,cls,eval_accuracy,2,0.9333543903979785,0.0013400633882246875,0.932406822488945,0.934301958307012
result_ablation_mbert_paper,vsfc,mBERT,gated_multi_branch,eval_accuracy,1,0.9279848389134555,0.0,0.9279848389134555,0.9279848389134555
result_ablation_mbert_paper,vsfc,mBERT,hf_sequence_classifier,eval_accuracy,2,0.9320909665192672,0.0004466877960748697,0.9317751105495894,0.932406822488945
result_ablation_mbert_paper,vsfc,mBERT,max,eval_accuracy,2,0.934301958307012,0.0008933755921497394,0.9336702463676564,0.9349336702463676
result_ablation_mbert_paper,vsfc,mBERT,mean,eval_accuracy,1,0.9330385344283006,0.0,0.9330385344283006,0.9330385344283006
result_ablation_mbert_paper,vsfc,mBERT,mha_attention,eval_accuracy,3,0.9334596757212045,0.0007294381164746089,0.9330385344283006,0.934301958307012
result_ablation_mbert_paper,vsfc,mBERT,multi_branch_average,eval_accuracy,1,0.9355653821857233,0.0,0.9355653821857233,0.9355653821857233
mbert_paper_metrics/docs/ablation_summary.md
ADDED
@@ -0,0 +1,148 @@
# Ablation Result Summary

The main metric is selected per task: CoLA uses Matthews correlation; MRPC, QQP, and STSB use the combined GLUE score when available; the remaining classification tasks use accuracy.
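The selection rule above can be sketched as a small lookup. This is a minimal illustration, not the script that produced these files: the function name `select_main_metric`, the `COMBINED_SCORE_TASKS` set, and the fallback to accuracy when no combined score was logged are all assumptions. The metric keys are the ones that appear in the `metric` column of the tables.

```python
# Hypothetical helper illustrating the per-task metric rule.
# The names and the fallback order are assumptions, not taken from the source.
COMBINED_SCORE_TASKS = {"mrpc", "qqp", "stsb"}


def select_main_metric(task, results):
    """Return (metric_name, score) for one run's results dict, or None."""
    if task == "cola":
        preferred = ["eval_matthews_correlation"]
    elif task in COMBINED_SCORE_TASKS:
        # Assumed fallback: use accuracy when no combined score is present.
        preferred = ["eval_combined_score", "eval_accuracy"]
    else:
        preferred = ["eval_accuracy"]
    for name in preferred:
        if name in results:
            return name, results[name]
    return None


cola_pick = select_main_metric(
    "cola", {"eval_matthews_correlation": 0.77, "eval_loss": 0.29}
)
sst2_pick = select_main_metric("sst2", {"eval_accuracy": 0.875})
```

Keeping the rule in one function makes it easy to check that every task in the tables resolves to exactly one metric.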
## Aggregated Results
| source | task | model | strategy | metric | n | mean | std | min | max |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| result_ablation_mbert_paper | cola | mBERT | attention | eval_matthews_correlation | 3 | 0.7635 | 0.0155 | 0.7470 | 0.7777 |
| result_ablation_mbert_paper | cola | mBERT | cls | eval_matthews_correlation | 3 | 0.7581 | 0.0176 | 0.7377 | 0.7682 |
| result_ablation_mbert_paper | cola | mBERT | gated_multi_branch | eval_matthews_correlation | 3 | 0.7662 | 0.0063 | 0.7612 | 0.7733 |
| result_ablation_mbert_paper | cola | mBERT | hf_sequence_classifier | eval_matthews_correlation | 3 | 0.7643 | 0.0130 | 0.7493 | 0.7730 |
| result_ablation_mbert_paper | cola | mBERT | max | eval_matthews_correlation | 3 | 0.7551 | 0.0171 | 0.7355 | 0.7660 |
| result_ablation_mbert_paper | cola | mBERT | mean | eval_matthews_correlation | 3 | 0.7556 | 0.0237 | 0.7327 | 0.7801 |
| result_ablation_mbert_paper | cola | mBERT | mha_attention | eval_matthews_correlation | 3 | 0.7684 | 0.0131 | 0.7567 | 0.7825 |
| result_ablation_mbert_paper | cola | mBERT | multi_branch_average | eval_matthews_correlation | 3 | 0.7684 | 0.0063 | 0.7638 | 0.7755 |
| result_ablation_mbert_paper | mrpc | mBERT | attention | eval_combined_score | 3 | 0.8477 | 0.0077 | 0.8390 | 0.8535 |
| result_ablation_mbert_paper | mrpc | mBERT | cls | eval_combined_score | 3 | 0.8393 | 0.0086 | 0.8300 | 0.8470 |
| result_ablation_mbert_paper | mrpc | mBERT | gated_multi_branch | eval_combined_score | 3 | 0.8475 | 0.0095 | 0.8388 | 0.8575 |
| result_ablation_mbert_paper | mrpc | mBERT | hf_sequence_classifier | eval_combined_score | 3 | 0.8337 | 0.0105 | 0.8254 | 0.8456 |
| result_ablation_mbert_paper | mrpc | mBERT | max | eval_combined_score | 3 | 0.8530 | 0.0047 | 0.8481 | 0.8576 |
| result_ablation_mbert_paper | mrpc | mBERT | mean | eval_combined_score | 3 | 0.8543 | 0.0060 | 0.8496 | 0.8611 |
| result_ablation_mbert_paper | mrpc | mBERT | mha_attention | eval_combined_score | 3 | 0.8574 | 0.0087 | 0.8476 | 0.8642 |
| result_ablation_mbert_paper | mrpc | mBERT | multi_branch_average | eval_combined_score | 3 | 0.8506 | 0.0045 | 0.8456 | 0.8542 |
| result_ablation_mbert_paper | sst2 | mBERT | attention | eval_accuracy | 3 | 0.8815 | 0.0147 | 0.8647 | 0.8922 |
| result_ablation_mbert_paper | sst2 | mBERT | cls | eval_accuracy | 3 | 0.8777 | 0.0018 | 0.8761 | 0.8796 |
| result_ablation_mbert_paper | sst2 | mBERT | gated_multi_branch | eval_accuracy | 3 | 0.8758 | 0.0035 | 0.8727 | 0.8796 |
| result_ablation_mbert_paper | sst2 | mBERT | hf_sequence_classifier | eval_accuracy | 3 | 0.8788 | 0.0098 | 0.8693 | 0.8888 |
| result_ablation_mbert_paper | sst2 | mBERT | max | eval_accuracy | 3 | 0.8746 | 0.0054 | 0.8704 | 0.8807 |
| result_ablation_mbert_paper | sst2 | mBERT | mean | eval_accuracy | 3 | 0.8807 | 0.0087 | 0.8727 | 0.8899 |
| result_ablation_mbert_paper | sst2 | mBERT | mha_attention | eval_accuracy | 3 | 0.8761 | 0.0020 | 0.8739 | 0.8773 |
| result_ablation_mbert_paper | sst2 | mBERT | multi_branch_average | eval_accuracy | 3 | 0.8815 | 0.0129 | 0.8704 | 0.8956 |
| result_ablation_mbert_paper | vsfc | mBERT | attention | eval_accuracy | 1 | 0.9368 | 0.0000 | 0.9368 | 0.9368 |
| result_ablation_mbert_paper | vsfc | mBERT | cls | eval_accuracy | 2 | 0.9334 | 0.0013 | 0.9324 | 0.9343 |
| result_ablation_mbert_paper | vsfc | mBERT | gated_multi_branch | eval_accuracy | 1 | 0.9280 | 0.0000 | 0.9280 | 0.9280 |
| result_ablation_mbert_paper | vsfc | mBERT | hf_sequence_classifier | eval_accuracy | 2 | 0.9321 | 0.0004 | 0.9318 | 0.9324 |
| result_ablation_mbert_paper | vsfc | mBERT | max | eval_accuracy | 2 | 0.9343 | 0.0009 | 0.9337 | 0.9349 |
| result_ablation_mbert_paper | vsfc | mBERT | mean | eval_accuracy | 1 | 0.9330 | 0.0000 | 0.9330 | 0.9330 |
| result_ablation_mbert_paper | vsfc | mBERT | mha_attention | eval_accuracy | 3 | 0.9335 | 0.0007 | 0.9330 | 0.9343 |
| result_ablation_mbert_paper | vsfc | mBERT | multi_branch_average | eval_accuracy | 1 | 0.9356 | 0.0000 | 0.9356 | 0.9356 |
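
The aggregate columns can be reproduced from the per-seed scores. The sketch below is an illustration under stated assumptions, not the original aggregation script: the `aggregate` function and the row layout are hypothetical, and the use of the sample standard deviation with `0.0` for single-run groups is inferred from the reported numbers rather than stated anywhere in the source. The scores are copied from `ablation_results.csv`.

```python
import statistics
from collections import defaultdict


def aggregate(rows):
    """Group per-seed scores by (task, strategy) and summarize each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["task"], row["strategy"])].append(float(row["score"]))
    summary = {}
    for key, scores in groups.items():
        summary[key] = {
            "n": len(scores),
            "mean": statistics.mean(scores),
            # Sample standard deviation; assumed to be reported as 0.0
            # when only one seed is available.
            "std": statistics.stdev(scores) if len(scores) > 1 else 0.0,
            "min": min(scores),
            "max": max(scores),
        }
    return summary


# Per-seed scores for gated_multi_branch, copied from ablation_results.csv.
rows = [
    {"task": "cola", "strategy": "gated_multi_branch", "score": "0.7641385489211314"},
    {"task": "cola", "strategy": "gated_multi_branch", "score": "0.7733151048821603"},
    {"task": "cola", "strategy": "gated_multi_branch", "score": "0.7612027683763856"},
    {"task": "vsfc", "strategy": "gated_multi_branch", "score": "0.9279848389134555"},
]
summary = aggregate(rows)
cola = summary[("cola", "gated_multi_branch")]
print(f'{cola["mean"]:.4f} {cola["std"]:.4f}')
```

Note that the VSFC rows have between one and three seeds each, so their `std` values are not comparable with the three-seed groups.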
|
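The aggregate rows above can be cross-checked against the per-seed scores in the Raw Runs table. The following is a minimal sketch using only the standard library. It assumes the numeric columns are n, mean, sample standard deviation, min, and max, and that a single-seed group reports a standard deviation of 0.0 (as the `vsfc` rows with one seed suggest). The helper name `aggregate_scores` is hypothetical and is not taken from `scripts/summarize_results.py`.

```python
from statistics import mean, stdev

def aggregate_scores(scores):
    """Summarize per-seed scores as (n, mean, std, min, max)."""
    n = len(scores)
    # Sample standard deviation needs at least two runs; fall back to 0.0
    # for a single seed (assumption, matching the 0.0000 shown for n=1 rows).
    std = stdev(scores) if n > 1 else 0.0
    return n, mean(scores), std, min(scores), max(scores)

# sst2 / attention scores for seeds 42, 43, 44, copied from the Raw Runs table.
n, avg, std, lo, hi = aggregate_scores([0.8647, 0.8876, 0.8922])
print(f"{n} | {avg:.4f} | {std:.4f} | {lo:.4f} | {hi:.4f}")
```

Because the inputs here are already rounded to four decimals, the result should agree with the table only up to rounding in the last digit.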
| 40 |
+
|
| 41 |
+
## Gated Multi-Branch Deltas
|
| 42 |
+
| source | task | model | baseline | gated_mean | baseline_mean | delta |
|
| 43 |
+
| --- | --- | --- | --- | --- | --- | --- |
|
| 44 |
+
| result_ablation_mbert_paper | cola | mBERT | attention | 0.7662 | 0.7635 | 0.0027 |
|
| 45 |
+
| result_ablation_mbert_paper | cola | mBERT | mha_attention | 0.7662 | 0.7684 | -0.0022 |
|
| 46 |
+
| result_ablation_mbert_paper | cola | mBERT | multi_branch_average | 0.7662 | 0.7684 | -0.0022 |
|
| 47 |
+
| result_ablation_mbert_paper | cola | mBERT | hf_sequence_classifier | 0.7662 | 0.7643 | 0.0019 |
|
| 48 |
+
| result_ablation_mbert_paper | mrpc | mBERT | attention | 0.8475 | 0.8477 | -0.0002 |
|
| 49 |
+
| result_ablation_mbert_paper | mrpc | mBERT | mha_attention | 0.8475 | 0.8574 | -0.0099 |
|
| 50 |
+
| result_ablation_mbert_paper | mrpc | mBERT | multi_branch_average | 0.8475 | 0.8506 | -0.0032 |
|
| 51 |
+
| result_ablation_mbert_paper | mrpc | mBERT | hf_sequence_classifier | 0.8475 | 0.8337 | 0.0137 |
|
| 52 |
+
| result_ablation_mbert_paper | sst2 | mBERT | attention | 0.8758 | 0.8815 | -0.0057 |
|
| 53 |
+
| result_ablation_mbert_paper | sst2 | mBERT | mha_attention | 0.8758 | 0.8761 | -0.0004 |
|
| 54 |
+
| result_ablation_mbert_paper | sst2 | mBERT | multi_branch_average | 0.8758 | 0.8815 | -0.0057 |
|
| 55 |
+
| result_ablation_mbert_paper | sst2 | mBERT | hf_sequence_classifier | 0.8758 | 0.8788 | -0.0031 |
|
| 56 |
+
| result_ablation_mbert_paper | vsfc | mBERT | attention | 0.9280 | 0.9368 | -0.0088 |
|
| 57 |
+
| result_ablation_mbert_paper | vsfc | mBERT | mha_attention | 0.9280 | 0.9335 | -0.0055 |
|
| 58 |
+
| result_ablation_mbert_paper | vsfc | mBERT | multi_branch_average | 0.9280 | 0.9356 | -0.0076 |
|
| 59 |
+
| result_ablation_mbert_paper | vsfc | mBERT | hf_sequence_classifier | 0.9280 | 0.9321 | -0.0041 |
|
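Each delta in the table above appears to be `gated_mean - baseline_mean`, so a negative value means the gated head scored below that baseline. A minimal sketch of the computation under that assumption follows. The helper name `gated_delta` is hypothetical, and the inputs are the rounded per-seed scores from the Raw Runs table.

```python
from statistics import mean

def gated_delta(gated_scores, baseline_scores):
    """Return (gated_mean, baseline_mean, delta) with delta = gated - baseline."""
    gated_mean = mean(gated_scores)
    baseline_mean = mean(baseline_scores)
    return gated_mean, baseline_mean, gated_mean - baseline_mean

# CoLA per-seed Matthews correlations (seeds 42, 43, 44) from the Raw Runs table.
gated = [0.7641, 0.7733, 0.7612]
attention = [0.7470, 0.7659, 0.7777]

g, b, d = gated_delta(gated, attention)
print(f"{g:.4f} | {b:.4f} | {d:+.4f}")
```

With one to three seeds per cell, deltas of this size are smaller than most of the reported standard deviations, so the sign alone should be read with caution.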
| 60 |
+
|
| 61 |
+
## Raw Runs
|
| 62 |
+
| source | task | model | strategy | seed | metric | score | eval_loss | train_loss | epoch | eval_samples | path |
|
| 63 |
+
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
|
| 64 |
+
| result_ablation_mbert_paper | cola | mBERT | hf_sequence_classifier | 43.0000 | eval_matthews_correlation | 0.7706 | 0.2930 | 0.2763 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/hf_sequence_classifier/seed_43/all_results.json |
|
| 65 |
+
| result_ablation_mbert_paper | cola | mBERT | hf_sequence_classifier | 44.0000 | eval_matthews_correlation | 0.7730 | 0.2829 | 0.2657 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/hf_sequence_classifier/seed_44/all_results.json |
|
| 66 |
+
| result_ablation_mbert_paper | cola | mBERT | hf_sequence_classifier | 42.0000 | eval_matthews_correlation | 0.7493 | 0.2916 | 0.2650 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/hf_sequence_classifier/seed_42/all_results.json |
|
| 67 |
+
| result_ablation_mbert_paper | cola | mBERT | cls | 42.0000 | eval_matthews_correlation | 0.7377 | 0.3119 | 0.2640 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/cls/seed_42/all_results.json |
|
| 68 |
+
| result_ablation_mbert_paper | cola | mBERT | cls | 43.0000 | eval_matthews_correlation | 0.7682 | 0.2849 | 0.2700 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/cls/seed_43/all_results.json |
|
| 69 |
+
| result_ablation_mbert_paper | cola | mBERT | cls | 44.0000 | eval_matthews_correlation | 0.7682 | 0.2971 | 0.2545 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/cls/seed_44/all_results.json |
|
| 70 |
+
| result_ablation_mbert_paper | cola | mBERT | mean | 42.0000 | eval_matthews_correlation | 0.7327 | 0.3039 | 0.2534 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/mean/seed_42/all_results.json |
|
| 71 |
+
| result_ablation_mbert_paper | cola | mBERT | mean | 43.0000 | eval_matthews_correlation | 0.7541 | 0.2779 | 0.2586 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/mean/seed_43/all_results.json |
|
| 72 |
+
| result_ablation_mbert_paper | cola | mBERT | mean | 44.0000 | eval_matthews_correlation | 0.7801 | 0.2850 | 0.2490 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/mean/seed_44/all_results.json |
|
| 73 |
+
| result_ablation_mbert_paper | cola | mBERT | max | 42.0000 | eval_matthews_correlation | 0.7355 | 0.3105 | 0.2658 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/max/seed_42/all_results.json |
|
| 74 |
+
| result_ablation_mbert_paper | cola | mBERT | max | 43.0000 | eval_matthews_correlation | 0.7639 | 0.2830 | 0.2683 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/max/seed_43/all_results.json |
|
| 75 |
+
| result_ablation_mbert_paper | cola | mBERT | max | 44.0000 | eval_matthews_correlation | 0.7660 | 0.2872 | 0.2616 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/max/seed_44/all_results.json |
|
| 76 |
+
| result_ablation_mbert_paper | cola | mBERT | attention | 42.0000 | eval_matthews_correlation | 0.7470 | 0.3018 | 0.2488 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/attention/seed_42/all_results.json |
|
| 77 |
+
| result_ablation_mbert_paper | cola | mBERT | attention | 43.0000 | eval_matthews_correlation | 0.7659 | 0.2830 | 0.2619 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/attention/seed_43/all_results.json |
|
| 78 |
+
| result_ablation_mbert_paper | cola | mBERT | attention | 44.0000 | eval_matthews_correlation | 0.7777 | 0.2868 | 0.2515 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/attention/seed_44/all_results.json |
|
| 79 |
+
| result_ablation_mbert_paper | cola | mBERT | mha_attention | 42.0000 | eval_matthews_correlation | 0.7567 | 0.2956 | 0.2555 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/mha_attention/seed_42/all_results.json |
|
| 80 |
+
| result_ablation_mbert_paper | cola | mBERT | mha_attention | 43.0000 | eval_matthews_correlation | 0.7659 | 0.2815 | 0.2596 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/mha_attention/seed_43/all_results.json |
|
| 81 |
+
| result_ablation_mbert_paper | cola | mBERT | mha_attention | 44.0000 | eval_matthews_correlation | 0.7825 | 0.2919 | 0.2563 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/mha_attention/seed_44/all_results.json |
|
| 82 |
+
| result_ablation_mbert_paper | cola | mBERT | multi_branch_average | 42.0000 | eval_matthews_correlation | 0.7659 | 0.3065 | 0.2480 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/multi_branch_average/seed_42/all_results.json |
|
| 83 |
+
| result_ablation_mbert_paper | cola | mBERT | multi_branch_average | 43.0000 | eval_matthews_correlation | 0.7755 | 0.2922 | 0.2530 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/multi_branch_average/seed_43/all_results.json |
|
| 84 |
+
| result_ablation_mbert_paper | cola | mBERT | multi_branch_average | 44.0000 | eval_matthews_correlation | 0.7638 | 0.3023 | 0.2463 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/multi_branch_average/seed_44/all_results.json |
|
| 85 |
+
| result_ablation_mbert_paper | cola | mBERT | gated_multi_branch | 42.0000 | eval_matthews_correlation | 0.7641 | 0.3013 | 0.2518 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/gated_multi_branch/seed_42/all_results.json |
|
| 86 |
+
| result_ablation_mbert_paper | cola | mBERT | gated_multi_branch | 43.0000 | eval_matthews_correlation | 0.7733 | 0.2900 | 0.2569 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/gated_multi_branch/seed_43/all_results.json |
|
| 87 |
+
| result_ablation_mbert_paper | cola | mBERT | gated_multi_branch | 44.0000 | eval_matthews_correlation | 0.7612 | 0.2972 | 0.2462 | 3.0000 | 1043 | /workspace/result_ablation_mbert_paper/cola/mBERT/gated_multi_branch/seed_44/all_results.json |
|
| 88 |
+
| result_ablation_mbert_paper | mrpc | mBERT | hf_sequence_classifier | 42.0000 | eval_combined_score | 0.8254 | 0.4255 | 0.4946 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/hf_sequence_classifier/seed_42/all_results.json |
|
| 89 |
+
| result_ablation_mbert_paper | mrpc | mBERT | hf_sequence_classifier | 43.0000 | eval_combined_score | 0.8456 | 0.3725 | 0.4622 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/hf_sequence_classifier/seed_43/all_results.json |
|
| 90 |
+
| result_ablation_mbert_paper | mrpc | mBERT | hf_sequence_classifier | 44.0000 | eval_combined_score | 0.8303 | 0.3804 | 0.4830 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/hf_sequence_classifier/seed_44/all_results.json |
|
| 91 |
+
| result_ablation_mbert_paper | mrpc | mBERT | cls | 42.0000 | eval_combined_score | 0.8410 | 0.3644 | 0.4560 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/cls/seed_42/all_results.json |
|
| 92 |
+
| result_ablation_mbert_paper | mrpc | mBERT | cls | 43.0000 | eval_combined_score | 0.8470 | 0.3629 | 0.4697 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/cls/seed_43/all_results.json |
|
| 93 |
+
| result_ablation_mbert_paper | mrpc | mBERT | cls | 44.0000 | eval_combined_score | 0.8300 | 0.3965 | 0.4668 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/cls/seed_44/all_results.json |
|
| 94 |
+
| result_ablation_mbert_paper | mrpc | mBERT | mean | 42.0000 | eval_combined_score | 0.8496 | 0.3613 | 0.4559 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/mean/seed_42/all_results.json |
|
| 95 |
+
| result_ablation_mbert_paper | mrpc | mBERT | mean | 43.0000 | eval_combined_score | 0.8611 | 0.3583 | 0.4500 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/mean/seed_43/all_results.json |
|
| 96 |
+
| result_ablation_mbert_paper | mrpc | mBERT | mean | 44.0000 | eval_combined_score | 0.8522 | 0.3472 | 0.4718 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/mean/seed_44/all_results.json |
|
| 97 |
+
| result_ablation_mbert_paper | mrpc | mBERT | max | 42.0000 | eval_combined_score | 0.8576 | 0.3260 | 0.4347 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/max/seed_42/all_results.json |
|
| 98 |
+
| result_ablation_mbert_paper | mrpc | mBERT | max | 43.0000 | eval_combined_score | 0.8481 | 0.3481 | 0.4461 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/max/seed_43/all_results.json |
|
| 99 |
+
| result_ablation_mbert_paper | mrpc | mBERT | max | 44.0000 | eval_combined_score | 0.8535 | 0.3395 | 0.4507 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/max/seed_44/all_results.json |
|
| 100 |
+
| result_ablation_mbert_paper | mrpc | mBERT | attention | 42.0000 | eval_combined_score | 0.8390 | 0.3504 | 0.4567 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/attention/seed_42/all_results.json |
|
| 101 |
+
| result_ablation_mbert_paper | mrpc | mBERT | attention | 43.0000 | eval_combined_score | 0.8506 | 0.3571 | 0.4534 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/attention/seed_43/all_results.json |
|
| 102 |
+
| result_ablation_mbert_paper | mrpc | mBERT | attention | 44.0000 | eval_combined_score | 0.8535 | 0.3527 | 0.4397 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/attention/seed_44/all_results.json |
|
| 103 |
+
| result_ablation_mbert_paper | mrpc | mBERT | mha_attention | 42.0000 | eval_combined_score | 0.8476 | 0.3495 | 0.4544 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/mha_attention/seed_42/all_results.json |
|
| 104 |
+
| result_ablation_mbert_paper | mrpc | mBERT | mha_attention | 43.0000 | eval_combined_score | 0.8642 | 0.3536 | 0.4513 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/mha_attention/seed_43/all_results.json |
|
| 105 |
+
| result_ablation_mbert_paper | mrpc | mBERT | mha_attention | 44.0000 | eval_combined_score | 0.8603 | 0.3448 | 0.4316 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/mha_attention/seed_44/all_results.json |
|
| 106 |
+
| result_ablation_mbert_paper | mrpc | mBERT | multi_branch_average | 42.0000 | eval_combined_score | 0.8542 | 0.3541 | 0.4266 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/multi_branch_average/seed_42/all_results.json |
|
| 107 |
+
| result_ablation_mbert_paper | mrpc | mBERT | multi_branch_average | 43.0000 | eval_combined_score | 0.8521 | 0.3448 | 0.4470 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/multi_branch_average/seed_43/all_results.json |
|
| 108 |
+
| result_ablation_mbert_paper | mrpc | mBERT | multi_branch_average | 44.0000 | eval_combined_score | 0.8456 | 0.3684 | 0.4538 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/multi_branch_average/seed_44/all_results.json |
|
| 109 |
+
| result_ablation_mbert_paper | mrpc | mBERT | gated_multi_branch | 42.0000 | eval_combined_score | 0.8461 | 0.3519 | 0.4314 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/gated_multi_branch/seed_42/all_results.json |
|
| 110 |
+
| result_ablation_mbert_paper | mrpc | mBERT | gated_multi_branch | 43.0000 | eval_combined_score | 0.8388 | 0.3762 | 0.4740 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/gated_multi_branch/seed_43/all_results.json |
|
| 111 |
+
| result_ablation_mbert_paper | mrpc | mBERT | gated_multi_branch | 44.0000 | eval_combined_score | 0.8575 | 0.3612 | 0.4626 | 3.0000 | 408 | /workspace/result_ablation_mbert_paper/mrpc/mBERT/gated_multi_branch/seed_44/all_results.json |
|
| 112 |
+
| result_ablation_mbert_paper | sst2 | mBERT | hf_sequence_classifier | 42.0000 | eval_accuracy | 0.8693 | 0.4158 | 0.2582 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/hf_sequence_classifier/seed_42/all_results.json |
|
| 113 |
+
| result_ablation_mbert_paper | sst2 | mBERT | hf_sequence_classifier | 43.0000 | eval_accuracy | 0.8784 | 0.3953 | 0.2572 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/hf_sequence_classifier/seed_43/all_results.json |
|
| 114 |
+
| result_ablation_mbert_paper | sst2 | mBERT | hf_sequence_classifier | 44.0000 | eval_accuracy | 0.8888 | 0.3691 | 0.2579 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/hf_sequence_classifier/seed_44/all_results.json |
|
| 115 |
+
| result_ablation_mbert_paper | sst2 | mBERT | cls | 42.0000 | eval_accuracy | 0.8761 | 0.3886 | 0.2583 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/cls/seed_42/all_results.json |
|
| 116 |
+
| result_ablation_mbert_paper | sst2 | mBERT | cls | 43.0000 | eval_accuracy | 0.8796 | 0.4070 | 0.2557 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/cls/seed_43/all_results.json |
|
| 117 |
+
| result_ablation_mbert_paper | sst2 | mBERT | cls | 44.0000 | eval_accuracy | 0.8773 | 0.3818 | 0.2585 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/cls/seed_44/all_results.json |
|
| 118 |
+
| result_ablation_mbert_paper | sst2 | mBERT | mean | 42.0000 | eval_accuracy | 0.8727 | 0.4060 | 0.2542 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/mean/seed_42/all_results.json |
|
| 119 |
+
| result_ablation_mbert_paper | sst2 | mBERT | mean | 43.0000 | eval_accuracy | 0.8899 | 0.3604 | 0.2507 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/mean/seed_43/all_results.json |
|
| 120 |
+
| result_ablation_mbert_paper | sst2 | mBERT | mean | 44.0000 | eval_accuracy | 0.8796 | 0.3606 | 0.2528 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/mean/seed_44/all_results.json |
|
| 121 |
+
| result_ablation_mbert_paper | sst2 | mBERT | max | 42.0000 | eval_accuracy | 0.8704 | 0.3956 | 0.2599 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/max/seed_42/all_results.json |
|
| 122 |
+
| result_ablation_mbert_paper | sst2 | mBERT | max | 43.0000 | eval_accuracy | 0.8807 | 0.3866 | 0.2557 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/max/seed_43/all_results.json |
|
| 123 |
+
| result_ablation_mbert_paper | sst2 | mBERT | max | 44.0000 | eval_accuracy | 0.8727 | 0.3852 | 0.2583 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/max/seed_44/all_results.json |
|
| 124 |
+
| result_ablation_mbert_paper | sst2 | mBERT | attention | 42.0000 | eval_accuracy | 0.8647 | 0.3977 | 0.2543 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/attention/seed_42/all_results.json |
|
| 125 |
+
| result_ablation_mbert_paper | sst2 | mBERT | attention | 43.0000 | eval_accuracy | 0.8876 | 0.3642 | 0.2510 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/attention/seed_43/all_results.json |
|
| 126 |
+
| result_ablation_mbert_paper | sst2 | mBERT | attention | 44.0000 | eval_accuracy | 0.8922 | 0.3617 | 0.2530 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/attention/seed_44/all_results.json |
|
| 127 |
+
| result_ablation_mbert_paper | sst2 | mBERT | mha_attention | 42.0000 | eval_accuracy | 0.8739 | 0.3983 | 0.2552 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/mha_attention/seed_42/all_results.json |
|
| 128 |
+
| result_ablation_mbert_paper | sst2 | mBERT | mha_attention | 43.0000 | eval_accuracy | 0.8773 | 0.3742 | 0.2544 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/mha_attention/seed_43/all_results.json |
|
| 129 |
+
| result_ablation_mbert_paper | sst2 | mBERT | mha_attention | 44.0000 | eval_accuracy | 0.8773 | 0.3714 | 0.2535 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/mha_attention/seed_44/all_results.json |
|
| 130 |
+
| result_ablation_mbert_paper | sst2 | mBERT | multi_branch_average | 42.0000 | eval_accuracy | 0.8704 | 0.3956 | 0.2518 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/multi_branch_average/seed_42/all_results.json |
|
| 131 |
+
| result_ablation_mbert_paper | sst2 | mBERT | multi_branch_average | 43.0000 | eval_accuracy | 0.8956 | 0.3478 | 0.2549 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/multi_branch_average/seed_43/all_results.json |
|
| 132 |
+
| result_ablation_mbert_paper | sst2 | mBERT | multi_branch_average | 44.0000 | eval_accuracy | 0.8784 | 0.3733 | 0.2547 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/multi_branch_average/seed_44/all_results.json |
|
| 133 |
+
| result_ablation_mbert_paper | sst2 | mBERT | gated_multi_branch | 42.0000 | eval_accuracy | 0.8727 | 0.3894 | 0.2527 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/gated_multi_branch/seed_42/all_results.json |
|
| 134 |
+
| result_ablation_mbert_paper | sst2 | mBERT | gated_multi_branch | 43.0000 | eval_accuracy | 0.8796 | 0.3644 | 0.2535 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/gated_multi_branch/seed_43/all_results.json |
|
| 135 |
+
| result_ablation_mbert_paper | sst2 | mBERT | gated_multi_branch | 44.0000 | eval_accuracy | 0.8750 | 0.3715 | 0.2535 | 3.0000 | 872 | /workspace/result_ablation_mbert_paper/sst2/mBERT/gated_multi_branch/seed_44/all_results.json |
|
| 136 |
+
| result_ablation_mbert_paper | vsfc | mBERT | hf_sequence_classifier | 42.0000 | eval_accuracy | 0.9324 | 0.2212 | 0.2344 | 3.0000 | 1583 | /workspace/result_ablation_mbert_paper/vsfc/mBERT/hf_sequence_classifier/seed_42/all_results.json |
|
| 137 |
+
| result_ablation_mbert_paper | vsfc | mBERT | hf_sequence_classifier | 43.0000 | eval_accuracy | 0.9318 | 0.2235 | 0.2348 | 3.0000 | 1583 | /workspace/result_ablation_mbert_paper/vsfc/mBERT/hf_sequence_classifier/seed_43/all_results.json |
|
| 138 |
+
| result_ablation_mbert_paper | vsfc | mBERT | cls | 42.0000 | eval_accuracy | 0.9324 | 0.2179 | 0.2280 | 3.0000 | 1583 | /workspace/result_ablation_mbert_paper/vsfc/mBERT/cls/seed_42/all_results.json |
|
| 139 |
+
| result_ablation_mbert_paper | vsfc | mBERT | cls | 44.0000 | eval_accuracy | 0.9343 | 0.2194 | 0.2327 | 3.0000 | 1583 | /workspace/result_ablation_mbert_paper/vsfc/mBERT/cls/seed_44/all_results.json |
|
| 140 |
+
| result_ablation_mbert_paper | vsfc | mBERT | mean | 42.0000 | eval_accuracy | 0.9330 | 0.2111 | 0.2192 | 3.0000 | 1583 | /workspace/result_ablation_mbert_paper/vsfc/mBERT/mean/seed_42/all_results.json |
|
| 141 |
+
| result_ablation_mbert_paper | vsfc | mBERT | max | 42.0000 | eval_accuracy | 0.9337 | 0.2268 | 0.2342 | 3.0000 | 1583 | /workspace/result_ablation_mbert_paper/vsfc/mBERT/max/seed_42/all_results.json |
|
| 142 |
+
| result_ablation_mbert_paper | vsfc | mBERT | max | 43.0000 | eval_accuracy | 0.9349 | 0.2257 | 0.2350 | 3.0000 | 1583 | /workspace/result_ablation_mbert_paper/vsfc/mBERT/max/seed_43/all_results.json |
|
| 143 |
+
| result_ablation_mbert_paper | vsfc | mBERT | attention | 44.0000 | eval_accuracy | 0.9368 | 0.2090 | 0.2230 | 3.0000 | 1583 | /workspace/result_ablation_mbert_paper/vsfc/mBERT/attention/seed_44/all_results.json |
|
| 144 |
+
| result_ablation_mbert_paper | vsfc | mBERT | mha_attention | 42.0000 | eval_accuracy | 0.9330 | 0.2153 | 0.2222 | 3.0000 | 1583 | /workspace/result_ablation_mbert_paper/vsfc/mBERT/mha_attention/seed_42/all_results.json |
|
| 145 |
+
| result_ablation_mbert_paper | vsfc | mBERT | mha_attention | 43.0000 | eval_accuracy | 0.9330 | 0.2182 | 0.2247 | 3.0000 | 1583 | /workspace/result_ablation_mbert_paper/vsfc/mBERT/mha_attention/seed_43/all_results.json |
|
| 146 |
+
| result_ablation_mbert_paper | vsfc | mBERT | mha_attention | 44.0000 | eval_accuracy | 0.9343 | 0.2153 | 0.2245 | 3.0000 | 1583 | /workspace/result_ablation_mbert_paper/vsfc/mBERT/mha_attention/seed_44/all_results.json |
|
| 147 |
+
| result_ablation_mbert_paper | vsfc | mBERT | multi_branch_average | 42.0000 | eval_accuracy | 0.9356 | 0.2142 | 0.2183 | 3.0000 | 1583 | /workspace/result_ablation_mbert_paper/vsfc/mBERT/multi_branch_average/seed_42/all_results.json |
|
| 148 |
+
| result_ablation_mbert_paper | vsfc | mBERT | gated_multi_branch | 44.0000 | eval_accuracy | 0.9280 | 0.2195 | 0.2245 | 3.0000 | 1583 | /workspace/result_ablation_mbert_paper/vsfc/mBERT/gated_multi_branch/seed_44/all_results.json |
|
mbert_paper_metrics/docs/reviewer_experiment_plan.md
ADDED
|
@@ -0,0 +1,110 @@
| 1 |
+
# Reviewer-Focused Additional Experiments
|
| 2 |
+
|
| 3 |
+
The goal of these additional experiments is to directly address three recurring comments from the reviews:
|
| 4 |
+
|
| 5 |
+
1. Clarify that the proposed model is a pooling/classification head placed on top of the PLM, not a replacement for the MHA blocks inside the Transformer.
|
| 6 |
+
2. Show that the benefit comes from the gate, not merely from adding a more complex attention pooling head.
|
| 7 |
+
3. Report stability using multiple seeds and standard deviations.
|
| 8 |
+
|
| 9 |
+
## Ablations to Run
|
| 10 |
+
|
| 11 |
+
Run with the same backbone, the same split, and the same hyper-parameters; the only difference is `--pooling_strategy`.
|
| 12 |
+
|
| 13 |
+
| Strategy | Role in the paper |
|
| 14 |
+
| --- | --- |
|
| 15 |
+
| `hf_sequence_classifier` | Standard HuggingFace/PLM fine-tuning baseline. |
|
| 16 |
+
| `cls` | Baseline pooled representation from the first token, with the same MLP classifier. |
|
| 17 |
+
| `mean` | Masked mean pooling baseline. |
|
| 18 |
+
| `max` | Masked max pooling baseline. |
|
| 19 |
+
| `attention` | Standard attention pooling without a gate. This is the baseline the reviewers requested most explicitly. |
|
| 20 |
+
| `mha_attention` | A single MHA layer + attention pooling, with no multi-branch and no gate. |
|
| 21 |
+
| `multi_branch_average` | The same 3 MHA branches as the proposed method, but averaged uniformly; used to separate the benefit of the gate from the benefit of added parameters/branches. |
|
| 22 |
+
| `gated_multi_branch` | The proposed method. |
|
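For reference, the `mean` and `max` rows above pool only over non-padding tokens. The sketch below shows that masking in plain Python, assuming the hidden states are a list of per-token vectors and the attention mask is a list of 0/1 flags. It is illustrative only and is not the repository's implementation, which presumably operates on batched tensors.

```python
def masked_mean_pool(hidden, mask):
    """Average token vectors where mask == 1, ignoring padding positions."""
    kept = [vec for vec, m in zip(hidden, mask) if m == 1]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def masked_max_pool(hidden, mask):
    """Take the element-wise maximum over token vectors where mask == 1."""
    kept = [vec for vec, m in zip(hidden, mask) if m == 1]
    if not kept:
        raise ValueError("mask selects no tokens")
    dim = len(kept[0])
    return [max(vec[i] for vec in kept) for i in range(dim)]

# Three tokens with 2-dimensional hidden states; the last token is padding.
hidden = [[1.0, 4.0], [3.0, 2.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(masked_mean_pool(hidden, mask))  # [2.0, 3.0]
print(masked_max_pool(hidden, mask))   # [3.0, 4.0]
```

Without the mask, the padding vector `[9.0, 9.0]` would dominate both results, which is why the baselines are described as masked.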
| 23 |
+
|
| 24 |
+
## Recommended Commands on m-gpux
|
| 25 |
+
|
| 26 |
+
The m-gpux/Modal environment may currently use Python 3.9, so `requirements.txt` is kept minimal and compatible with Python 3.9. Install the dependencies with:
|
| 27 |
+
|
| 28 |
+
```bash
|
| 29 |
+
python -m pip install -r requirements.txt
|
| 30 |
+
```
|
| 31 |
+
|
| 32 |
+
Smoke-test the pipeline before the real run:
|
| 33 |
+
|
| 34 |
+
```bash
|
| 35 |
+
python scripts/run_ablation_grid.py \
|
| 36 |
+
--models PhoBERT \
|
| 37 |
+
--tasks cola \
|
| 38 |
+
--strategies hf_sequence_classifier attention gated_multi_branch \
|
| 39 |
+
--seeds 42 \
|
| 40 |
+
--limit 32 \
|
| 41 |
+
--max_runs 3
|
| 42 |
+
```
|
| 43 |
+
|
| 44 |
+
Do not run `run_glue.py` directly for this ablation set, because the new runner script calls `run_glue_MHA_gated.py` itself with the full set of arguments for each baseline.
|
| 45 |
+
|
| 46 |
+
If the m-gpux UI only lets you pick a file and then calls `python run_glue_MHA_gated.py` without arguments, this file has a no-arg launcher mode. By default it runs the `core` preset. It can be controlled through environment variables:
|
| 47 |
+
|
| 48 |
+
```bash
|
| 49 |
+
ABLATION_PRESET=smoke ABLATION_DRY_RUN=1 python run_glue_MHA_gated.py
|
| 50 |
+
ABLATION_PRESET=smoke python run_glue_MHA_gated.py
|
| 51 |
+
ABLATION_PRESET=core python run_glue_MHA_gated.py
|
| 52 |
+
ABLATION_PRESET=full python run_glue_MHA_gated.py
|
| 53 |
+
```
|
| 54 |
+
|
| 55 |
+
Commonly used variables: `ABLATION_LIMIT`, `ABLATION_MAX_RUNS`, `ABLATION_MODELS`, `ABLATION_TASKS`, `ABLATION_STRATEGIES`, `ABLATION_SEEDS`.
|
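One way a no-arg launcher could translate these variables into runner arguments is sketched below. This is an assumption-laden illustration, not the code in `run_glue_MHA_gated.py`. The helper `build_args_from_env`, the variable-to-flag mapping, and the whitespace-separated list format are all hypothetical. Only the flag names themselves are taken from the `run_ablation_grid.py` commands in this document.

```python
import os

# Assumed mapping from environment variable to runner flag; the flag names
# appear in the commands in this document, the mapping itself is illustrative.
ENV_TO_FLAG = {
    "ABLATION_MODELS": "--models",
    "ABLATION_TASKS": "--tasks",
    "ABLATION_STRATEGIES": "--strategies",
    "ABLATION_SEEDS": "--seeds",
    "ABLATION_LIMIT": "--limit",
    "ABLATION_MAX_RUNS": "--max_runs",
}

def build_args_from_env(environ=None):
    """Turn ABLATION_* variables into a flat argument list (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    args = []
    for name, flag in ENV_TO_FLAG.items():
        value = environ.get(name, "").strip()
        if value:
            # Assumes list-valued variables are whitespace-separated.
            args.append(flag)
            args.extend(value.split())
    return args

example = {"ABLATION_TASKS": "cola mrpc", "ABLATION_SEEDS": "42", "ABLATION_LIMIT": "32"}
print(build_args_from_env(example))
```

Unset or empty variables are skipped, so the preset's defaults would apply for anything not overridden.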
| 56 |
+
|
| 57 |
+
Full ablation command for the reviewers, with `bf16` and `tf32` enabled by default:
|
| 58 |
+
|
| 59 |
+
```bash
|
| 60 |
+
python scripts/run_ablation_grid.py \
|
| 61 |
+
--models all \
|
| 62 |
+
--tasks cola mrpc sst2 vnrte vsfc vsmec vtoc qqp \
|
| 63 |
+
--strategies hf_sequence_classifier cls mean max attention mha_attention multi_branch_average gated_multi_branch \
|
| 64 |
+
--seeds 42 43 44 \
|
| 65 |
+
--output_root result_ablation \
|
| 66 |
+
--epochs 3 \
|
| 67 |
+
--train_batch_size 32 \
|
| 68 |
+
--eval_batch_size 64
|
| 69 |
+
```
|
| 70 |
+
|
| 71 |
+
For a smaller run that still answers the reviewers, run the core baselines:
|
| 72 |
+
|
| 73 |
+
```bash
|
| 74 |
+
python scripts/run_ablation_grid.py \
|
| 75 |
+
--models PhoBERT mDeBERTaV3 XLMR_base \
|
| 76 |
+
--tasks cola mrpc sst2 vnrte vsfc \
|
| 77 |
+
--strategies hf_sequence_classifier attention multi_branch_average gated_multi_branch \
|
| 78 |
+
--seeds 42 43 44
|
| 79 |
+
```
|
| 80 |
+
|
| 81 |
+
After the runs finish, build the summary tables:
|
| 82 |
+
|
| 83 |
+
```bash
|
| 84 |
+
python scripts/summarize_results.py --roots result_MHA result_ablation
|
| 85 |
+
```
|
| 86 |
+
|
| 87 |
+
Files to take the numbers from:
|
| 88 |
+
|
| 89 |
+
- `docs/ablation_summary.md`
|
| 90 |
+
- `docs/ablation_results.csv`
|
| 91 |
+
- `docs/ablation_results_aggregate.csv`
|
| 92 |
+
|
| 93 |
+
## How to Incorporate into the Paper
|
| 94 |
+
|
| 95 |
+
The name should be changed to `Gated Multi-Branch Attention Pooling` or `GMAP`, rather than wording that suggests the MHA blocks inside the PLM were replaced. Architecture description:
|
| 96 |
+
|
| 97 |
+
> We keep the pretrained Transformer backbone unchanged and attach a lightweight gated multi-branch attention pooling head on top of the final hidden states. The proposed gate dynamically combines representations produced by multiple attention branches with different head granularities.
|
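The difference between the ungated `multi_branch_average` baseline and a gated combination can be sketched as follows. This assumes a softmax gate over one scalar logit per branch. The actual gate parametrization is not specified in this document, so `combine_branches` and its inputs are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def combine_branches(branches, gate_logits=None):
    """Weighted sum of branch vectors.

    With gate_logits=None every branch gets weight 1/len(branches)
    (the ungated average); otherwise weights come from a softmax gate.
    """
    if gate_logits is None:
        weights = [1.0 / len(branches)] * len(branches)
    else:
        weights = softmax(gate_logits)
    dim = len(branches[0])
    return [sum(w * b[i] for w, b in zip(weights, branches)) for i in range(dim)]

# Three pooled branch representations (2-dimensional for readability).
branches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
uniform = combine_branches(branches)                  # ungated average
gated = combine_branches(branches, [2.0, 0.0, 0.0])   # gate favours branch 0
print(uniform, gated)
```

In a trained model the gate logits would be computed from the input rather than fixed, which is what lets the weighting vary per example while the uniform average cannot.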
| 98 |
+
|
| 99 |
+
The ablation table should have these columns:
|
| 100 |
+
|
| 101 |
+
| Task | Backbone | CLS | Mean | Max | Attn pooling | MHA+Attn | Multi-branch avg | Ours |
|
| 102 |
+
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
|
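A row of this table could be filled from the aggregate statistics as mean ± standard deviation. The sketch below assumes that cell format and uses a hypothetical helper, `format_row`. The sample values are the `sst2` / `mBERT` means and standard deviations reported in the aggregate summary.

```python
# Column order of the proposed ablation table, keyed by strategy name.
COLUMNS = ["cls", "mean", "max", "attention", "mha_attention",
           "multi_branch_average", "gated_multi_branch"]

def format_row(task, backbone, stats):
    """Render one markdown row; stats maps strategy -> (mean, std)."""
    cells = [task, backbone]
    for strategy in COLUMNS:
        if strategy in stats:
            avg, std = stats[strategy]
            cells.append(f"{avg:.4f} ± {std:.4f}")
        else:
            cells.append("n/a")  # strategy not run for this task/backbone
    return "| " + " | ".join(cells) + " |"

# sst2 / mBERT (mean, std) pairs copied from the aggregate summary.
sst2 = {
    "cls": (0.8777, 0.0018),
    "mean": (0.8807, 0.0087),
    "max": (0.8746, 0.0054),
    "attention": (0.8815, 0.0147),
    "mha_attention": (0.8761, 0.0020),
    "multi_branch_average": (0.8815, 0.0129),
    "gated_multi_branch": (0.8758, 0.0035),
}
row = format_row("sst2", "mBERT", sst2)
print(row)
```

Reporting the standard deviation in every cell keeps the seed-to-seed spread visible next to each mean.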
| 103 |
+
|
| 104 |
+
In the rebuttal response, if the results are supportive, write concisely:
|
| 105 |
+
|
| 106 |
+
> To isolate the contribution of the gate, we added standard attention pooling and ungated multi-branch pooling baselines under the same PLM backbone and hyper-parameters. The gated variant consistently improves over attention pooling and over the ungated multi-branch average on tasks where our method previously showed the largest gains, indicating that the improvement is not merely due to adding a larger pooling head.
|
| 107 |
+
|
| 108 |
+
If some tasks get worse, say so plainly:
|
| 109 |
+
|
| 110 |
+
> The ablation also confirms that the gate is less robust on NLI-style reasoning tasks, where excessive suppression of branch-specific signals can hurt entailment decisions. We now discuss this as a limitation and avoid overclaiming language-specific universality.
|