[2026-05-04 20:55:43] [ANCHOR_MISSING] apps_introductory
[2026-05-04 20:55:43] [ANCHOR_MISSING] codecontests_easy
[2026-05-04 20:55:43] Available anchors: 22 counts={'math': 8, 'code': 6, 'science': 8}
[2026-05-04 20:55:43] [EXP1] Building/evaluating main mapping table
[2026-05-04 20:55:44] [EXP1_TASK] gsm_hard
/usr/local/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:590: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
/usr/local/lib/python3.11/site-packages/transformers/generation/configuration_utils.py:595: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
[2026-05-04 21:00:18] [EXP1_DONE] gsm_hard: {'Domain': 'math', 'Task': 'gsm_hard', 'base_Y': 0.056666666666666664, 'mean': 0.06333333333333334, 'global_ridge': 0.05333333333333334, 'pertensor_ridge': 0.056666666666666664, 'topk8_global_ridge': 0.05, 'topk8_pertensor_ridge': 0.04666666666666667, 'pertensor_mlp': 0.06666666666666667, 'oracle': 0.07333333333333333, 'gap_recovered': 0.6}
[2026-05-04 21:00:18] [EXP1_TASK] math_algebra_medium
[2026-05-04 21:03:33] [EXP1_DONE] math_algebra_medium: {'Domain': 'math', 'Task': 'math_algebra_medium', 'base_Y': 0.09333333333333334, 'mean': 0.1, 'global_ridge': 0.09333333333333334, 'pertensor_ridge': 0.1, 'topk8_global_ridge': 0.10333333333333333, 'topk8_pertensor_ridge': 0.10333333333333333, 'pertensor_mlp': 0.09333333333333334, 'oracle': 0.09666666666666666, 'gap_recovered': 3.000000000000004}
[2026-05-04 21:03:33] [EXP1_TASK] humaneval_plus
[2026-05-04 21:12:11] [EXP1_DONE] humaneval_plus: {'Domain': 'code', 'Task': 'humaneval_plus', 'base_Y': 0.07926829268292683, 'mean': 0.08536585365853659, 'global_ridge': 0.06707317073170732, 'pertensor_ridge': 0.06707317073170732, 'topk8_global_ridge': 0.06707317073170732, 'topk8_pertensor_ridge': 0.06707317073170732, 'pertensor_mlp': 0.07317073170731707, 'oracle': 0.06707317073170732, 'gap_recovered': -0.5000000000000006}
[2026-05-04 21:12:11] [EXP1_TASK] mbpp_plus
[2026-05-04 21:25:00] [EXP1_DONE] mbpp_plus: {'Domain': 'code', 'Task': 'mbpp_plus', 'base_Y': 0.21666666666666667, 'mean': 0.20666666666666667, 'global_ridge': 0.21666666666666667, 'pertensor_ridge': 0.21, 'topk8_global_ridge': 0.21333333333333335, 'topk8_pertensor_ridge': 0.20333333333333334, 'pertensor_mlp': 0.2, 'oracle': 0.22, 'gap_recovered': 0.0}
[2026-05-04 21:25:00] [EXP1_TASK] arc_challenge
[2026-05-04 21:28:07] [EXP1_DONE] arc_challenge: {'Domain': 'science', 'Task': 'arc_challenge', 'base_Y': 0.705685618729097, 'mean': 0.7324414715719063, 'global_ridge': 0.705685618729097, 'pertensor_ridge': 0.705685618729097, 'topk8_global_ridge': 0.705685618729097, 'topk8_pertensor_ridge': 0.705685618729097, 'pertensor_mlp': 0.725752508361204, 'oracle': 0.725752508361204, 'gap_recovered': 1.3333333333333333}
[2026-05-04 21:28:07] [EXP1_TASK] mmlu_college_chemistry
[2026-05-04 21:30:00] [EXP1_DONE] mmlu_college_chemistry: {'Domain': 'science', 'Task': 'mmlu_college_chemistry', 'base_Y': 0.375, 'mean': 0.375, 'global_ridge': 0.375, 'pertensor_ridge': 0.375, 'topk8_global_ridge': 0.375, 'topk8_pertensor_ridge': 0.375, 'pertensor_mlp': 0.25, 'oracle': 0.375, 'gap_recovered': None}
[2026-05-04 21:30:00] [EXP2] Anchor-count + Top-K scaling
[2026-05-04 21:33:19] [EXP2] N5_global_ridge: {'math': -0.3999999999999998, 'code': 0.5000000000000002, 'science': 1.1666666666666667}
[2026-05-04 21:36:45] [EXP2] N12_global_ridge: {'math': 0.3000000000000002, 'code': -0.5000000000000042, 'science': 0.5}
[2026-05-04 21:40:13] [EXP2] N12_topk8_global_ridge: {'math': 1.8000000000000043, 'code': -0.5000000000000042, 'science': 0.3333333333333333}
[2026-05-04 21:43:41] [EXP2] N12_topk12_global_ridge: {'math': -0.29999999999999977, 'code': -0.5000000000000042, 'science': 0.3333333333333333}
[2026-05-04 21:43:41] [EXP2] N22_global_ridge: {'math': -0.0999999999999998, 'code': 0.5, 'science': 0.0}
[2026-05-04 21:43:41] [EXP2] N22_topk8_global_ridge: {'math': 1.3000000000000023, 'code': 0.0, 'science': 0.0}
[2026-05-04 21:47:16] [EXP2] N22_topk12_global_ridge: {'math': 0.20000000000000023, 'code': 0.5, 'science': 0.5}
[2026-05-04 21:47:16] [EXP3] Cross-domain transfer heatmap
[2026-05-04 21:47:36] [EXP3] math-only -> gsm_hard: acc=0.0500 gap=-0.3999999999999996
[2026-05-04 21:47:53] [EXP3] math-only -> math_algebra_medium: acc=0.1000 gap=2.000000000000004
[2026-05-04 21:48:47] [EXP3] math-only -> humaneval_plus: acc=0.0732 gap=0.5000000000000006
[2026-05-04 21:50:01] [EXP3] math-only -> mbpp_plus: acc=0.2200 gap=1.0
[2026-05-04 21:50:16] [EXP3] math-only -> arc_challenge: acc=0.7157 gap=0.5
[2026-05-04 21:50:27] [EXP3] math-only -> mmlu_college_chemistry: acc=0.2500 gap=None
[2026-05-04 21:50:49] [EXP3] code-only -> gsm_hard: acc=0.0767 gap=1.1999999999999995
[2026-05-04 21:51:11] [EXP3] code-only -> math_algebra_medium: acc=0.1200 gap=8.000000000000012
[2026-05-04 21:52:02] [EXP3] code-only -> humaneval_plus: acc=0.0671 gap=1.0
[2026-05-04 21:53:21] [EXP3] code-only -> mbpp_plus: acc=0.2067 gap=-3.0000000000000084
[2026-05-04 21:53:34] [EXP3] code-only -> arc_challenge: acc=0.7291 gap=1.1666666666666667
[2026-05-04 21:53:45] [EXP3] code-only -> mmlu_college_chemistry: acc=0.3750 gap=None
[2026-05-04 21:54:07] [EXP3] science-only -> gsm_hard: acc=0.0633 gap=0.4000000000000004
[2026-05-04 21:54:29] [EXP3] science-only -> math_algebra_medium: acc=0.1133 gap=6.000000000000008
[2026-05-04 21:55:25] [EXP3] science-only -> humaneval_plus: acc=0.0732 gap=0.5000000000000006
[2026-05-04 21:56:38] [EXP3] science-only -> mbpp_plus: acc=0.2133 gap=-1.0
[2026-05-04 21:57:00] [EXP3] science-only -> arc_challenge: acc=0.7157 gap=0.5
[2026-05-04 21:57:11] [EXP3] science-only -> mmlu_college_chemistry: acc=0.3750 gap=None
[2026-05-04 21:57:31] [EXP3] math+code -> gsm_hard: acc=0.0500 gap=-0.3999999999999996
[2026-05-04 21:57:50] [EXP3] math+code -> math_algebra_medium: acc=0.0933 gap=0.0
[2026-05-04 21:58:41] [EXP3] math+code -> humaneval_plus: acc=0.0671 gap=1.0
[2026-05-04 22:00:09] [EXP3] math+code -> mbpp_plus: acc=0.2100 gap=-2.0000000000000084
[2026-05-04 22:00:25] [EXP3] math+code -> arc_challenge: acc=0.7124 gap=0.3333333333333333
[2026-05-04 22:00:36] [EXP3] math+code -> mmlu_college_chemistry: acc=0.3750 gap=None
[2026-05-04 22:00:56] [EXP3] all -> gsm_hard: acc=0.0500 gap=-0.3999999999999996
[2026-05-04 22:01:15] [EXP3] all -> math_algebra_medium: acc=0.1033 gap=3.000000000000004
[2026-05-04 22:02:07] [EXP3] all -> humaneval_plus: acc=0.0671 gap=1.0
[2026-05-04 22:03:36] [EXP3] all -> mbpp_plus: acc=0.2133 gap=-1.0
[2026-05-04 22:03:57] [EXP3] all -> arc_challenge: acc=0.7057 gap=0.0
[2026-05-04 22:04:08] [EXP3] all -> mmlu_college_chemistry: acc=0.3750 gap=None