# Seven Setting Eval Tables
Last updated: 2026-04-13 UTC

Notes:
- `pass@1` is taken from `accuracy/mean`.
- `combined` is only defined from `pass@4` onward, so its `pass@1` and `pass@2` cells are left blank.
- A blank cell means either that the number is not available yet or that I left it blank on purpose because the desired eval is still pending.
- For training runs, I pulled the metric from W&B history at the requested step (for example `_step=400` or `_step=1000`).
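
The notes above do not say how `pass@k` is computed. As a minimal sketch, assuming the standard unbiased estimator `1 - C(n-c, k) / C(n, k)` with `n` samples per problem of which `c` are correct, the per-problem values could be averaged as below. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical and not taken from the eval code. Under this assumption, `k=1` reduces to `c/n`, which is consistent with reading `pass@1` from `accuracy/mean`.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct.

    Assumption: this is the standard estimator, not necessarily the one
    used to produce the tables in this document.
    """
    if not 0 <= c <= n:
        raise ValueError("c must be between 0 and n")
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and n")
    # Chance that a random size-k subset of the n samples has no correct one.
    # math.comb returns 0 when k > n - c, so the estimate is exactly 1.0 then.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: Sequence[int], n: int, k: int) -> float:
    """Average pass@k over problems, given per-problem correct counts."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)
```

For example, `mean_pass_at_k([0, 4], n=4, k=1)` averages a problem that is never solved with one that is always solved.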
## Setting 1
Qwen2.5-0.5B-Instruct, trained on GSM8K to step `2000`, evaluated on GSM8K.

| Variant | Source | N_VAL | Note |
| --- | --- | --- | --- |
| Base | m3ocmw3l | 512 | shared baseline |
| 1-LoRA | s4bxcc1l | 512 | resume global_step_2000 |
| 4-LoRA Single/Combined | rk9ic9kk | 2048 | resume global_step_2000 |
| MERL Single/Combined | (pending) | 2048 | eval run not finished yet; left blank |

| k | Base | 1-LoRA | 4-LoRA Single | 4-LoRA Combined | MERL Single | MERL Combined |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.4661 | 0.6264 | 0.6151 | | | |
| 2 | 0.5943 | 0.6898 | 0.6960 | | | |
| 4 | 0.7012 | 0.7442 | 0.7594 | 0.8048 | | |
| 8 | 0.7878 | 0.7915 | 0.8118 | 0.8568 | | |
| 16 | 0.8560 | 0.8318 | 0.8544 | 0.8963 | | |
| 32 | 0.9065 | 0.8663 | 0.8885 | 0.9252 | | |
| 64 | 0.9417 | 0.8953 | 0.9157 | 0.9463 | | |
| 128 | 0.9651 | 0.9176 | 0.9365 | 0.9626 | | |
| 256 | 0.9799 | 0.9350 | 0.9516 | 0.9751 | | |
| 512 | 0.9909 | 0.9487 | 0.9622 | 0.9838 | | |
## Setting 2
Qwen2.5-0.5B-Instruct, trained on GSM8K to step `200`, evaluated on GSM8K.

| Variant | Source | N_VAL | Note |
| --- | --- | --- | --- |
| Base | m3ocmw3l | 512 | shared baseline |
| 1-LoRA | xw4w9c0u | 512 | resume global_step_200 |
| 4-LoRA Single/Combined | 2rytl841 | 2048 | resume global_step_200 |
| MERL Single/Combined | 0041qzrm | 2048 | resume global_step_200 |

| k | Base | 1-LoRA | 4-LoRA Single | 4-LoRA Combined | MERL Single | MERL Combined |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.4661 | 0.5942 | 0.5703 | | 0.5335 | |
| 2 | 0.5943 | 0.6842 | 0.6656 | | 0.6450 | |
| 4 | 0.7012 | 0.7557 | 0.7438 | 0.7772 | 0.7374 | 0.7584 |
| 8 | 0.7878 | 0.8125 | 0.8069 | 0.8389 | 0.8116 | 0.8308 |
| 16 | 0.8560 | 0.8590 | 0.8572 | 0.8871 | 0.8694 | 0.8861 |
| 32 | 0.9065 | 0.8978 | 0.8969 | 0.9237 | 0.9127 | 0.9266 |
| 64 | 0.9417 | 0.9285 | 0.9271 | 0.9503 | 0.9437 | 0.9544 |
| 128 | 0.9651 | 0.9497 | 0.9491 | 0.9682 | 0.9646 | 0.9723 |
| 256 | 0.9799 | 0.9636 | 0.9647 | 0.9795 | 0.9785 | 0.9841 |
| 512 | 0.9909 | 0.9727 | 0.9754 | 0.9870 | 0.9880 | 0.9920 |
## Setting 3
Qwen3-0.6B-Base, trained on MATH to step `400`, evaluated on Math.

| Variant | Source | N_VAL | Note |
| --- | --- | --- | --- |
| Base | 1eidnqtd | 512 | base eval on Math500 |
| Single Avg | (pending) | 2048 | new eval launched in tmux 0:0; left blank for now |
| Combined | (pending) | 2048 | new eval launched in tmux 0:0; left blank for now |

| k | Base | Single Avg | Combined |
| --- | --- | --- | --- |
| 1 | 0.2154 | | |
| 2 | 0.3370 | | |
| 4 | 0.4754 | | |
| 8 | 0.6065 | | |
| 16 | 0.7143 | | |
| 32 | 0.7946 | | |
| 64 | 0.8513 | | |
| 128 | 0.8916 | | |
| 256 | 0.9207 | | |
| 512 | 0.9416 | | |
## Setting 4
Qwen2.5-0.5B-Instruct, trained on MATH to step `400`, evaluated on Math.

| Variant | Source | N_VAL | Note |
| --- | --- | --- | --- |
| Base | ub2ua0fb | 512 | base eval on Math500 |
| Single Avg | bfgx3ra4 | 2048 | resume global_step_400 |
| Combined | bfgx3ra4 | 2048 | resume global_step_400 |

| k | Base | Single Avg | Combined |
| --- | --- | --- | --- |
| 1 | 0.3081 | 0.3568 | |
| 2 | 0.4144 | 0.4484 | |
| 4 | 0.5162 | 0.5351 | 0.5514 |
| 8 | 0.6078 | 0.6140 | 0.6305 |
| 16 | 0.6890 | 0.6847 | 0.7014 |
| 32 | 0.7598 | 0.7463 | 0.7634 |
| 64 | 0.8180 | 0.7977 | 0.8141 |
| 128 | 0.8627 | 0.8398 | 0.8549 |
| 256 | 0.8956 | 0.8750 | 0.8883 |
| 512 | 0.9195 | 0.9054 | 0.9147 |
## Setting 5
SmolLM2-360M-Instruct, trained on GSM8K to step `1000`, evaluated on GSM8K.

| Variant | Source | N_VAL | Note |
| --- | --- | --- | --- |
| Base | (not found) | | no standalone base eval run found |
| Single Avg | uw2s3olq @ _step=1000 | 2048 | training-run history |
| Combined | uw2s3olq @ _step=1000 | 2048 | training-run history |

| k | Base | Single Avg | Combined |
| --- | --- | --- | --- |
| 1 | | 0.2237 | |
| 2 | | 0.2939 | |
| 4 | | 0.3664 | 0.4218 |
| 8 | | 0.4397 | 0.5067 |
| 16 | | 0.5130 | 0.5902 |
| 32 | | 0.5850 | 0.6704 |
| 64 | | 0.6530 | 0.7439 |
| 128 | | 0.7147 | 0.8064 |
| 256 | | 0.7692 | 0.8564 |
| 512 | | 0.8166 | 0.8968 |
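
The Setting 5 numbers come from training-run history rather than a standalone eval run. A minimal sketch of how a value could be read from W&B history at a given `_step` is shown below; it is not the script actually used. The helper names, the `entity/project` part of the run path, and the full metric key are hypothetical placeholders. Only the run id `uw2s3olq`, the step, and the `accuracy/mean` suffix come from this document. The sketch relies on `wandb.Api().run(...)` and `Run.scan_history(...)` from the W&B public API, and filters on exact step equality so it does not depend on whether `max_step` is inclusive.

```python
from typing import Iterable, Mapping, Optional


def metric_at_step(rows: Iterable[Mapping], key: str, step: int) -> Optional[float]:
    """Return the value of `key` logged at exactly `_step == step`, else None."""
    for row in rows:
        if row.get("_step") == step and row.get(key) is not None:
            return float(row[key])
    return None


def fetch_metric(run_path: str, key: str, step: int) -> Optional[float]:
    """Fetch one metric value from a W&B run's history (needs network access).

    `run_path` has the form "entity/project/run_id"; the entity and project
    are not given in this document and must be supplied by the caller.
    """
    import wandb  # imported lazily so metric_at_step works without wandb installed

    run = wandb.Api().run(run_path)
    # Narrow the scan to a small window around the requested step.
    rows = run.scan_history(keys=["_step", key], min_step=step, max_step=step + 1)
    return metric_at_step(rows, key, step)
```

A call would look like `fetch_metric("<entity>/<project>/uw2s3olq", "<prefix>/accuracy/mean", 1000)`, with the placeholders filled in from the actual project.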
## Setting 6
SmolLM2-360M-Instruct, trained on GSM8K to step `200`, evaluated on GSM8K.

| Variant | Source | N_VAL | Note |
| --- | --- | --- | --- |
| Base | (not found) | | no standalone base eval run found |
| Single Avg | zv5xbryh | 2048 | resume global_step_200 |
| Combined | zv5xbryh | 2048 | resume global_step_200 |

| k | Base | Single Avg | Combined |
| --- | --- | --- | --- |
| 1 | | 0.1588 | |
| 2 | | 0.2213 | |
| 4 | | 0.2925 | 0.3359 |
| 8 | | 0.3718 | 0.4268 |
| 16 | | 0.4564 | 0.5222 |
| 32 | | 0.5410 | 0.6159 |
| 64 | | 0.6196 | 0.7016 |
| 128 | | 0.6895 | 0.7739 |
| 256 | | 0.7512 | 0.8315 |
| 512 | | 0.8056 | 0.8767 |
## Setting 7
Qwen3-0.6B-Base, trained on GSM8K to step `400`, evaluated on GSM8K.

| Variant | Source | N_VAL | Note |
| --- | --- | --- | --- |
| Base | m2nt7fyg | 512 | base eval on GSM8K |
| Single Avg | nqta9blp @ _step=400 | 2048 | training-run history; checkpoint no longer on local disk |
| Combined | nqta9blp @ _step=400 | 2048 | training-run history; checkpoint no longer on local disk |

| k | Base | Single Avg | Combined |
| --- | --- | --- | --- |
| 1 | 0.2707 | 0.7743 | |
| 2 | 0.4321 | 0.8348 | |
| 4 | 0.6106 | 0.8782 | 0.9012 |
| 8 | 0.7616 | 0.9098 | 0.9302 |
| 16 | 0.8629 | 0.9330 | 0.9509 |
| 32 | 0.9222 | 0.9503 | 0.9655 |
| 64 | 0.9553 | 0.9628 | 0.9754 |
| 128 | 0.9741 | 0.9716 | 0.9826 |
| 256 | 0.9843 | 0.9778 | 0.9881 |
| 512 | 0.9901 | 0.9830 | 0.9921 |