# Seven Setting Eval Tables

Last updated: 2026-04-13 UTC

Notes:
- `pass@1` is taken from `accuracy/mean`.
- `combined` is only defined from `pass@4` onward, so the `pass@1` and `pass@2` cells in every Combined column are left blank.
- Any other blank cell means the number is not available yet, or that I left it blank deliberately because the intended eval is still pending.
- For training runs, I pulled the metric from W&B history at the requested step (for example `_step=400` or `_step=1000`).

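The tables below report pass@k for k up to 512, computed from `N_VAL` samples. These notes do not say which estimator produced the numbers. A common choice is the unbiased estimator `1 - C(n-c, k) / C(n, k)`, and the sketch below assumes that choice. The names `pass_at_k` and `mean_pass_at_k` are hypothetical and do not come from the eval code.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) / comb(n, k) is the chance that a random size-k subset
    # of the n samples contains no correct sample (0 when n - c < k).
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: Sequence[int], n: int, k: int) -> float:
    """Average pass@k over problems (assumed aggregation, one count per problem)."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)
```

For `k = 1` the estimator reduces to `c / n`, so averaging it over problems gives the mean accuracy, which is consistent with reading `pass@1` from `accuracy/mean`.
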
## Setting 1

Qwen2.5-0.5B-Instruct, trained on GSM8K for `2000` steps, evaluated on GSM8K.

| Variant | Source | N_VAL | Note |
| --- | --- | --- | --- |
| Base | m3ocmw3l | 512 | shared baseline |
| 1-LoRA | s4bxcc1l | 512 | resume global_step_2000 |
| 4-LoRA Single/Combined | rk9ic9kk | 2048 | resume global_step_2000 |
| MERL Single/Combined | (pending) | 2048 | eval run not finished yet; left blank |

| k (pass@k) | Base | 1-LoRA | 4-LoRA Single | 4-LoRA Combined | MERL Single | MERL Combined |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.4661 | 0.6264 | 0.6151 |  |  |  |
| 2 | 0.5943 | 0.6898 | 0.6960 |  |  |  |
| 4 | 0.7012 | 0.7442 | 0.7594 | 0.8048 |  |  |
| 8 | 0.7878 | 0.7915 | 0.8118 | 0.8568 |  |  |
| 16 | 0.8560 | 0.8318 | 0.8544 | 0.8963 |  |  |
| 32 | 0.9065 | 0.8663 | 0.8885 | 0.9252 |  |  |
| 64 | 0.9417 | 0.8953 | 0.9157 | 0.9463 |  |  |
| 128 | 0.9651 | 0.9176 | 0.9365 | 0.9626 |  |  |
| 256 | 0.9799 | 0.9350 | 0.9516 | 0.9751 |  |  |
| 512 | 0.9909 | 0.9487 | 0.9622 | 0.9838 |  |  |

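In the table above, 1-LoRA leads Base at small k but falls behind it at larger k. A minimal sketch for locating that crossover follows. `first_crossover` is a hypothetical helper, and the two lists are copied from the Base and 1-LoRA columns of the Setting 1 table.

```python
from typing import Optional, Sequence

KS = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512]
BASE = [0.4661, 0.5943, 0.7012, 0.7878, 0.8560,
        0.9065, 0.9417, 0.9651, 0.9799, 0.9909]
ONE_LORA = [0.6264, 0.6898, 0.7442, 0.7915, 0.8318,
            0.8663, 0.8953, 0.9176, 0.9350, 0.9487]


def first_crossover(
    ks: Sequence[int],
    baseline: Sequence[float],
    variant: Sequence[Optional[float]],
) -> Optional[int]:
    """Return the smallest k where baseline strictly exceeds variant, else None.

    Blank table cells can be passed as None and are skipped.
    """
    for k, b, v in zip(ks, baseline, variant):
        if v is not None and b > v:
            return k
    return None


print(first_crossover(KS, BASE, ONE_LORA))  # 16
```

The same helper applies to any other pair of columns, with `None` standing in for blank cells such as the Combined entries at `k = 1` and `k = 2`.
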
## Setting 2

Qwen2.5-0.5B-Instruct, trained on GSM8K for `200` steps, evaluated on GSM8K.

| Variant | Source | N_VAL | Note |
| --- | --- | --- | --- |
| Base | m3ocmw3l | 512 | shared baseline |
| 1-LoRA | xw4w9c0u | 512 | resume global_step_200 |
| 4-LoRA Single/Combined | 2rytl841 | 2048 | resume global_step_200 |
| MERL Single/Combined | 0041qzrm | 2048 | resume global_step_200 |

| k (pass@k) | Base | 1-LoRA | 4-LoRA Single | 4-LoRA Combined | MERL Single | MERL Combined |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 0.4661 | 0.5942 | 0.5703 |  | 0.5335 |  |
| 2 | 0.5943 | 0.6842 | 0.6656 |  | 0.6450 |  |
| 4 | 0.7012 | 0.7557 | 0.7438 | 0.7772 | 0.7374 | 0.7584 |
| 8 | 0.7878 | 0.8125 | 0.8069 | 0.8389 | 0.8116 | 0.8308 |
| 16 | 0.8560 | 0.8590 | 0.8572 | 0.8871 | 0.8694 | 0.8861 |
| 32 | 0.9065 | 0.8978 | 0.8969 | 0.9237 | 0.9127 | 0.9266 |
| 64 | 0.9417 | 0.9285 | 0.9271 | 0.9503 | 0.9437 | 0.9544 |
| 128 | 0.9651 | 0.9497 | 0.9491 | 0.9682 | 0.9646 | 0.9723 |
| 256 | 0.9799 | 0.9636 | 0.9647 | 0.9795 | 0.9785 | 0.9841 |
| 512 | 0.9909 | 0.9727 | 0.9754 | 0.9870 | 0.9880 | 0.9920 |

## Setting 3

Qwen3-0.6B-Base, trained on MATH for `400` steps, evaluated on Math.

| Variant | Source | N_VAL | Note |
| --- | --- | --- | --- |
| Base | 1eidnqtd | 512 | base eval on Math500 |
| Single Avg | (pending) | 2048 | new eval launched in tmux 0:0; left blank for now |
| Combined | (pending) | 2048 | new eval launched in tmux 0:0; left blank for now |

| k (pass@k) | Base | Single Avg | Combined |
| --- | --- | --- | --- |
| 1 | 0.2154 |  |  |
| 2 | 0.3370 |  |  |
| 4 | 0.4754 |  |  |
| 8 | 0.6065 |  |  |
| 16 | 0.7143 |  |  |
| 32 | 0.7946 |  |  |
| 64 | 0.8513 |  |  |
| 128 | 0.8916 |  |  |
| 256 | 0.9207 |  |  |
| 512 | 0.9416 |  |  |

## Setting 4

Qwen2.5-0.5B-Instruct, trained on MATH for `400` steps, evaluated on Math.

| Variant | Source | N_VAL | Note |
| --- | --- | --- | --- |
| Base | ub2ua0fb | 512 | base eval on Math500 |
| Single Avg | bfgx3ra4 | 2048 | resume global_step_400 |
| Combined | bfgx3ra4 | 2048 | resume global_step_400 |

| k (pass@k) | Base | Single Avg | Combined |
| --- | --- | --- | --- |
| 1 | 0.3081 | 0.3568 |  |
| 2 | 0.4144 | 0.4484 |  |
| 4 | 0.5162 | 0.5351 | 0.5514 |
| 8 | 0.6078 | 0.6140 | 0.6305 |
| 16 | 0.6890 | 0.6847 | 0.7014 |
| 32 | 0.7598 | 0.7463 | 0.7634 |
| 64 | 0.8180 | 0.7977 | 0.8141 |
| 128 | 0.8627 | 0.8398 | 0.8549 |
| 256 | 0.8956 | 0.8750 | 0.8883 |
| 512 | 0.9195 | 0.9054 | 0.9147 |

## Setting 5

SmolLM2-360M-Instruct, trained on GSM8K for `1000` steps, evaluated on GSM8K.

| Variant | Source | N_VAL | Note |
| --- | --- | --- | --- |
| Base | (not found) |  | no standalone base eval run found |
| Single Avg | uw2s3olq @ _step=1000 | 2048 | training-run history |
| Combined | uw2s3olq @ _step=1000 | 2048 | training-run history |

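The `uw2s3olq @ _step=1000` entries above come from the training run's W&B history rather than a standalone eval run. A minimal sketch of that kind of lookup follows. `metric_at_step` is a hypothetical helper, and the entity, project, and metric key in the commented usage are placeholders, not the names the run actually uses.

```python
from typing import Any, Iterable, Mapping, Optional


def metric_at_step(
    rows: Iterable[Mapping[str, Any]], key: str, step: int
) -> Optional[float]:
    """Return the value logged under `key` at exactly `_step == step`, else None."""
    for row in rows:
        # Some history rows may lack the metric or carry None; skip those.
        if row.get("_step") == step and row.get(key) is not None:
            return float(row[key])
    return None


# Assumed usage against the W&B public API (needs network access and login):
#   import wandb
#   run = wandb.Api().run("<entity>/<project>/uw2s3olq")
#   rows = run.scan_history(keys=["_step", "<metric-key>"])
#   value = metric_at_step(rows, "<metric-key>", 1000)
```

Matching on the exact `_step` keeps the lookup strict: it returns `None` if the metric was not logged at the requested step, rather than silently taking the nearest one.
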
| k (pass@k) | Base | Single Avg | Combined |
| --- | --- | --- | --- |
| 1 |  | 0.2237 |  |
| 2 |  | 0.2939 |  |
| 4 |  | 0.3664 | 0.4218 |
| 8 |  | 0.4397 | 0.5067 |
| 16 |  | 0.5130 | 0.5902 |
| 32 |  | 0.5850 | 0.6704 |
| 64 |  | 0.6530 | 0.7439 |
| 128 |  | 0.7147 | 0.8064 |
| 256 |  | 0.7692 | 0.8564 |
| 512 |  | 0.8166 | 0.8968 |

## Setting 6

SmolLM2-360M-Instruct, trained on GSM8K for `200` steps, evaluated on GSM8K.

| Variant | Source | N_VAL | Note |
| --- | --- | --- | --- |
| Base | (not found) |  | no standalone base eval run found |
| Single Avg | zv5xbryh | 2048 | resume global_step_200 |
| Combined | zv5xbryh | 2048 | resume global_step_200 |

| k (pass@k) | Base | Single Avg | Combined |
| --- | --- | --- | --- |
| 1 |  | 0.1588 |  |
| 2 |  | 0.2213 |  |
| 4 |  | 0.2925 | 0.3359 |
| 8 |  | 0.3718 | 0.4268 |
| 16 |  | 0.4564 | 0.5222 |
| 32 |  | 0.5410 | 0.6159 |
| 64 |  | 0.6196 | 0.7016 |
| 128 |  | 0.6895 | 0.7739 |
| 256 |  | 0.7512 | 0.8315 |
| 512 |  | 0.8056 | 0.8767 |

## Setting 7

Qwen3-0.6B-Base, trained on GSM8K for `400` steps, evaluated on GSM8K.

| Variant | Source | N_VAL | Note |
| --- | --- | --- | --- |
| Base | m2nt7fyg | 512 | base eval on GSM8K |
| Single Avg | nqta9blp @ _step=400 | 2048 | training-run history; checkpoint no longer on local disk |
| Combined | nqta9blp @ _step=400 | 2048 | training-run history; checkpoint no longer on local disk |

| k (pass@k) | Base | Single Avg | Combined |
| --- | --- | --- | --- |
| 1 | 0.2707 | 0.7743 |  |
| 2 | 0.4321 | 0.8348 |  |
| 4 | 0.6106 | 0.8782 | 0.9012 |
| 8 | 0.7616 | 0.9098 | 0.9302 |
| 16 | 0.8629 | 0.9330 | 0.9509 |
| 32 | 0.9222 | 0.9503 | 0.9655 |
| 64 | 0.9553 | 0.9628 | 0.9754 |
| 128 | 0.9741 | 0.9716 | 0.9826 |
| 256 | 0.9843 | 0.9778 | 0.9881 |
| 512 | 0.9901 | 0.9830 | 0.9921 |