Eval Tables for Seven Settings

Last updated: 2026-04-13 UTC

Notes:

pass@1 is taken from the accuracy/mean metric.
Combined is only defined from pass@4 onward, so its pass@1 and pass@2 cells are left blank.
Any other blank cell means the number is not available yet or was deliberately left blank because the intended eval is still pending.
For training runs, I pulled the metric from the run's W&B history at the requested step (for example _step=400 or _step=1000).
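
Why pass@1 can be read straight off accuracy/mean: a minimal sketch, assuming the pass@k columns use the common unbiased combinatorial estimator introduced with HumanEval (the eval code itself is not part of these notes, so this is an assumption). For one problem with n samples of which c are correct, pass@k = 1 - C(n-c, k) / C(n, k). At k = 1 this reduces to c / n. If every problem gets the same number of samples, averaging it over problems therefore gives plain mean accuracy. The function names are illustrative only.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples drawn, c of them correct."""
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # Fewer than k wrong samples: every size-k subset contains a correct one.
    if n - c < k:
        return 1.0
    # 1 - P(a uniformly random size-k subset contains only wrong samples).
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; `results` holds one (n, c) pair per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# With k = 1 the estimator reduces to c / n, the per-problem accuracy,
# so averaging it over problems gives plain mean accuracy.
print(pass_at_k(512, 200, 1))  # 0.390625 == 200 / 512
```

math.comb returns exact integers, so the ratio stays accurate even for n = 2048.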

Setting 1

Qwen2.5-0.5B-Instruct, trained on GSM8K for 2000 steps, evaluated on GSM8K.

Variant	Source	N_VAL	Note
Base	m3ocmw3l	512	shared baseline
1-LoRA	s4bxcc1l	512	resume global_step_2000
4-LoRA Single/Combined	rk9ic9kk	2048	resume global_step_2000
MERL Single/Combined	(pending)	2048	eval run not finished yet; columns omitted from the table below

k	Base	1-LoRA	4-LoRA Single	4-LoRA Combined
1	0.4661	0.6264	0.6151
2	0.5943	0.6898	0.6960
4	0.7012	0.7442	0.7594	0.8048
8	0.7878	0.7915	0.8118	0.8568
16	0.8560	0.8318	0.8544	0.8963
32	0.9065	0.8663	0.8885	0.9252
64	0.9417	0.8953	0.9157	0.9463
128	0.9651	0.9176	0.9365	0.9626
256	0.9799	0.9350	0.9516	0.9751
512	0.9909	0.9487	0.9622	0.9838
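
These notes do not define the 4-LoRA Single and Combined columns in the table above. The following is a hypothetical sketch of one reading that would explain why Combined starts at pass@4. Under this reading, Single is the average of each adapter's own pass@k. Combined splits the k-sample budget evenly across the four adapters (k/4 samples each) and counts a problem as solved if any adapter's share contains a correct answer. The function names are illustrative, not taken from the eval code. Each function returns a per-problem value, and a table cell would be its mean over problems.

```python
from math import comb


def prob_all_wrong(n: int, c: int, k: int) -> float:
    """P(k samples drawn without replacement from n, c correct, are all wrong)."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 0.0  # impossible to pick k wrong samples
    return comb(n - c, k) / comb(n, k)


def single_avg_pass_at_k(per_adapter: list[tuple[int, int]], k: int) -> float:
    """Mean over adapters of each adapter's own pass@k (one problem)."""
    return sum(1.0 - prob_all_wrong(n, c, k) for n, c in per_adapter) / len(per_adapter)


def combined_pass_at_k(per_adapter: list[tuple[int, int]], k: int) -> float:
    """Hypothetical Combined pass@k for one problem: the k-sample budget is split
    evenly across adapters, and any correct sample from any adapter counts."""
    m = len(per_adapter)
    if k < m or k % m != 0:
        raise ValueError(f"k must be a positive multiple of {m}")
    k_each = k // m
    p_fail = 1.0
    for n, c in per_adapter:
        # Adapters are sampled independently, so failure probabilities multiply.
        p_fail *= prob_all_wrong(n, c, k_each)
    return 1.0 - p_fail
```

Under this reading, k = 1 and k = 2 give each adapter less than one sample, which would match the blank Combined cells. Whether the actual eval splits the budget this way is not stated here.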

Setting 2

Qwen2.5-0.5B-Instruct, trained on GSM8K for 200 steps, evaluated on GSM8K.

Variant	Source	N_VAL	Note
Base	m3ocmw3l	512	shared baseline
1-LoRA	xw4w9c0u	512	resume global_step_200
4-LoRA Single/Combined	2rytl841	2048	resume global_step_200
MERL Single/Combined	0041qzrm	2048	resume global_step_200

k	Base	1-LoRA	4-LoRA Single	4-LoRA Combined	MERL Single	MERL Combined
1	0.4661	0.5942	0.5703		0.5335
2	0.5943	0.6842	0.6656		0.6450
4	0.7012	0.7557	0.7438	0.7772	0.7374	0.7584
8	0.7878	0.8125	0.8069	0.8389	0.8116	0.8308
16	0.8560	0.8590	0.8572	0.8871	0.8694	0.8861
32	0.9065	0.8978	0.8969	0.9237	0.9127	0.9266
64	0.9417	0.9285	0.9271	0.9503	0.9437	0.9544
128	0.9651	0.9497	0.9491	0.9682	0.9646	0.9723
256	0.9799	0.9636	0.9647	0.9795	0.9785	0.9841
512	0.9909	0.9727	0.9754	0.9870	0.9880	0.9920

Setting 3

Qwen3-0.6B-Base, trained on MATH for 400 steps, evaluated on MATH.

Variant	Source	N_VAL	Note
Base	1eidnqtd	512	base eval on Math500
Single Avg	(pending)	2048	new eval launched in tmux 0:0; left blank for now
Combined	(pending)	2048	new eval launched in tmux 0:0; left blank for now

k	Base	Single Avg	Combined
1	0.2154
2	0.3370
4	0.4754
8	0.6065
16	0.7143
32	0.7946
64	0.8513
128	0.8916
256	0.9207
512	0.9416

Setting 4

Qwen2.5-0.5B-Instruct, trained on MATH for 400 steps, evaluated on MATH.

Variant	Source	N_VAL	Note
Base	ub2ua0fb	512	base eval on Math500
Single Avg	bfgx3ra4	2048	resume global_step_400
Combined	bfgx3ra4	2048	resume global_step_400

k	Base	Single Avg	Combined
1	0.3081	0.3568
2	0.4144	0.4484
4	0.5162	0.5351	0.5514
8	0.6078	0.6140	0.6305
16	0.6890	0.6847	0.7014
32	0.7598	0.7463	0.7634
64	0.8180	0.7977	0.8141
128	0.8627	0.8398	0.8549
256	0.8956	0.8750	0.8883
512	0.9195	0.9054	0.9147

Setting 5

SmolLM2-360M-Instruct, trained on GSM8K for 1000 steps, evaluated on GSM8K.

Variant	Source	N_VAL	Note
Base	(not found)		no standalone base eval run found
Single Avg	uw2s3olq @ _step=1000	2048	training-run history
Combined	uw2s3olq @ _step=1000	2048	training-run history

k	Single Avg	Combined
1	0.2237
2	0.2939
4	0.3664	0.4218
8	0.4397	0.5067
16	0.5130	0.5902
32	0.5850	0.6704
64	0.6530	0.7439
128	0.7147	0.8064
256	0.7692	0.8564
512	0.8166	0.8968
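
Some variants, such as uw2s3olq @ _step=1000 above, are read from a training run's W&B history rather than from a separate eval run. Below is a minimal sketch of that lookup. It assumes the history is available as an iterable of dict rows carrying a `_step` field, such as the rows returned by `wandb.Api().run("<entity>/<project>/<run_id>").scan_history()`. The metric key `val/pass@4` and the sample rows are hypothetical placeholders, not the keys this project actually logs.

```python
from typing import Iterable, Mapping, Optional


def metric_at_step(rows: Iterable[Mapping], key: str, step: int) -> Optional[float]:
    """Return `key` from the first history row logged at `_step == step`.

    Returns None if no row at that step carries the key, e.g. because the
    metric was logged on a different step than the one requested.
    """
    for row in rows:
        if row.get("_step") == step and row.get(key) is not None:
            return float(row[key])
    return None


# Hypothetical rows in the shape of W&B history records.
rows = [
    {"_step": 400, "train/loss": 0.25},
    {"_step": 400, "val/pass@4": 0.5},
    {"_step": 1000, "val/pass@4": 0.75},
]
print(metric_at_step(rows, "val/pass@4", 400))  # 0.5
```

The sketch matches on the exact `_step` rather than falling back to the nearest logged step. A nearest-step fallback could silently report a metric from a different checkpoint than the one named in the variant table.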

Setting 6

SmolLM2-360M-Instruct, trained on GSM8K for 200 steps, evaluated on GSM8K.

Variant	Source	N_VAL	Note
Base	(not found)		no standalone base eval run found
Single Avg	zv5xbryh	2048	resume global_step_200
Combined	zv5xbryh	2048	resume global_step_200

k	Single Avg	Combined
1	0.1588
2	0.2213
4	0.2925	0.3359
8	0.3718	0.4268
16	0.4564	0.5222
32	0.5410	0.6159
64	0.6196	0.7016
128	0.6895	0.7739
256	0.7512	0.8315
512	0.8056	0.8767

Setting 7

Qwen3-0.6B-Base, trained on GSM8K for 400 steps, evaluated on GSM8K.

Variant	Source	N_VAL	Note
Base	m2nt7fyg	512	base eval on GSM8K
Single Avg	nqta9blp @ _step=400	2048	training-run history; checkpoint no longer on local disk
Combined	nqta9blp @ _step=400	2048	training-run history; checkpoint no longer on local disk

k	Base	Single Avg	Combined
1	0.2707	0.7743
2	0.4321	0.8348
4	0.6106	0.8782	0.9012
8	0.7616	0.9098	0.9302
16	0.8629	0.9330	0.9509
32	0.9222	0.9503	0.9655
64	0.9553	0.9628	0.9754
128	0.9741	0.9716	0.9826
256	0.9843	0.9778	0.9881
512	0.9901	0.9830	0.9921