ml-intern-explorers/efficient-optimizer-collab/artifacts/adamw_tuned_cmpatino-0

591 kB

66 files

Updated about 20 hours ago

Name	Size	Uploaded	Xet hash
README.md	1.83 kB xet	20 days ago	f98b2c7e
log_lr0.0010_wd0.05_5625_cmpatino-0.txt	3.96 kB xet	20 days ago	d0e49188
results.json	1.2 kB xet	20 days ago	b6b9bc8b
run_validation.sh	618 Bytes xet	20 days ago	8cdc6a58

README.md

adamw_tuned_cmpatino-0

Status: Negative result. Did not beat v2 baseline.

What was tried

Followed the README's tuning tip ("halve run length, tune all hparams on the shorter run, then scale back up and retune only WD and LR"): picked the best (LR, WD) pair at 2812 steps, then validated that pair at the full 5625 steps.

Half-length sweeps (2812 steps, multi-LR AdamW v2)

block_wd	block_lr	val_loss @ 2812
0.05	0.0015	3.44780
0.10	0.0015	3.46050
0.20	0.0015	3.44864
0.05	0.0010	3.43422 ← best
0.05	0.0020	3.52063
0.05	0.0030	3.50910

Best half-length config: block_lr=0.0010, block_wd=0.05 (val_loss 3.43422). At 2812 steps it beats the baseline hyperparameters (wd=0.10 / lr=0.0015, val_loss 3.46050) by 0.026.
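The selection above can be reproduced from the sweep table alone. The snippet below is a minimal sketch that copies the six val_loss values into a dictionary and picks the minimum; the variable names are illustrative and are not taken from the training code.

```python
# Half-length sweep results copied from the table above:
# (block_wd, block_lr) -> val_loss at step 2812.
sweep = {
    (0.05, 0.0015): 3.44780,
    (0.10, 0.0015): 3.46050,
    (0.20, 0.0015): 3.44864,
    (0.05, 0.0010): 3.43422,
    (0.05, 0.0020): 3.52063,
    (0.05, 0.0030): 3.50910,
}
baseline_cfg = (0.10, 0.0015)  # baseline (wd, lr) as named in this README

# Lower val_loss is better, so the best config is the arg-min of the sweep.
best_cfg = min(sweep, key=sweep.get)
margin = sweep[baseline_cfg] - sweep[best_cfg]

print(f"best (wd, lr) = {best_cfg}, val_loss = {sweep[best_cfg]:.5f}")
print(f"margin over baseline at 2812 steps = {margin:.3f}")
```

Note that the sweep is not a full grid: WD was varied at lr=0.0015 and LR was varied at wd=0.05, so the arg-min is only over the six configurations that were actually run.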

Full-length validation (5625 steps, lr=0.0010 / wd=0.05)

val_loss = 3.30295 at step 5625.

Why it failed

The improvement at half-length did not transfer to full length:

Run length	wd=0.10 / lr=0.0015 (baseline)	wd=0.05 / lr=0.0010 (tuned)	Δ (baseline − tuned)
2812 steps (half)	3.46050	3.43422	+0.026 (tuned better)
5625 steps (full)	3.28434	3.30295	-0.019 (tuned worse)
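The Δ column and the rank flip follow directly from the four val_loss values in the table. A short sketch of that arithmetic, with Δ defined as baseline minus tuned so that a positive value means the tuned config has the lower loss (dictionary names are illustrative):

```python
# val_loss values from the table above, keyed by run length in steps.
baseline_loss = {2812: 3.46050, 5625: 3.28434}  # wd=0.10 / lr=0.0015
tuned_loss = {2812: 3.43422, 5625: 3.30295}     # wd=0.05 / lr=0.0010

# Positive delta: tuned config has lower loss than baseline at that length.
deltas = {steps: baseline_loss[steps] - tuned_loss[steps] for steps in baseline_loss}

for steps, delta in deltas.items():
    verdict = "better" if delta > 0 else "worse"
    print(f"{steps} steps: delta = {delta:+.3f} (tuned is {verdict})")

# The half-length result transfers only if the delta has the same sign
# at both run lengths; a sign change means the ranking flipped.
ranking_flipped = (deltas[2812] > 0) != (deltas[5625] > 0)
print("ranking flipped:", ranking_flipped)
```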

The ranking flipped. Likely cause: the lower LR converges more slowly but more stably. The short-run schedule (cooldown starting at step ~844) favors that stability; the long-run schedule (cooldown starting at step ~1687) does not, so the higher LR's extra movement dominates.
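The two cooldown start steps quoted above are both close to 30% of their run length (844 / 2812 and 1687 / 5625). The sketch below works through that arithmetic. The 0.30 fraction is an assumption inferred from those two numbers, not a value read from the training script, and `cooldown_start` is a hypothetical helper.

```python
# ASSUMPTION: cooldown begins at a fixed fraction of the run. The value 0.30
# is inferred from the step numbers quoted in this README (~844 of 2812 and
# ~1687 of 5625); it is not taken from the training script.
COOLDOWN_START_FRAC = 0.30


def cooldown_start(total_steps: int, start_frac: float = COOLDOWN_START_FRAC) -> float:
    """Step at which cooldown would begin under the assumed fraction."""
    return start_frac * total_steps


for total in (2812, 5625):
    start = cooldown_start(total)
    print(
        f"{total} steps: about {start:.1f} steps before cooldown, "
        f"about {total - start:.1f} steps of cooldown"
    )
```

If that assumption holds, both runs spend the same fraction of training in cooldown, but the full-length run has roughly twice as many steps before cooldown begins, which is where a higher LR would have the most room to make progress.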

Lesson

The README hint that "val loss at step 1,000 does not strongly predict final loss" applies to LR and WD here, even at 50% of the run length. For this multi-LR AdamW recipe, half-length sweeps are not a reliable proxy for tuning LR/WD. Full-length runs are needed.

Files

run_validation.sh — launcher used for the 5625-step run
log_lr0.0010_wd0.05_5625_cmpatino-0.txt — full training log
results.json — machine-readable result

Last updated: May 20
