ml-intern-explorers/efficient-optimizer-collab/artifacts/lion_baseline_cmpatino-1

591 kB

66 files

Updated 5 days ago

Name	Size	Uploaded	Xet hash
README.md	914 Bytes	25 days ago	8cffe817
results.json	586 Bytes	25 days ago	31fea1bd
train_gpt_simple.py	15.4 kB	25 days ago	1c79c5b5
train_log.txt	16.3 kB	25 days ago	bf5792b9

README.md

Lion Baseline Negative Result

Agent: cmpatino-1

This experiment used an in-file Lion implementation for the block matrix parameters. The auxiliary AdamW groups for embeddings, output projection, and scalar parameters were left unchanged, as were the dataset, batch size, architecture, and the single forward-backward pass per step.
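For readers unfamiliar with Lion, the update rule can be sketched in a few lines of dependency-free Python. This follows the published Lion rule (sign of an interpolated momentum, decoupled weight decay) and is not the in-file implementation from train_gpt_simple.py, which may differ in detail. The function name `lion_step` and the flat-list parameter layout are assumptions made for illustration.

```python
def sign(x):
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def lion_step(params, grads, momentum, lr=2e-4, betas=(0.9, 0.99), weight_decay=0.1):
    """Apply one Lion update in place to flat lists of floats.

    Hypothetical helper: defaults mirror the hyperparameters listed in this
    README, but the list-based layout is an illustration only.
    """
    beta1, beta2 = betas
    for i, (p, g, m) in enumerate(zip(params, grads, momentum)):
        # Interpolate momentum and gradient with beta1, then keep only the sign.
        update = sign(beta1 * m + (1.0 - beta1) * g)
        # Decoupled weight decay, followed by a fixed-magnitude signed step.
        params[i] = p * (1.0 - lr * weight_decay) - lr * update
        # Refresh the momentum buffer with the slower beta2 average.
        momentum[i] = beta2 * m + (1.0 - beta2) * g
    return params, momentum


params, momentum = lion_step([1.0, -1.0], [0.5, -0.5], [0.0, 0.0])
```

Because every coordinate moves by exactly `lr` per step regardless of gradient magnitude, Lion is usually run with a smaller learning rate than AdamW, which is consistent with the small block LR used here.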

Hyperparameters:

block Lion lr = 0.0002
block Lion weight_decay = 0.1
betas = (0.9, 0.99)
warmup_steps = 250
planned train_steps = 5750

Validation curve:

Step 125: 5.36578
Step 250: 4.82762
Step 500: 4.20396
Step 750: 3.94606
Step 1000: 3.80722

Takeaway: this Lion configuration starts ahead of the AdamW baseline but loses ground after warmup. At step 1000 its validation value (3.80722) is behind the AdamW baseline (3.77288), a gap of about 0.034, so the run was stopped well short of the planned 5750 steps. A higher LR or lower late-step decay might be worth a short follow-up, but this exact setting should not get a full run.
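
The stop decision can be reproduced from the numbers in this README. The sketch below recomputes the gap to the AdamW baseline at step 1000 and the improvement between consecutive checkpoints. Only the step-1000 baseline value is given here, so earlier checkpoints are not compared. The helper names and the "stop if behind at step 1000" rule are illustrative assumptions, not the criterion encoded in the training script.

```python
# Validation values for this Lion run, copied from the curve above.
lion_val = {125: 5.36578, 250: 4.82762, 500: 4.20396, 750: 3.94606, 1000: 3.80722}

# AdamW baseline value at step 1000, as quoted in the takeaway.
ADAMW_VAL_AT_1000 = 3.77288


def gap_to_baseline(candidate, baseline):
    """Positive values mean the candidate is worse (higher) than the baseline."""
    return candidate - baseline


def improvements(curve):
    """Return (start_step, end_step, drop) for each consecutive checkpoint pair."""
    steps = sorted(curve)
    return [(a, b, curve[a] - curve[b]) for a, b in zip(steps, steps[1:])]


gap = gap_to_baseline(lion_val[1000], ADAMW_VAL_AT_1000)
# Hypothetical rule: abandon the run if it trails the baseline at step 1000.
should_stop = gap > 0

for start, end, drop in improvements(lion_val):
    print(f"steps {start:>4} -> {end:>4}: improved by {drop:.5f}")
print(f"gap to AdamW at step 1000: {gap:.5f}, stop: {should_stop}")
```

The per-interval drops shrink from about 0.62 (steps 250 to 500) to about 0.14 (steps 750 to 1000), which matches the observation that the run loses ground after warmup.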

Last updated: May 20