Add metadata and links to paper/code (#2)
Opened by nielsr (HF Staff)

README.md (CHANGED):
````diff
@@ -1,12 +1,16 @@
 ---
-license: apache-2.0
 datasets:
 - timm/mini-imagenet
+license: apache-2.0
+pipeline_tag: image-classification
+library_name: timm
 ---
 
 # Comparisons of timm Optimizers w/ Caution
 
-This
+This repository contains summaries of several sets of experiments comparing a number of optimizers with and without **Caution**, as introduced in the paper [Cautious Optimizers: Improving Training with One Line of Code](https://huggingface.co/papers/2411.16085).
+
+**Official Code**: [kyleliang919/C-Optim](https://github.com/kyleliang919/c-optim)
 
 The runs were all performed training a smaller ViT (`vit_wee_patch16_reg1_gap_256`) for 200 epochs (10M samples seen) from scratch on the `timm` 'mini-imagenet' dataset, a 100 class subset of imagenet with same image sizes as originals.
 
@@ -21,7 +25,7 @@ This is what the 'caution' addition looks like in an optimizer:
 
 Train args:
 
-```
+```bash
 ./distributed_train.sh 2 --dataset hfds/timm/mini-imagenet --num-classes 100 --model vit_wee_patch16_reg1_gap_256 -j 8 --epochs 200 --warmup-prefix --sched-on-updates --warmup-lr 0 --mixup .2 --model-ema --model-ema-decay 0.999 --model-ema-warmup --aa rand-m9-mstd0.5-inc1 --remode pixel --reprob 0.25 --amp --weight-decay .05 --drop 0.1 --drop-path .1 -b 288 --opt cadamw --lr 1e-3
 ```
@@ -52,7 +56,7 @@ Train args:
 |cadamw, lr=5e-04            |199.0|2.163278102874756 |1.0976034646987916|73.3900005859375 |91.31000137939454|
 |cadamw, lr=1e-03, clip grads|203.0|2.1360626220703125|1.1043113907814026|73.33000158691407|91.41000042724608|
 |adamw, lr=1e-03, clip grads |195.0|2.2746386528015137|1.142998440361023 |72.11000151367188|90.47000052490236|
-|adamw, lr=5e-04             |185.0|2.3040246963500977|1.1535791856765747|71.50000120849609|90.4800001953125
+|adamw, lr=5e-04             |185.0|2.3040246963500977|1.1535791856765747|71.50000120849609|90.4800001953125 |
 |adamw, lr=1e-03             |199.0|2.223684310913086 |1.1657958560943604|71.22999993896484|90.30999958496092|
 |cadamw, lr=2e-04            |189.0|2.538627862930298 |1.2325929063796996|68.94999995117188|89.61000139160156|
 |adamw, lr=2e-04             |203.0|2.579624652862549 |1.3085522148132325|67.11000026855469|88.66000164794922|
@@ -69,9 +73,9 @@ Train args:
 |---------------|----------|------------------|------------------|-----------------|-----------------|
 |cmars, lr=1e-03|198.0     |2.054780960083008 |1.0435627010345458|74.91000185546875|92.08000146484376|
 |cmars, lr=2e-03|203.0     |2.0272469520568848|1.0705795244216918|74.31000185546876|91.54000092773435|
-|mars, lr=1e-03 |184.0     |2.219767808914185 |1.07215625667572  |74.06000178222656|91.6200013671875
-|mars, lr=2e-03 |197.0     |2.1453990936279297|1.0963781481742858|73.73000098876953|91.1500006225586
-|cmars, lr=5e-04|198.0     |2.2018630504608154|1.083557384109497 |73.32000045166015|91.67000092773438
+|mars, lr=1e-03 |184.0     |2.219767808914185 |1.07215625667572  |74.06000178222656|91.6200013671875 |
+|mars, lr=2e-03 |197.0     |2.1453990936279297|1.0963781481742858|73.73000098876953|91.1500006225586 |
+|cmars, lr=5e-04|198.0     |2.2018630504608154|1.083557384109497 |73.32000045166015|91.67000092773438|
 |mars, lr=5e-04 |189.0     |2.322845220565796 |1.1199828132629397|72.02999995117187|90.86000173339843|
 
 
@@ -79,5 +83,4 @@ Train args:
 
 
 ## MARS Train Loss
-
-
+
````
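A hunk header above carries the README's context line "This is what the 'caution' addition looks like in an optimizer:"; the snippet itself sits in unchanged lines the diff doesn't show. As a stand-in, here is a minimal NumPy sketch of the one-line masking described in the paper: zero out update components whose sign disagrees with the current gradient, then rescale the survivors. The `cautious` helper and its exact rescaling are our naming and reading of the paper, not timm's code.

```python
import numpy as np

def cautious(update, grad, eps=1e-8):
    """Mask an optimizer update wherever it disagrees in sign with the
    current gradient, then rescale so the mean mask value is ~1
    (a sketch of the one-line 'caution' change from the paper)."""
    mask = (update * grad > 0).astype(update.dtype)
    mask = mask / (mask.mean() + eps)  # rescale surviving entries
    return update * mask

grad = np.array([1.0, -2.0, 3.0, 0.5])
update = np.array([0.5, 1.0, 1.5, -0.25])  # e.g. an Adam-style momentum step
masked = cautious(update, grad)
# entries 1 and 3 disagree in sign with the gradient and are zeroed;
# the aligned entries 0 and 2 are scaled by ~1/mask.mean()
```

In a real optimizer this masking is applied to the update tensor just before the parameter subtraction, which is why it amounts to "one line of code."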
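The README's "200 epochs (10M samples seen)" figure, together with the 2-process launch and `-b 288` in the train command, pins down the schedule length. A quick sanity check of that arithmetic (treating `-b` as the per-process batch size is our assumption about the script's flags):

```python
# 200 epochs over 10M samples implies a 50,000-image train split
samples_seen = 10_000_000
epochs = 200
train_images = samples_seen // epochs               # 50_000

# ./distributed_train.sh 2 ... -b 288  =>  assumed global batch of 2 * 288
global_batch = 2 * 288                              # 576
steps_per_epoch = -(-train_images // global_batch)  # ceil division
total_updates = steps_per_epoch * epochs            # optimizer steps overall
```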
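Reading the caution effect off the result tables is easiest by pairing each cautious run with its plain counterpart at the same learning rate. The top-1 values below are copied (rounded) from the rows visible in the diff; rows hidden in unchanged hunk context are omitted:

```python
# Top-1 (%) from the result rows shown above, keyed by (optimizer, lr)
top1 = {
    ("cadamw", "5e-04"): 73.39, ("adamw", "5e-04"): 71.50,
    ("cadamw", "2e-04"): 68.95, ("adamw", "2e-04"): 67.11,
    ("cmars", "1e-03"): 74.91,  ("mars", "1e-03"): 74.06,
    ("cmars", "2e-03"): 74.31,  ("mars", "2e-03"): 73.73,
    ("cmars", "5e-04"): 73.32,  ("mars", "5e-04"): 72.03,
}

def caution_gain(base, lr):
    """Top-1 improvement of the cautious ('c'-prefixed) variant at a given LR."""
    return round(top1[("c" + base, lr)] - top1[(base, lr)], 2)

gains = {(b, lr): caution_gain(b, lr)
         for b, lr in [("adamw", "5e-04"), ("adamw", "2e-04"),
                       ("mars", "1e-03"), ("mars", "2e-03"), ("mars", "5e-04")]}
# every pair shown yields a positive gain for the cautious variant
```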