Add metadata and links to paper/code

#2
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +12 -9
README.md CHANGED
@@ -1,12 +1,16 @@
 ---
-license: apache-2.0
 datasets:
 - timm/mini-imagenet
+license: apache-2.0
+pipeline_tag: image-classification
+library_name: timm
 ---

 # Comparisons of timm Optimizers w/ Caution

-This repo contains summaries of several sets of experiments comparing a number of optimizers with and without caution (https://huggingface.co/papers/2411.16085) enabled.
+This repository contains summaries of several sets of experiments comparing a number of optimizers with and without **Caution**, as introduced in the paper [Cautious Optimizers: Improving Training with One Line of Code](https://huggingface.co/papers/2411.16085).
+
+**Official Code**: [kyleliang919/C-Optim](https://github.com/kyleliang919/c-optim)

 The runs were all performed training a smaller ViT (`vit_wee_patch16_reg1_gap_256`) for 200 epochs (10M samples seen) from scratch on the `timm` 'mini-imagenet' dataset, a 100 class subset of imagenet with same image sizes as originals.

@@ -21,7 +25,7 @@ This is what the 'caution' addition looks like in an optimizer:

 Train args:

-```
+```bash
 ./distributed_train.sh 2 --dataset hfds/timm/mini-imagenet --num-classes 100 --model vit_wee_patch16_reg1_gap_256 -j 8 --epochs 200 --warmup-prefix --sched-on-updates --warmup-lr 0 --mixup .2 --model-ema --model-ema-decay 0.999 --model-ema-warmup --aa rand-m9-mstd0.5-inc1 --remode pixel --reprob 0.25 --amp --weight-decay .05 --drop 0.1 --drop-path .1 -b 288 --opt cadamw --lr 1e-3
 ```

@@ -52,7 +56,7 @@ Train args:
 |cadamw, lr=5e-04 |199.0|2.163278102874756 |1.0976034646987916|73.3900005859375 |91.31000137939454|
 |cadamw, lr=1e-03, clip grads|203.0|2.1360626220703125|1.1043113907814026|73.33000158691407|91.41000042724608|
 |adamw, lr=1e-03, clip grads |195.0|2.2746386528015137|1.142998440361023 |72.11000151367188|90.47000052490236|
-|adamw, lr=5e-04 |185.0|2.3040246963500977|1.1535791856765747|71.50000120849609|90.4800001953125 |
+|adamw, lr=5e-04 |185.0|2.3040246963500977|1.1535791856765747|71.50000120849609|90.4800001953125 |
 |adamw, lr=1e-03 |199.0|2.223684310913086 |1.1657958560943604|71.22999993896484|90.30999958496092|
 |cadamw, lr=2e-04 |189.0|2.538627862930298 |1.2325929063796996|68.94999995117188|89.61000139160156|
 |adamw, lr=2e-04 |203.0|2.579624652862549 |1.3085522148132325|67.11000026855469|88.66000164794922|

@@ -69,9 +73,9 @@ Train args:
 |---------------|----------|------------------|------------------|-----------------|-----------------|
 |cmars, lr=1e-03|198.0 |2.054780960083008 |1.0435627010345458|74.91000185546875|92.08000146484376|
 |cmars, lr=2e-03|203.0 |2.0272469520568848|1.0705795244216918|74.31000185546876|91.54000092773435|
-|mars, lr=1e-03 |184.0 |2.219767808914185 |1.07215625667572 |74.06000178222656|91.6200013671875 |
-|mars, lr=2e-03 |197.0 |2.1453990936279297|1.0963781481742858|73.73000098876953|91.1500006225586 |
-|cmars, lr=5e-04|198.0 |2.2018630504608154|1.083557384109497 |73.32000045166015|91.67000092773438|
+|mars, lr=1e-03 |184.0 |2.219767808914185 |1.07215625667572 |74.06000178222656|91.6200013671875 |
+|mars, lr=2e-03 |197.0 |2.1453990936279297|1.0963781481742858|73.73000098876953|91.1500006225586 |
+|cmars, lr=5e-04|198.0 |2.2018630504608154|1.083557384109497 |73.32000045166015|91.67000092773438|
 |mars, lr=5e-04 |189.0 |2.322845220565796 |1.1199828132629397|72.02999995117187|90.86000173339843|


@@ -79,5 +83,4 @@ Train args:
 ![Top-1](mars/eval_top1_comparison.png)

 ## MARS Train Loss
-![Loss](mars/train_loss_comparison.png)
-
+![Loss](mars/train_loss_comparison.png)
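For context on what the README is comparing: the Caution paper's modification masks out the components of an optimizer's update whose sign disagrees with the current gradient, then rescales the surviving components so the overall step size is roughly preserved. A minimal NumPy sketch of that idea (not timm's exact implementation; the function name and the rescale floor `eps` are illustrative assumptions):

```python
import numpy as np

def cautious_update(update, grad, eps=1e-3):
    """Zero out update components whose sign disagrees with the gradient,
    then rescale the remainder by the fraction of components kept."""
    # 1 where update and gradient point the same way, 0 otherwise
    mask = (update * grad > 0).astype(update.dtype)
    # rescale so the average step magnitude is preserved (eps avoids div-by-zero)
    mask /= max(mask.mean(), eps)
    return update * mask

# toy example: components 0 and 3 agree in sign with the gradient, 1 and 2 do not
u = np.array([0.5, -0.2, 0.3, -0.1])
g = np.array([1.0, 1.0, -1.0, -1.0])
cu = cautious_update(u, g)  # masked components go to 0, the rest are scaled up 2x
```

In an optimizer such as AdamW, this mask would be applied to the (momentum-based) update just before the parameter step, which is why the cautious variants above carry a `c` prefix (`cadamw`, `cmars`).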