Add metadata and links to paper/code (#2)
Opened by nielsr (HF Staff)

README.md (CHANGED):
````diff
@@ -1,12 +1,16 @@
 ---
-license: apache-2.0
 datasets:
 - timm/mini-imagenet
+license: apache-2.0
+pipeline_tag: image-classification
+library_name: timm
 ---
 
 # Comparisons of timm Optimizers w/ Caution
 
-This
+This repository contains summaries of several sets of experiments comparing a number of optimizers with and without **Caution**, as introduced in the paper [Cautious Optimizers: Improving Training with One Line of Code](https://huggingface.co/papers/2411.16085).
+
+**Official Code**: [kyleliang919/C-Optim](https://github.com/kyleliang919/c-optim)
 
 The runs were all performed training a smaller ViT (`vit_wee_patch16_reg1_gap_256`) for 200 epochs (10M samples seen) from scratch on the `timm` 'mini-imagenet' dataset, a 100 class subset of imagenet with same image sizes as originals.
 
@@ -21,7 +25,7 @@ This is what the 'caution' addition looks like in an optimizer:
 
 Train args:
 
-```
+```bash
 ./distributed_train.sh 2 --dataset hfds/timm/mini-imagenet --num-classes 100 --model vit_wee_patch16_reg1_gap_256 -j 8 --epochs 200 --warmup-prefix --sched-on-updates --warmup-lr 0 --mixup .2 --model-ema --model-ema-decay 0.999 --model-ema-warmup --aa rand-m9-mstd0.5-inc1 --remode pixel --reprob 0.25 --amp --weight-decay .05 --drop 0.1 --drop-path .1 -b 288 --opt cadamw --lr 1e-3
 ```
@@ -52,7 +56,7 @@ Train args:
 |cadamw, lr=5e-04            |199.0|2.163278102874756 |1.0976034646987916|73.3900005859375 |91.31000137939454|
 |cadamw, lr=1e-03, clip grads|203.0|2.1360626220703125|1.1043113907814026|73.33000158691407|91.41000042724608|
 |adamw, lr=1e-03, clip grads |195.0|2.2746386528015137|1.142998440361023 |72.11000151367188|90.47000052490236|
-|adamw, lr=5e-04             |185.0|2.3040246963500977|1.1535791856765747|71.50000120849609|90.4800001953125
+|adamw, lr=5e-04             |185.0|2.3040246963500977|1.1535791856765747|71.50000120849609|90.4800001953125 |
 |adamw, lr=1e-03             |199.0|2.223684310913086 |1.1657958560943604|71.22999993896484|90.30999958496092|
 |cadamw, lr=2e-04            |189.0|2.538627862930298 |1.2325929063796996|68.94999995117188|89.61000139160156|
 |adamw, lr=2e-04             |203.0|2.579624652862549 |1.3085522148132325|67.11000026855469|88.66000164794922|
@@ -69,9 +73,9 @@ Train args:
 |---------------|----------|------------------|------------------|-----------------|-----------------|
 |cmars, lr=1e-03|198.0     |2.054780960083008 |1.0435627010345458|74.91000185546875|92.08000146484376|
 |cmars, lr=2e-03|203.0     |2.0272469520568848|1.0705795244216918|74.31000185546876|91.54000092773435|
-|mars, lr=1e-03 |184.0     |2.219767808914185 |1.07215625667572  |74.06000178222656|91.6200013671875
-|mars, lr=2e-03 |197.0     |2.1453990936279297|1.0963781481742858|73.73000098876953|91.1500006225586
-|cmars, lr=5e-04|198.0     |2.2018630504608154|1.083557384109497 |73.32000045166015|91.67000092773438
+|mars, lr=1e-03 |184.0     |2.219767808914185 |1.07215625667572  |74.06000178222656|91.6200013671875 |
+|mars, lr=2e-03 |197.0     |2.1453990936279297|1.0963781481742858|73.73000098876953|91.1500006225586 |
+|cmars, lr=5e-04|198.0     |2.2018630504608154|1.083557384109497 |73.32000045166015|91.67000092773438|
 |mars, lr=5e-04 |189.0     |2.322845220565796 |1.1199828132629397|72.02999995117187|90.86000173339843|
 
 
@@ -79,5 +83,4 @@ Train args:
 
 
 ## MARS Train Loss
-
-
+
````
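A hunk header above carries the README's context line "This is what the 'caution' addition looks like in an optimizer:"; the snippet itself sits in unchanged lines the diff doesn't show. As a stand-in, here is a minimal NumPy sketch of the one-line masking described in the paper: zero out update components whose sign disagrees with the current gradient, then rescale the survivors. The `cautious` helper and its exact rescaling are our naming and reading of the paper, not timm's code.

```python
import numpy as np

def cautious(update, grad, eps=1e-8):
    """Mask an optimizer update wherever it disagrees in sign with the
    current gradient, then rescale so the mean mask value is ~1
    (a sketch of the one-line 'caution' change from the paper)."""
    mask = (update * grad > 0).astype(update.dtype)
    mask = mask / (mask.mean() + eps)  # rescale surviving entries
    return update * mask

grad = np.array([1.0, -2.0, 3.0, 0.5])
update = np.array([0.5, 1.0, 1.5, -0.25])  # e.g. an Adam-style momentum step
masked = cautious(update, grad)
# entries 1 and 3 disagree in sign with the gradient and are zeroed;
# the aligned entries 0 and 2 are scaled by ~1/mask.mean()
```

In a real optimizer this masking is applied to the update tensor just before the parameter subtraction, which is why it amounts to "one line of code."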
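The README's "200 epochs (10M samples seen)" figure, together with the 2-process launch and `-b 288` in the train command, pins down the schedule length. A quick sanity check of that arithmetic (treating `-b` as the per-process batch size is our assumption about the script's flags):

```python
# 200 epochs over 10M samples implies a 50,000-image train split
samples_seen = 10_000_000
epochs = 200
train_images = samples_seen // epochs               # 50_000

# ./distributed_train.sh 2 ... -b 288  =>  assumed global batch of 2 * 288
global_batch = 2 * 288                              # 576
steps_per_epoch = -(-train_images // global_batch)  # ceil division
total_updates = steps_per_epoch * epochs            # optimizer steps overall
```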
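Reading the caution effect off the result tables is easiest by pairing each cautious run with its plain counterpart at the same learning rate. The top-1 values below are copied (rounded) from the rows visible in the diff; rows hidden in unchanged hunk context are omitted:

```python
# Top-1 (%) from the result rows shown above, keyed by (optimizer, lr)
top1 = {
    ("cadamw", "5e-04"): 73.39, ("adamw", "5e-04"): 71.50,
    ("cadamw", "2e-04"): 68.95, ("adamw", "2e-04"): 67.11,
    ("cmars", "1e-03"): 74.91,  ("mars", "1e-03"): 74.06,
    ("cmars", "2e-03"): 74.31,  ("mars", "2e-03"): 73.73,
    ("cmars", "5e-04"): 73.32,  ("mars", "5e-04"): 72.03,
}

def caution_gain(base, lr):
    """Top-1 improvement of the cautious ('c'-prefixed) variant at a given LR."""
    return round(top1[("c" + base, lr)] - top1[(base, lr)], 2)

gains = {(b, lr): caution_gain(b, lr)
         for b, lr in [("adamw", "5e-04"), ("adamw", "2e-04"),
                       ("mars", "1e-03"), ("mars", "2e-03"), ("mars", "5e-04")]}
# every pair shown yields a positive gain for the cautious variant
```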