Update README.md
README.md
CHANGED
@@ -10,7 +10,7 @@ This repo contains summaries of several sets of experiments comparing a number o
 
 The runs were all performed training a smaller ViT (`vit_wee_patch16_reg1_gap_256`) for 200 epochs (10M samples seen) from scratch on the `timm` 'mini-imagenet' dataset, a 100 class subset of imagenet with same image sizes as originals.
 
-So far I have results for `adamw`, `laprop`, and `mars
+So far I have results for `adamw`, `laprop`, and `mars` (https://huggingface.co/papers/2411.10438). You can find full results in sub-folders by optimizer names. In all of these runs, the experiments with 'c' prefix in the name have caution enabled.
 
 This is what the 'caution' addition looks like in an optimizer:
 ```python