Update README.md
README.md
CHANGED
@@ -12,6 +12,14 @@ The runs were all performed training a smaller ViT (`vit_wee_patch16_reg1_gap_25
So far I have results for `adamw`, `laprop`, and `mars`. You can find full results in sub-folders named after each optimizer. In all of these runs, the experiments with a 'c' prefix in the name have caution enabled.

This is what the 'caution' addition looks like in an optimizer:
```python
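# Keep only the components where the momentum and the current gradient agree in sign
mask = (exp_avg * grad > 0).to(grad.dtype)
# Rescale by the fraction of components kept (floored at 1e-3) to compensate for the masked ones
mask.div_(mask.mean().clamp_(min=1e-3))
# Zero out the disagreeing components of the momentum
exp_avg = exp_avg * mask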
```
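
To make the arithmetic concrete, here is a minimal worked sketch of the same three steps on plain Python lists, so it runs without PyTorch. The helper name `apply_caution` and the sample values are hypothetical and chosen only for illustration; a real optimizer would apply the tensor version above to `exp_avg` and `grad` directly.

```python
def apply_caution(exp_avg, grad, min_scale=1e-3):
    """Illustrative list-based version of the caution mask (hypothetical helper)."""
    # 1.0 where momentum and gradient share a sign, 0.0 elsewhere
    mask = [1.0 if m * g > 0 else 0.0 for m, g in zip(exp_avg, grad)]
    # Mean of the mask is the fraction of components kept; floor it to avoid dividing by ~0
    scale = max(sum(mask) / len(mask), min_scale)
    # Rescale the mask, then apply it to the momentum
    return [m * (k / scale) for m, k in zip(exp_avg, mask)]


exp_avg = [0.5, -0.2, 0.1, -0.4]   # made-up momentum values
grad = [1.0, 0.3, -0.2, -0.1]      # made-up gradient values

# Components 0 and 3 agree in sign, so half the mask survives and is scaled by 1 / 0.5 = 2
cautious = apply_caution(exp_avg, grad)
print(cautious)

# If no component agrees, the mask is all zeros and the whole update is suppressed
suppressed = apply_caution([0.5, -0.2], [-1.0, 0.3])
print(suppressed)
```

In the first call, the two agreeing components are doubled and the other two are zeroed. In the second call, the `1e-3` floor only prevents a division by zero; the result is still all zeros.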
# LaProp
|optim |best_epoch|train_loss |eval_loss |eval_top1 |eval_top5 |lr |