---
license: bsd-3-clause-clear
language:
- en
tags:
- optimization
- optimizer
- PyTorch
- Julia
paper_id: "zenodo:20060806"
---
# SurpriseOpt
We introduce SurpriseOpt, an optimization framework that replaces constant exponential decay with state-dependent
adaptive interpolation. The algorithm detects “surprises” as ratios of gradient magnitude to second-moment
magnitude and modulates the effective inertia of the first and second moments via adaptive gating functions.
SurpriseOpt also features a mechanism for escaping plateaus in the loss landscape: it accumulates
information about recent low surprises as “boredom” and adapts the learning rate accordingly. This boredom feature can
be added to any first-order optimizer. We demonstrate that SurpriseOpt can converge several times faster than Adam
across various tasks.
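
To make the description above more concrete, here is a minimal scalar sketch of the general idea in plain Python. It is not the update rule from the paper or the reference implementation: the function names (`toy_surprise_step`, `minimize_quadratic`), the gating formula, the boredom accumulator, and all default hyperparameters are assumptions chosen only for illustration. Please refer to the paper for the actual definitions.

```python
import math


def toy_surprise_step(theta, grad, state, lr=0.1, beta1=0.9, beta2=0.999,
                      eps=1e-8, boredom_decay=0.9, boredom_gain=1.0):
    """One illustrative update on a scalar parameter (hypothetical formulas)."""
    m, v, boredom = state

    # "Surprise": gradient magnitude relative to the second-moment magnitude.
    surprise = abs(grad) / (math.sqrt(v) + eps)

    # Assumed gate in [0, 1): zero while surprise <= 1, approaching 1 for
    # large surprises, so unexpected gradients reduce the moments' inertia.
    gate = 1.0 - math.exp(-max(surprise - 1.0, 0.0))
    b1 = beta1 * (1.0 - gate)
    b2 = beta2 * (1.0 - gate)

    # State-dependent interpolation instead of constant exponential decay.
    m = b1 * m + (1.0 - b1) * grad
    v = b2 * v + (1.0 - b2) * grad * grad

    # Assumed "boredom": running average of how unsurprising recent steps were.
    boredom = (boredom_decay * boredom
               + (1.0 - boredom_decay) * max(0.0, 1.0 - surprise))

    # Boredom raises the effective learning rate (here by at most 1 + gain).
    lr_eff = lr * (1.0 + boredom_gain * boredom)
    theta = theta - lr_eff * m / (math.sqrt(v) + eps)
    return theta, (m, v, boredom)


def minimize_quadratic(steps, x0=5.0):
    """Run the toy update on f(x) = x**2 and return (x, boredom)."""
    x, state = x0, (0.0, 0.0, 0.0)
    for _ in range(steps):
        grad = 2.0 * x  # derivative of x**2
        x, state = toy_surprise_step(x, grad, state)
    return x, state[2]
```

Calling `minimize_quadratic(20)` runs twenty such steps from `x = 5.0`. In this toy setup, the boredom term grows as the gradient shrinks relative to the second moment, which in turn increases the effective step size.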
## Description
This repository provides the Hugging Face entry for the research paper "SurpriseOpt: An Adaptive First-Order Optimizer Driven by Boredom".
The paper is published as a preprint at [10.5281/zenodo.20060806](https://doi.org/10.5281/zenodo.20060806).
The source code for both Julia and PyTorch is hosted on Codeberg: [https://codeberg.org/Soloof/SurpriseOpt](https://codeberg.org/Soloof/SurpriseOpt) <br>The repository contains a reference implementation of the algorithm that can be used directly in Julia via [Flux](https://en.wikipedia.org/wiki/Flux_(machine-learning_framework)) or [Lux](https://lux.csail.mit.edu/stable/), or in Python via [PyTorch](https://en.wikipedia.org/wiki/PyTorch).
## How to test the optimizer
Please [follow the description](https://codeberg.org/Soloof/SurpriseOpt#how-to-do-use-the-reference-implementation-with-julia) in the project repository.