| --- |
| license: bsd-3-clause-clear |
| language: |
| - en |
| tags: |
| - optimization |
| - optimizer |
| - PyTorch |
| - Julia |
| paper_id: "zenodo:20060806" |
| --- |
| |
| # SurpriseOpt |
|
|
| We introduce SurpriseOpt, an optimization framework that replaces constant exponential decay with state-dependent |
| adaptive interpolation. The algorithm detects “surprises” as ratios of the gradient magnitude to the second-moment |
| magnitude and modulates the effective inertia of the first and second moments via adaptive gating functions. |
| SurpriseOpt also features a mechanism to escape plateaus in the loss landscape: it accumulates |
| information about recent low surprises as “boredom” and adapts the learning rate accordingly. This boredom feature can |
| be added to any first-order optimizer. We demonstrate that SurpriseOpt can converge several times faster than Adam |
| across various tasks. |
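
To make the idea concrete, here is a minimal scalar sketch in plain Python. It is not the update rule from the paper: the gate `s / (1 + s)`, the boredom running average, the step scaling `1 + boredom_gain * boredom`, and every name and default value below are assumptions chosen only for illustration. The preprint and the reference implementation on Codeberg define the actual formulas.

```python
import math


def surprise_step(param, grad, state, lr=0.1, eps=1e-8,
                  boredom_decay=0.9, boredom_gain=1.0):
    """One scalar update of a toy surprise/boredom optimizer.

    Illustrative only: all formulas here are assumptions, not SurpriseOpt itself.
    `state` is a tuple (first moment m, second moment v, boredom).
    """
    m, v, boredom = state

    # Surprise: gradient magnitude relative to the second-moment magnitude.
    surprise = abs(grad) / (math.sqrt(v) + eps)

    # Assumed gate in [0, 1): a large surprise lowers the inertia, so both
    # moments move quickly toward the new gradient instead of decaying at a
    # constant exponential rate.
    gate = surprise / (1.0 + surprise)
    m = (1.0 - gate) * m + gate * grad
    v = (1.0 - gate) * v + gate * grad * grad

    # Assumed boredom: running average of how unsurprising recent steps were.
    boredom = boredom_decay * boredom + (1.0 - boredom_decay) * (1.0 - gate)

    # Assumed learning-rate adaptation: more boredom means a larger step.
    step = lr * (1.0 + boredom_gain * boredom) * m / (math.sqrt(v) + eps)
    return param - step, (m, v, boredom)


def run_constant_gradient(n_steps, grad=1.0):
    """Apply the toy update on a plateau-like problem with a constant gradient."""
    param, state = 0.0, (0.0, 0.0, 0.0)
    step_sizes = []
    for _ in range(n_steps):
        new_param, state = surprise_step(param, grad, state)
        step_sizes.append(param - new_param)
        param = new_param
    return step_sizes, state
```

In this sketch, a gradient that stays the same from step to step produces little surprise, so boredom builds up and the step size grows, which mirrors the plateau-escape behavior described above. Because the boredom term only rescales the learning rate, the same pattern could in principle wrap the update of another first-order optimizer.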
|
|
|
|
| ## Description |
|
|
| This repository provides the Hugging Face entry for the research paper "SurpriseOpt: An Adaptive First-Order Optimizer Driven by Boredom". |
|
|
| The paper is published as a preprint at [10.5281/zenodo.20060806](https://doi.org/10.5281/zenodo.20060806). |
|
|
| The source code for both Julia and PyTorch is hosted on Codeberg: [https://codeberg.org/Soloof/SurpriseOpt](https://codeberg.org/Soloof/SurpriseOpt) <br>It contains a reference implementation of the algorithm that can be used directly in Julia via [Flux](https://en.wikipedia.org/wiki/Flux_(machine-learning_framework)) or [Lux](https://lux.csail.mit.edu/stable/), or in Python via [PyTorch](https://en.wikipedia.org/wiki/PyTorch). |
|
|
| ## How to test the optimizer |
|
|
| Please [follow the instructions](https://codeberg.org/Soloof/SurpriseOpt#how-to-do-use-the-reference-implementation-with-julia) in the project repository. |
|
|