---
license: bsd-3-clause-clear
language:
- en
tags:
- optimization
- optimizer
- PyTorch
- Julia
paper_id: "zenodo:20060806"
---

# SurpriseOpt

We introduce SurpriseOpt, an optimization framework that replaces constant exponential decay with state-dependent adaptive interpolation. The algorithm detects “surprises” as ratios of gradient magnitude to second-moment magnitude and modulates the effective inertia of the first and second moments via adaptive gating functions. SurpriseOpt also features a mechanism for escaping plateaus in the loss landscape: it accumulates information about recent low surprises as “boredom” and adapts the learning rate accordingly. This boredom feature can be added to any first-order optimizer. We demonstrate that SurpriseOpt can converge several times faster than Adam across various tasks.

## Description

This repository provides the Hugging Face entry for the research paper "SurpriseOpt: An Adaptive First-Order Optimizer Driven by Boredom".

The paper is published as a preprint at [10.5281/zenodo.20060806](https://doi.org/10.5281/zenodo.20060806).

The source code for both Julia and PyTorch is hosted on Codeberg: [https://codeberg.org/Soloof/SurpriseOpt](https://codeberg.org/Soloof/SurpriseOpt)
The repository contains a reference implementation of the algorithm that can be used directly in Julia via [Flux](https://en.wikipedia.org/wiki/Flux_(machine-learning_framework)) or [Lux](https://lux.csail.mit.edu/stable/), or in Python via [PyTorch](https://en.wikipedia.org/wiki/PyTorch).

## How to test the optimizer

Please [follow the description](https://codeberg.org/Soloof/SurpriseOpt#how-to-do-use-the-reference-implementation-with-julia) in the project repository.
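
For readers who want a feel for the mechanism before installing the reference implementation, the following is a minimal, dependency-free Python sketch of the ideas summarized above: a surprise ratio, a gate that lowers the inertia of both moments, and a boredom accumulator that raises the learning rate. It is not the update rule from the paper. The function names `toy_surprise_step` and `run_toy_demo`, the gate `s / (1 + s)`, and all constants are assumptions chosen for illustration only; the actual formulas are in the preprint and the Codeberg repository.

```python
import math


def toy_surprise_step(param, grad, state, lr=0.05, beta1=0.9, beta2=0.999,
                      boredom_decay=0.9, boredom_gain=0.5, eps=1e-8):
    """One update of a scalar parameter (illustrative, not the paper's rule).

    state is a tuple (m, v, boredom): first moment, second moment, and a
    running measure of how unsurprising recent gradients have been.
    """
    m, v, boredom = state

    # Surprise: gradient magnitude relative to the second-moment magnitude.
    surprise = abs(grad) / (math.sqrt(v) + eps)

    # Hypothetical gate in [0, 1): close to 1 for large surprises.
    gate = surprise / (1.0 + surprise)

    # State-dependent interpolation: a large surprise lowers the effective
    # inertia, so the new gradient dominates both moment estimates.
    b1 = beta1 * (1.0 - gate)
    b2 = beta2 * (1.0 - gate)
    m = b1 * m + (1.0 - b1) * grad
    v = b2 * v + (1.0 - b2) * grad * grad

    # Boredom: moving average of "low surprise", used to scale the step size.
    boredom = boredom_decay * boredom + (1.0 - boredom_decay) * (1.0 - gate)
    step_lr = lr * (1.0 + boredom_gain * boredom)

    param = param - step_lr * m / (math.sqrt(v) + eps)
    return param, (m, v, boredom)


def run_toy_demo(steps=200, start=0.0, target=3.0):
    """Minimize f(x) = (x - target)^2 with the toy update and return x."""
    x, state = start, (0.0, 0.0, 0.0)
    for _ in range(steps):
        grad = 2.0 * (x - target)  # derivative of (x - target)^2
        x, state = toy_surprise_step(x, grad, state)
    return x
```

Calling `run_toy_demo()` moves `x` from 0 toward the minimizer at 3. Because the first gradient is maximally surprising relative to an empty second moment, the gate opens almost fully and both moments are initialized from that gradient, so this sketch needs no separate bias correction. Whether the reference implementation handles initialization the same way is not stated here.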