
	---
	license: bsd-3-clause-clear
	language:
	- en
	tags:
	- optimization
	- optimizer
	- PyTorch
	- Julia
	paper_id: "zenodo:20060806"
	---

	# SurpriseOpt

	We introduce SurpriseOpt, an optimization framework that replaces constant exponential decay with state-dependent
	adaptive interpolation. The algorithm detects “surprises” as the ratio of gradient magnitude to second-moment
	magnitude and modulates the effective inertia of the first and second moments via adaptive gating functions.
	SurpriseOpt also includes a mechanism for escaping plateaus in the loss landscape: it accumulates
	information about recent low surprises as “boredom” and adapts the learning rate accordingly. This boredom feature can
	be added to any first-order optimizer. We demonstrate that SurpriseOpt can converge several times faster than Adam
	across various tasks.
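
The idea described above can be illustrated with a small, self-contained sketch for a single scalar parameter. This is not the reference implementation: the gating function `beta_max / (1 + surprise)`, the boredom accumulator, and the names `surprise_step`, `minimize_quadratic`, `boredom_decay`, and `boredom_gain` are all assumptions chosen for illustration. See the paper and the Codeberg repository for the actual formulas.

```python
import math


def surprise_step(theta, grad, state, lr=0.05, beta_max=0.9,
                  boredom_decay=0.9, boredom_gain=0.1, eps=1e-8):
    """One illustrative update for a scalar parameter.

    All formulas here are hypothetical stand-ins, not the paper's definitions.
    `state` is a tuple (m, v, boredom).
    """
    m, v, boredom = state

    # Surprise: gradient magnitude relative to the second-moment magnitude.
    surprise = abs(grad) / (math.sqrt(v) + eps)

    # Assumed gate: a large surprise lowers the inertia (beta near 0),
    # a small surprise keeps it close to beta_max.
    beta = beta_max / (1.0 + surprise)

    # State-dependent interpolation replaces the constant decay of Adam.
    m = beta * m + (1.0 - beta) * grad
    v = beta * v + (1.0 - beta) * grad * grad

    # Assumed boredom: a decaying sum of how far recent surprises fell below 1.
    boredom = boredom_decay * boredom + max(0.0, 1.0 - surprise)

    # Boredom scales the learning rate up to help leave flat regions.
    step = lr * (1.0 + boredom_gain * boredom) * m / (math.sqrt(v) + eps)
    return theta - step, (m, v, boredom)


def minimize_quadratic(theta=3.0, steps=200):
    """Minimize f(x) = x^2 with the sketch above, starting from `theta`."""
    state = (0.0, 0.0, 0.0)
    for _ in range(steps):
        grad = 2.0 * theta  # derivative of x^2
        theta, state = surprise_step(theta, grad, state)
    return theta


if __name__ == "__main__":
    print(minimize_quadratic())
```

In this sketch the very first gradient is maximally surprising (the second moment is still zero), so the gate drops to almost zero and both moments are effectively initialized from that gradient. The boredom term only grows while gradients are small relative to their recent history, which is the situation the plateau mechanism targets.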


	## Description

	This repository provides the Hugging Face entry for the research paper "SurpriseOpt: An Adaptive First-Order Optimizer Driven by Boredom".

	The paper is published as a preprint at [10.5281/zenodo.20060806](https://doi.org/10.5281/zenodo.20060806).

	The source code for both Julia and PyTorch is hosted on Codeberg: [https://codeberg.org/Soloof/SurpriseOpt](https://codeberg.org/Soloof/SurpriseOpt) <br>It contains a reference implementation of the algorithm that can be used directly in Julia via [Flux](https://en.wikipedia.org/wiki/Flux_(machine-learning_framework)) or [Lux](https://lux.csail.mit.edu/stable/), or in Python via [PyTorch](https://en.wikipedia.org/wiki/PyTorch).

	## How to test the optimizer

	Please [follow the instructions](https://codeberg.org/Soloof/SurpriseOpt#how-to-do-use-the-reference-implementation-with-julia) in the project repository.