arxiv:2512.14202

Understanding and Improving Hyperbolic Deep Reinforcement Learning

Published on Dec 16 · Submitted by Timo K. on Dec 18

University of Vienna


Authors:

Timo Klein ,

Thomas Lang ,

Abstract

Hyper++ is a hyperbolic deep RL agent that improves stability and performance by addressing gradient issues and norm constraints in hyperbolic feature spaces.

AI-generated summary

The performance of reinforcement learning (RL) agents depends critically on the quality of the underlying feature representations. Hyperbolic feature spaces are well-suited for this purpose, as they naturally capture hierarchical and relational structure often present in complex RL environments. However, leveraging these spaces commonly faces optimization challenges due to the nonstationarity of RL. In this work, we identify key factors that determine the success and failure of training hyperbolic deep RL agents. By analyzing the gradients of core operations in the Poincaré Ball and Hyperboloid models of hyperbolic geometry, we show that large-norm embeddings destabilize gradient-based training, leading to trust-region violations in proximal policy optimization (PPO). Based on these insights, we introduce Hyper++, a new hyperbolic PPO agent that consists of three components: (i) stable critic training through a categorical value loss instead of regression; (ii) feature regularization that guarantees bounded norms while avoiding the curse of dimensionality from clipping; and (iii) a more optimization-friendly formulation of hyperbolic network layers. In experiments on ProcGen, we show that Hyper++ guarantees stable learning, outperforms prior hyperbolic agents, and reduces wall-clock time by approximately 30%. On Atari-5 with Double DQN, Hyper++ strongly outperforms Euclidean and hyperbolic baselines. We release our code at https://github.com/Probabilistic-and-Interactive-ML/hyper-rl .
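To give a feel for why embedding norms matter in the Poincaré Ball, here is a minimal sketch using two textbook formulas: the exponential map at the origin and the distance from the origin. It is not the paper's analysis and not code from the linked repository. The function names and the curvature parameter `c` (curvature is `-c`) are assumptions made for this illustration.

```python
import math

def exp_map_origin_radius(r, c=1.0):
    """Norm of exp_0(v) on the Poincare ball for a tangent vector with ||v|| = r."""
    sqrt_c = math.sqrt(c)
    return math.tanh(sqrt_c * r) / sqrt_c

def exp_map_radial_grad(r, c=1.0):
    """Derivative of exp_map_origin_radius w.r.t. r: 1 - tanh^2(sqrt(c) * r)."""
    return 1.0 - math.tanh(math.sqrt(c) * r) ** 2

def dist_from_origin(rho, c=1.0):
    """Hyperbolic distance between the origin and a point with Euclidean norm rho."""
    sqrt_c = math.sqrt(c)
    return 2.0 / sqrt_c * math.atanh(sqrt_c * rho)

def dist_radial_grad(rho, c=1.0):
    """Derivative of dist_from_origin w.r.t. rho: 2 / (1 - c * rho^2)."""
    return 2.0 / (1.0 - c * rho ** 2)

if __name__ == "__main__":
    # Large Euclidean feature norms saturate tanh, so the gradient shrinks.
    for r in (0.1, 1.0, 5.0, 10.0):
        print(f"r={r:5.1f}  d exp_0 / dr = {exp_map_radial_grad(r):.3e}")
    # Points near the boundary of the ball make the distance gradient grow.
    for rho in (0.1, 0.9, 0.99, 0.999):
        print(f"rho={rho:5.3f}  d dist / d rho = {dist_radial_grad(rho):.3e}")
```

Under these formulas the two effects pull in opposite directions: the exponential map's gradient vanishes as the pre-map norm grows, while the distance gradient grows without bound as a point approaches the boundary of the ball. Both become more pronounced at large norms, which is one way to see why keeping feature norms bounded is attractive.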


Community

X3N4

Paper author Paper submitter 7 days ago

tl;dr:

We analytically show that large-norm embeddings destabilize hyperbolic representations in deep RL.
In PPO, this coincides with trust-region violations.
Existing methods based on SpectralNorm mitigate these issues only partially.
We propose a theoretically principled combination of stabilization techniques, Hyper++.
Hyper++ substantially outperforms existing hyperbolic agents on ProcGen (PPO) and Atari (DDQN).
Because we do not have the power iteration overhead from SpectralNorm, Hyper++ is also faster.
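For readers unfamiliar with the power iteration overhead mentioned in the last point, the sketch below shows the generic procedure that spectral normalization relies on to estimate a weight matrix's largest singular value. It is a plain-Python illustration under assumed names (`spectral_norm_power_iteration`, `n_iters`), not the implementation used by the paper or by any particular library, and it assumes a nonzero matrix.

```python
import math

def spectral_norm_power_iteration(W, n_iters=50):
    """Estimate the largest singular value of W (a list of rows) by power iteration."""
    rows, cols = len(W), len(W[0])
    # Start from a fixed unit vector; implementations usually start from a random one.
    v = [1.0 / math.sqrt(cols)] * cols
    sigma = 0.0
    for _ in range(n_iters):
        # u = W v, normalized to unit length.
        u = [sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        u_norm = math.sqrt(sum(x * x for x in u))
        u = [x / u_norm for x in u]
        # v = W^T u; its norm converges to the largest singular value.
        v = [sum(W[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        v = [x / sigma for x in v]
    return sigma

if __name__ == "__main__":
    W = [[3.0, 0.0], [0.0, 1.0]]
    print(spectral_norm_power_iteration(W))
```

Each iteration costs two matrix-vector products per normalized layer. Practical implementations typically run only a few iterations per training step and carry the vectors over between steps, but the extra work is still paid on every update.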

Happy to answer any questions :)

avahal

7 days ago

arXiv lens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/understanding-and-improving-hyperbolic-deep-reinforcement-learning-6053-d8a239e3

Executive Summary
Detailed Breakdown
Practical Applications
