arxiv:2602.08054

Epigraph-Guided Flow Matching for Safe and Performant Offline Reinforcement Learning

Published on Feb 8

Authors:

Abstract

EpiFlow addresses safe offline reinforcement learning by formulating it as a state-constrained optimal control problem, using epigraph reformulation to co-optimize safety and performance while maintaining distribution consistency.

AI-generated summary

Offline reinforcement learning (RL) provides a compelling paradigm for training autonomous systems without the risks of online exploration, particularly in safety-critical domains. However, jointly achieving strong safety and performance from fixed datasets remains challenging. Existing safe offline RL methods often rely on soft constraints that allow violations, introduce excessive conservatism, or struggle to balance safety, reward optimization, and adherence to the data distribution. To address this, we propose Epigraph-Guided Flow Matching (EpiFlow), a framework that formulates safe offline RL as a state-constrained optimal control problem to co-optimize safety and performance. We learn a feasibility value function derived from an epigraph reformulation of the optimal control problem, thereby avoiding the decoupled objectives or post-hoc filtering common in prior work. Policies are synthesized by reweighting the behavior distribution based on this epigraph value function and fitting a generative policy via flow matching, enabling efficient, distribution-consistent sampling. Across various safety-critical tasks, including Safety-Gymnasium benchmarks, EpiFlow achieves competitive returns with near-zero empirical safety violations, demonstrating the effectiveness of epigraph-guided policy synthesis.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2602.08054 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2602.08054 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2602.08054 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.