Latent Adversarial Regularization for Offline Preference Optimization
Paper • 2601.22083 • Published • 13