MINER: Mining Multimodal Internal Representation for Efficient Retrieval
Paper • 2605.06460 • Published • 3
A Gradient Perspective on RLVR Stability and Winner Advantage Policy Optimization
Agentic Monte Carlo: Simulating Reinforcement Learning for Black-Box Agents