When Does Trajectory-Level Supervision Permit Efficient Offline Reinforcement Learning?
Paper • 2606.18531 • Published • 4
Expert-Choice Routing Enables Adaptive Computation in Diffusion Language Models