Feedback_Conditional_Policy
Collection
Collection for the paper "Language Models Can Learn from Verbal Feedback Without Scalar Rewards" (https://arxiv.org/pdf/2509.22638)
7 items