An open RL environment whose reward is temporal fact-currency. A GRPO-trained Qwen2.5-3B LoRA lifts held-out supersession from 9.0% to 16.7%.
- Supersede: Diagnosing and Training the Memory-Update Gap in LLM Agents
  Paper • 2606.27472 • Published
- vedant33/supersede-qwen2.5-3b-grpo-lora
  Text Generation • Updated • 4
- vedant33/supersede-rl-episodes
  Viewer • Updated • 2.2k • 8
- Supersede Supersession Demo
  Supersession demo, GRPO-trained Qwen2.5-3B