A-3PO: Accelerating Asynchronous LLM Training with Staleness-aware Proximal Policy Approximation • Paper 2512.06547 • Published Dec 6, 2025