Joint Selection for Large-Scale Pre-Training Data via Policy Gradient-based Mask Learning Paper • 2512.24265 • Published Dec 30, 2025 • 4
Cache-to-Cache: Direct Semantic Communication Between Large Language Models Paper • 2510.03215 • Published Oct 3, 2025 • 98
EPO: Entropy-regularized Policy Optimization for LLM Agents Reinforcement Learning Paper • 2509.22576 • Published Sep 26, 2025 • 135
Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents Paper • 2509.09265 • Published Sep 11, 2025 • 47