Approximation of Log-Partition Function in Policy Mirror Descent Induces Implicit Regularization for LLM Post-Training
Paper • 2602.05933 • Published 3 days ago • 5