The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation
Paper
• 2510.23393 • Published
Reliable and context-aware coding assistance
The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation
Mellum: Production-Grade in-IDE Contextual Code Completion with Multi-File Project Understanding