HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation Paper • 2412.21199 • Published Dec 30, 2024 • 13
AlphaResearch: Accelerating New Algorithm Discovery with Language Models Paper • 2511.08522 • Published Nov 11, 2025 • 18
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks Paper • 2507.01001 • Published Jul 1, 2025 • 46