Substance Beats Style: Why Beginning Students Fail to Code with LLMs Paper • 2410.19792 • Published Oct 15, 2024
SelfCodeAlign: Self-Alignment for Code Generation Paper • 2410.24198 • Published Oct 31, 2024 • 24
AgentPack: A Dataset of Code Changes, Co-Authored by Agents and Humans Paper • 2509.21891 • Published Sep 26, 2025
StudentEval: A Benchmark of Student-Written Prompts for Large Language Models of Code Paper • 2306.04556 • Published Jun 7, 2023
MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation Paper • 2208.08227 • Published Aug 17, 2022 • 1
PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models Paper • 2502.01584 • Published Feb 3, 2025 • 9
NNsight and NDIF: Democratizing Access to Foundation Model Internals Paper • 2407.14561 • Published Jul 18, 2024 • 35
τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains Paper • 2406.12045 • Published Jun 17, 2024 • 9
Activation Steering for Robust Type Prediction in CodeLLMs Paper • 2404.01903 • Published Apr 2, 2024 • 2
Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs Paper • 2308.09895 • Published Aug 19, 2023 • 1
StarCoder 2 and The Stack v2: The Next Generation Paper • 2402.19173 • Published Feb 29, 2024 • 152
Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions Paper • 2312.12450 • Published Dec 11, 2023 • 1