SoTaNa: The Open-Source Software Development Assistant Paper • 2308.13416 • Published Aug 25, 2023 • 13
Skywork-SWE: Unveiling Data Scaling Laws for Software Engineering in LLMs Paper • 2506.19290 • Published Jun 24, 2025 • 53
Impact of Large Language Models on Generating Software Specifications Paper • 2306.03324 • Published Jun 6, 2023 • 3
ProX Dataset Collection • a collection of pre-training corpora refined by ProX • 6 items • Updated Feb 14, 2025 • 9
Idea First, Code Later: Disentangling Problem Solving from Code Generation in Evaluating LLMs for Competitive Programming Paper • 2601.11332 • Published 10 days ago • 1
How Programming Concepts and Neurons Are Shared in Code Language Models Paper • 2506.01074 • Published Jun 1, 2025 • 4
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Paper • 2409.17115 • Published Sep 25, 2024 • 64
Can Programming Languages Boost Each Other via Instruction Tuning? Paper • 2308.16824 • Published Aug 31, 2023 • 12
Increasing LLM Coding Capabilities through Diverse Synthetic Coding Tasks Paper • 2510.23208 • Published Oct 27, 2025 • 1