- GitBug-Java: A Reproducible Benchmark of Recent Java Bugs Bug-fix benchmarks are essential for evaluating methodologies in automatic program repair (APR) and fault localization (FL). However, existing benchmarks, exemplified by Defects4J, need to evolve to incorporate recent bug-fixes aligned with contemporary development practices. Moreover, reproducibility, a key scientific principle, has been lacking in bug-fix benchmarks. To address these gaps, we present GitBug-Java, a reproducible benchmark of recent Java bugs. GitBug-Java features 199 bugs extracted from the 2023 commit history of 55 notable open-source repositories. The methodology for building GitBug-Java ensures the preservation of bug-fixes in fully-reproducible environments. We publish GitBug-Java at https://github.com/gitbugactions/gitbug-java. 3 authors · Feb 5, 2024
- RepairBench: Leaderboard of Frontier Models for Program Repair AI-driven program repair uses AI models to repair buggy software by producing patches. Rapid advancements in AI surely impact state-of-the-art performance of program repair. Yet, grasping this progress requires frequent and standardized evaluations. We propose RepairBench, a novel leaderboard for AI-driven program repair. The key characteristics of RepairBench are: 1) it is execution-based: all patches are compiled and executed against a test suite, 2) it assesses frontier models in a frequent and standardized way. RepairBench leverages two high-quality benchmarks, Defects4J and GitBug-Java, to evaluate frontier models against real-world program repair tasks. We publicly release the evaluation framework of RepairBench. We will update the leaderboard as new frontier models are released. 2 authors · Sep 27, 2024
- GitBug-Actions: Building Reproducible Bug-Fix Benchmarks with GitHub Actions Bug-fix benchmarks are fundamental in advancing various sub-fields of software engineering such as automatic program repair (APR) and fault localization (FL). A good benchmark must include recent examples that accurately reflect technologies and development practices of today. To be executable in the long term, a benchmark must feature test suites that do not degrade overtime due to, for example, dependencies that are no longer available. Existing benchmarks fail in meeting both criteria. For instance, Defects4J, one of the foremost Java benchmarks, last received an update in 2020. Moreover, full-reproducibility has been neglected by the majority of existing benchmarks. In this paper, we present GitBug-Actions: a novel tool for building bug-fix benchmarks with modern and fully-reproducible bug-fixes. GitBug-Actions relies on the most popular CI platform, GitHub Actions, to detect bug-fixes and smartly locally execute the CI pipeline in a controlled and reproducible environment. To the best of our knowledge, we are the first to rely on GitHub Actions to collect bug-fixes. To demonstrate our toolchain, we deploy GitBug-Actions to build a proof-of-concept Go bug-fix benchmark containing executable, fully-reproducible bug-fixes from different repositories. A video demonstrating GitBug-Actions is available at: https://youtu.be/aBWwa1sJYBs. 3 authors · Oct 24, 2023