---
title: TestEvo-Bench
emoji: 🧪
colorFrom: indigo
colorTo: green
sdk: static
pinned: false
short_description: Live benchmark for test generation and test update.
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/69fa059362e7e8f47d7c5aa2/14h690a494OmjiPpt6elN.png
---

# TestEvo-Bench
## A Live Benchmark for Test Generation & Test Update

Evaluating how AI agents understand and adapt tests to real-world software evolution.

🌐 https://www.testevo-bench.com/
---

TestEvo-Bench is a live benchmark for evaluating AI software engineering agents on realistic test evolution tasks mined from open-source repositories. Unlike traditional benchmarks that isolate tests from production changes, TestEvo-Bench models the real-world co-evolution of production code and test suites.

The benchmark contains two complementary tracks:

- 🟠 **Test Generation** – generate new tests for newly introduced behavior
- 🟣 **Test Update** – repair or adapt outdated tests after code changes

Each task is execution-grounded with runnable environments and evaluated using metrics such as pass rate, coverage, and mutation score.

## Datasets

- Test Generation – https://huggingface.co/datasets/TestEvo-Bench/teb-generation
- Test Update – https://huggingface.co/datasets/TestEvo-Bench/teb-update

## Links

- 🌐 Website – https://www.testevo-bench.com/
- 🤗 Hugging Face Space – https://huggingface.co/spaces/TestEvo-Bench/
- 💻 Code – https://anonymous.4open.science/r/testevo-bench-1150/README.md

---
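The execution-grounded metrics mentioned above (pass rate and mutation score) reduce to simple ratios over test and mutant outcomes. The sketch below is purely illustrative: the function and field names are hypothetical and do not reflect TestEvo-Bench's actual evaluation harness.

```python
# Illustrative metric definitions; these names and data shapes are
# assumptions for demonstration, not TestEvo-Bench's real harness.

def pass_rate(outcomes: list[bool]) -> float:
    """Fraction of executed tests that pass."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

def mutation_score(killed: int, total_mutants: int) -> float:
    """Fraction of seeded mutants detected (killed) by the test suite."""
    return killed / total_mutants if total_mutants else 0.0

# Example: 3 of 4 tests pass, 18 of 24 mutants are killed.
print(pass_rate([True, True, False, True]))        # 0.75
print(mutation_score(killed=18, total_mutants=24))  # 0.75
```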
Real-world • Execution-grounded • Live software evolution benchmark