---
title: TestEvo-Bench
emoji: 🧪
colorFrom: indigo
colorTo: green
sdk: static
pinned: false
short_description: Live benchmark for test generation and test update.
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/69fa059362e7e8f47d7c5aa2/14h690a494OmjiPpt6elN.png
---
# TestEvo-Bench

<div align="center">

## A Live Benchmark for Test Generation & Test Update

Evaluating how AI agents understand and adapt tests to real-world software evolution.

🌐 https://www.testevo-bench.com/

</div>

---
TestEvo-Bench is a live benchmark for evaluating AI software engineering agents on realistic test-evolution tasks mined from open-source repositories. Unlike traditional benchmarks that isolate tests from production changes, TestEvo-Bench models the real co-evolution of production code and test suites.

The benchmark contains two complementary tracks:

- **Test Generation** – generate new tests for newly introduced behavior
- **Test Update** – repair or adapt outdated tests after code changes

Each task is execution-grounded with a runnable environment and evaluated using metrics such as pass rate, coverage, and mutation score.
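As an illustration, each of these metrics reduces to a simple ratio over per-task execution results. The sketch below assumes a hypothetical per-task result schema (the field names are illustrative, not the benchmark's actual format):

```python
# Hypothetical per-task execution results; the schema is illustrative only.
results = [
    {"passed": True,  "lines_covered": 45, "lines_total": 50,
     "mutants_killed": 8, "mutants_total": 10},
    {"passed": False, "lines_covered": 30, "lines_total": 60,
     "mutants_killed": 3, "mutants_total": 10},
]

def pass_rate(results):
    """Fraction of tasks whose generated or updated tests pass."""
    return sum(r["passed"] for r in results) / len(results)

def coverage(results):
    """Aggregate line coverage across all tasks."""
    return sum(r["lines_covered"] for r in results) / sum(r["lines_total"] for r in results)

def mutation_score(results):
    """Fraction of seeded mutants killed by the tests."""
    return sum(r["mutants_killed"] for r in results) / sum(r["mutants_total"] for r in results)

print(pass_rate(results), coverage(results), mutation_score(results))
```

Pass rate rewards tests that run green, while coverage and mutation score guard against trivially passing tests that exercise little code or fail to detect injected faults.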
| ## Datasets | |
- Test Generation – https://huggingface.co/datasets/TestEvo-Bench/teb-generation
- Test Update – https://huggingface.co/datasets/TestEvo-Bench/teb-update
| ## Links | |
- 🌐 Website – https://www.testevo-bench.com/
- 🤗 Hugging Face Space – https://huggingface.co/spaces/TestEvo-Bench/
- 💻 Code – https://anonymous.4open.science/r/testevo-bench-1150/README.md
| --- | |
| <div align="center"> | |
Real-world • Execution-grounded • Live software evolution benchmark

</div>