---
title: TestEvo-Bench
emoji: 🧪
colorFrom: indigo
colorTo: green
sdk: static
pinned: false
short_description: Live benchmark for test generation and test update.
thumbnail: >-
https://cdn-uploads.huggingface.co/production/uploads/69fa059362e7e8f47d7c5aa2/14h690a494OmjiPpt6elN.png
---
# TestEvo-Bench
<div align="center">
## A Live Benchmark for Test Generation & Test Update
Evaluating how AI agents understand and adapt tests to real-world software evolution.
https://www.testevo-bench.com/
</div>
---
TestEvo-Bench is a live benchmark for evaluating AI software engineering agents on realistic software test evolution tasks mined from open-source repositories.
Unlike traditional benchmarks that isolate tests from production changes, TestEvo-Bench models real software co-evolution between production code and test suites.
The benchmark contains two complementary tracks:
- **Test Generation**: generate new tests for newly introduced behavior
- **Test Update**: repair or adapt outdated tests after code changes
Each task is execution-grounded with runnable environments and evaluated using metrics such as pass rate, coverage, and mutation score.
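As a rough illustration of two of these metrics, the sketch below uses the conventional definitions: pass rate as passed tests over total tests, and mutation score as killed mutants over total mutants. The `TaskResult` record and its field names are hypothetical and are not taken from the benchmark's code; the benchmark's own scoring may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical per-task record; field names are assumptions for illustration."""
    tests_passed: int
    tests_total: int
    mutants_killed: int
    mutants_total: int


def pass_rate(result: TaskResult) -> float:
    """Fraction of executed tests that passed (0.0 if no tests ran)."""
    if result.tests_total == 0:
        return 0.0
    return result.tests_passed / result.tests_total


def mutation_score(result: TaskResult) -> float:
    """Fraction of generated mutants killed by the tests (0.0 if no mutants)."""
    if result.mutants_total == 0:
        return 0.0
    return result.mutants_killed / result.mutants_total


example = TaskResult(tests_passed=3, tests_total=4, mutants_killed=6, mutants_total=10)
print(pass_rate(example), mutation_score(example))
```

Returning 0.0 for an empty denominator is a design choice made here so that a task with no runnable tests counts as a failure rather than raising an error.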
## Datasets
- Test Generation: https://huggingface.co/datasets/TestEvo-Bench/teb-generation
- Test Update: https://huggingface.co/datasets/TestEvo-Bench/teb-update
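A minimal sketch of loading either track with the Hugging Face `datasets` library is shown below. The repository IDs are read off the URLs above; the `load_track` helper is hypothetical, and the sketch assumes each repository loads with its default configuration, which is not confirmed here.

```python
# Repository IDs taken from the dataset URLs listed above.
DATASET_REPOS = {
    "generation": "TestEvo-Bench/teb-generation",
    "update": "TestEvo-Bench/teb-update",
}


def load_track(track: str):
    """Load one benchmark track from the Hugging Face Hub (hypothetical helper).

    Assumes the repository works with its default configuration; pass a
    config name or split to `load_dataset` if the dataset requires one.
    """
    if track not in DATASET_REPOS:
        raise ValueError(f"unknown track {track!r}; expected one of {sorted(DATASET_REPOS)}")
    # Imported lazily so the module can be used without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(DATASET_REPOS[track])
```

Calling `load_track("generation")` requires network access and `pip install datasets`; printing the returned object lists whichever splits and columns the repository actually provides.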
## Links
- Website: https://www.testevo-bench.com/
- 🤗 Hugging Face Space: https://huggingface.co/spaces/TestEvo-Bench/
- 💻 Code: https://anonymous.4open.science/r/testevo-bench-1150/README.md
---
<div align="center">
Real-world • Execution-grounded • Live software evolution benchmark
</div>