---
title: TestEvo-Bench
emoji: 🧪
colorFrom: indigo
colorTo: green
sdk: static
pinned: false
short_description: Live benchmark for test generation and test update.
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/69fa059362e7e8f47d7c5aa2/14h690a494OmjiPpt6elN.png
---

# TestEvo-Bench

<div align="center">

## A Live Benchmark for Test Generation & Test Update

Evaluating how AI agents understand and adapt tests to real-world software evolution.

🌐 https://www.testevo-bench.com/

</div>

---

TestEvo-Bench is a live benchmark for evaluating AI software engineering agents on realistic software test evolution tasks mined from open-source repositories.

Unlike traditional benchmarks that isolate tests from production changes, TestEvo-Bench models real software co-evolution between production code and test suites.

The benchmark contains two complementary tracks:

- 🟠 **Test Generation**: generate new tests for newly introduced behavior
- 🟣 **Test Update**: repair or adapt outdated tests after code changes

Each task is execution-grounded with runnable environments and evaluated using metrics such as pass rate, coverage, and mutation score.
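To make the metrics concrete, here is a minimal sketch of how the three named scores could be computed from raw counts for a single task. The `TaskResult` fields and the `summarize` helper are hypothetical names used for illustration, and the plain-ratio definitions are an assumption; the benchmark's own scoring may differ.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical raw counts for one task (not the benchmark's schema)."""
    tests_passed: int
    tests_total: int
    lines_covered: int
    lines_total: int
    mutants_killed: int
    mutants_total: int


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def summarize(result: TaskResult) -> dict:
    """Compute pass rate, line coverage, and mutation score as plain ratios."""
    return {
        "pass_rate": ratio(result.tests_passed, result.tests_total),
        "coverage": ratio(result.lines_covered, result.lines_total),
        "mutation_score": ratio(result.mutants_killed, result.mutants_total),
    }


if __name__ == "__main__":
    example = TaskResult(
        tests_passed=8, tests_total=10,
        lines_covered=45, lines_total=60,
        mutants_killed=12, mutants_total=16,
    )
    print(summarize(example))
```

Mutation score is taken here as killed mutants over all generated mutants; some tools exclude equivalent or timed-out mutants from the denominator.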

## Datasets

- Test Generation: https://huggingface.co/datasets/TestEvo-Bench/teb-generation

- Test Update: https://huggingface.co/datasets/TestEvo-Bench/teb-update

## Links

- 🌐 Website: https://www.testevo-bench.com/

- 🤗 Hugging Face Space: https://huggingface.co/spaces/TestEvo-Bench/

- 💻 Code: https://anonymous.4open.science/r/testevo-bench-1150/README.md


---

<div align="center">

Real-world • Execution-grounded • Live software evolution benchmark

</div>