TestEvo-Bench-Anonymous committed on
Commit 26ea943 · verified · 1 Parent(s): 2b71308

Update README

Files changed (1)
  1. README.md +51 -5
README.md CHANGED
@@ -1,13 +1,59 @@
  ---
- title: README
- emoji: 📚
+ title: TestEvo-Bench
+ emoji: 🧪
  colorFrom: indigo
  colorTo: green
  sdk: static
  pinned: false
+ short_description: Live benchmark for test generation and test update.
+ thumbnail: >-
+   https://cdn-uploads.huggingface.co/production/uploads/69fa059362e7e8f47d7c5aa2/14h690a494OmjiPpt6elN.png
  ---

- Datasets for TestEvo-Bench
+ # TestEvo-Bench

- - Test Update Track: https://huggingface.co/datasets/TestEvo-Bench/teb-update
- - Test Generation Track: https://huggingface.co/datasets/TestEvo-Bench/teb-generation
+ <div align="center">
+
+ ## A Live Benchmark for Test Generation & Test Update
+
+ Evaluating how AI agents understand and adapt tests to real-world software evolution.
+
+ 🌐 https://www.testevo-bench.com/
+
+ </div>
+
+ ---
+
+ TestEvo-Bench is a live benchmark for evaluating AI software engineering agents on realistic software test evolution tasks mined from open-source repositories.
+
+ Unlike traditional benchmarks that isolate tests from production changes, TestEvo-Bench models real software co-evolution between production code and test suites.
+
+ The benchmark contains two complementary tracks:
+
+ - 🟠 **Test Generation**: generate new tests for newly introduced behavior
+ - 🟣 **Test Update**: repair or adapt outdated tests after code changes
+
+ Each task is execution-grounded with runnable environments and evaluated using metrics such as pass rate, coverage, and mutation score.
+
+ ## Datasets
+
+ - Test Generation: https://huggingface.co/datasets/TestEvo-Bench/teb-generation
+
+ - Test Update: https://huggingface.co/datasets/TestEvo-Bench/teb-update
+
+ ## Links
+
+ - 🌐 Website: https://www.testevo-bench.com/
+
+ - 🤗 Hugging Face Space: https://huggingface.co/spaces/TestEvo-Bench/
+
+ - 💻 Code: https://anonymous.4open.science/r/testevo-bench-1150/README.md
+
+
+ ---
+
+ <div align="center">
+
+ Real-world • Execution-grounded • Live software evolution benchmark
+
+ </div>
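
The updated README names pass rate and mutation score among its evaluation metrics but does not define them. As a rough illustration only, the sketch below uses the conventional textbook definitions (passed tests over total tests, killed mutants over total mutants). The `TaskResult` class and its field names are hypothetical and are not taken from TestEvo-Bench, whose actual scoring may differ.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical per-task outcome; field names are illustrative assumptions."""
    tests_passed: int
    tests_total: int
    mutants_killed: int
    mutants_total: int


def pass_rate(result: TaskResult) -> float:
    """Fraction of executed tests that passed (0.0 when no tests ran)."""
    if result.tests_total == 0:
        return 0.0
    return result.tests_passed / result.tests_total


def mutation_score(result: TaskResult) -> float:
    """Fraction of generated mutants that the tests killed (0.0 when no mutants exist)."""
    if result.mutants_total == 0:
        return 0.0
    return result.mutants_killed / result.mutants_total


# Made-up numbers for demonstration; not benchmark results.
example = TaskResult(tests_passed=8, tests_total=10, mutants_killed=15, mutants_total=20)
print(f"pass rate: {pass_rate(example):.2f}")
print(f"mutation score: {mutation_score(example):.2f}")
```

Returning 0.0 for an empty denominator is a design choice of this sketch; a real harness might instead report such tasks as invalid.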