
TestEvo-Bench-Anonymous committed 29 days ago

Commit 26ea943 (verified) · 1 parent: 2b71308

Update README


Files changed (1)

README.md +51 -5


@@ -1,13 +1,59 @@
 ---
-title: README
-emoji: 📚
 colorFrom: indigo
 colorTo: green
 sdk: static
 pinned: false
 ---
-Datasets for TestEvo-Bench
-- Test Update Track: https://huggingface.co/datasets/TestEvo-Bench/teb-update
-- Test Generation Track: https://huggingface.co/datasets/TestEvo-Bench/teb-generation

 ---
+title: TestEvo-Bench
+emoji: 🧪
 colorFrom: indigo
 colorTo: green
 sdk: static
 pinned: false
+short_description: Live benchmark for test generation and test update.
+thumbnail: >-
+  https://cdn-uploads.huggingface.co/production/uploads/69fa059362e7e8f47d7c5aa2/14h690a494OmjiPpt6elN.png
 ---
+# TestEvo-Bench
+<div align="center">
+## A Live Benchmark for Test Generation & Test Update
+Evaluating how AI agents understand and adapt tests to real-world software evolution.
+🌐 https://www.testevo-bench.com/
+</div>
+---
+TestEvo-Bench is a live benchmark for evaluating AI software engineering agents on realistic software test evolution tasks mined from open-source repositories.
+Unlike traditional benchmarks that isolate tests from production changes, TestEvo-Bench models real software co-evolution between production code and test suites.
+The benchmark contains two complementary tracks:
+- 🟠 **Test Generation** — generate new tests for newly introduced behavior
+- 🟣 **Test Update** — repair or adapt outdated tests after code changes
+Each task is execution-grounded with runnable environments and evaluated using metrics such as pass rate, coverage, and mutation score.
+## Datasets
+- Test Generation — https://huggingface.co/datasets/TestEvo-Bench/teb-generation
+- Test Update — https://huggingface.co/datasets/TestEvo-Bench/teb-update
+## Links
+- 🌐 Website — https://www.testevo-bench.com/
+- 🤗 Hugging Face Space — https://huggingface.co/spaces/TestEvo-Bench/
+-  💻 Code — https://anonymous.4open.science/r/testevo-bench-1150/README.md
+---
+<div align="center">
+Real-world • Execution-grounded • Live software evolution benchmark
+</div>
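
The updated README says tasks are evaluated with metrics such as pass rate, coverage, and mutation score, but does not define them. As a rough illustration only, the sketch below computes a pooled pass rate and mutation score from per-task counts. The `TaskResult` record and its field names are hypothetical, and the benchmark's actual definitions (for example, per-task averaging rather than pooling) may differ. Coverage is omitted because it depends on the project's coverage tooling.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TaskResult:
    """Hypothetical per-task record; field names are assumptions, not the benchmark's schema."""
    tests_run: int        # tests executed for this task
    tests_passed: int     # tests that passed
    mutants_total: int    # mutants generated for the code under test
    mutants_killed: int   # mutants detected (killed) by the tests


def pass_rate(results: Sequence[TaskResult]) -> float:
    """Fraction of executed tests that passed, pooled over all tasks."""
    run = sum(r.tests_run for r in results)
    return sum(r.tests_passed for r in results) / run if run else 0.0


def mutation_score(results: Sequence[TaskResult]) -> float:
    """Fraction of mutants killed, pooled over all tasks."""
    total = sum(r.mutants_total for r in results)
    return sum(r.mutants_killed for r in results) / total if total else 0.0


if __name__ == "__main__":
    # Made-up counts purely for illustration.
    results = [TaskResult(10, 8, 20, 15), TaskResult(5, 5, 10, 5)]
    print(f"pass rate: {pass_rate(results):.3f}")
    print(f"mutation score: {mutation_score(results):.3f}")
```

Pooling counts across tasks weights larger tasks more heavily; averaging per-task ratios would weight every task equally, so the choice should follow whatever the benchmark itself specifies.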