TestEvo-Bench

A Live Benchmark for Test Generation & Test Update

Evaluating how AI agents understand and adapt tests to real-world software evolution.

🌐 https://www.testevo-bench.com/

TestEvo-Bench is a live benchmark for evaluating AI software engineering agents on realistic software test evolution tasks mined from open-source repositories.

Unlike traditional benchmarks that isolate tests from production changes, TestEvo-Bench models real software co-evolution between production code and test suites.

The benchmark contains two complementary tracks:

🟠 Test Generation — generate new tests for newly introduced behavior
🟣 Test Update — repair or adapt outdated tests after code changes

Each task is execution-grounded: it comes with a runnable environment, and results are evaluated with metrics such as pass rate, coverage, and mutation score.
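
As a rough illustration of the metrics named above, here is a minimal Python sketch. It assumes the common textbook definitions (tests passed over tests run, lines covered over lines measured, mutants killed over mutants generated); the benchmark's exact formulas are not given here, and `TaskResult` and `summarize` are hypothetical names, not part of TestEvo-Bench.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical per-task counts; not the benchmark's actual schema."""
    tests_passed: int
    tests_total: int
    lines_covered: int
    lines_total: int
    mutants_killed: int
    mutants_total: int


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def summarize(result: TaskResult) -> dict:
    """Compute the three metrics under the assumed textbook definitions."""
    return {
        "pass_rate": ratio(result.tests_passed, result.tests_total),
        "coverage": ratio(result.lines_covered, result.lines_total),
        "mutation_score": ratio(result.mutants_killed, result.mutants_total),
    }


# Made-up counts for illustration only, not benchmark results.
example = TaskResult(
    tests_passed=8, tests_total=10,
    lines_covered=45, lines_total=60,
    mutants_killed=12, mutants_total=16,
)
metrics = summarize(example)
```

Reporting the three values separately is deliberate: a test suite can pass every test and still kill few mutants, so a single combined number would hide weak assertions.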

Datasets

Test Generation — https://huggingface.co/datasets/TestEvo-Bench/teb-generation
Test Update — https://huggingface.co/datasets/TestEvo-Bench/teb-update
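
Both datasets can presumably be loaded with the Hugging Face `datasets` library. The sketch below assumes that library is installed (`pip install datasets`) and that the repositories load with their default configuration; split and column names are not documented here, so inspect the returned object before relying on any field. `TRACK_REPOS`, `repo_for`, and `load_track` are hypothetical helper names.

```python
# Repository ids taken from the dataset links listed above.
TRACK_REPOS = {
    "generation": "TestEvo-Bench/teb-generation",
    "update": "TestEvo-Bench/teb-update",
}


def repo_for(track: str) -> str:
    """Map a track name ("generation" or "update") to its dataset repo id."""
    key = track.strip().lower()
    if key not in TRACK_REPOS:
        raise ValueError(f"unknown track {track!r}; expected one of {sorted(TRACK_REPOS)}")
    return TRACK_REPOS[key]


def load_track(track: str):
    """Download one track from the Hugging Face Hub (needs network access)."""
    # Imported lazily so the helpers above work without the dependency.
    from datasets import load_dataset

    # Assumption: the default configuration is loadable as-is.
    return load_dataset(repo_for(track))


# Usage (not run here because it downloads data):
#   ds = load_track("update")
#   print(ds)  # shows the available splits and columns
```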

Links

🌐 Website — https://www.testevo-bench.com/
🤗 Hugging Face Space — https://huggingface.co/spaces/TestEvo-Bench/
💻 Code — https://anonymous.4open.science/r/testevo-bench-1150/README.md

Real-world • Execution-grounded • Live software evolution benchmark
