TestEvo-Bench

A Live Benchmark for Test Generation & Test Update

Evaluating how AI agents understand and adapt tests to real-world software evolution.

🌐 https://www.testevo-bench.com/


TestEvo-Bench is a live benchmark for evaluating AI software engineering agents on realistic software test evolution tasks mined from open-source repositories.

Unlike traditional benchmarks that isolate tests from production changes, TestEvo-Bench models real software co-evolution between production code and test suites.

The benchmark contains two complementary tracks:

  • 🟠 Test Generation: generate new tests for newly introduced behavior
  • 🟣 Test Update: repair or adapt outdated tests after code changes

Each task is execution-grounded with runnable environments and evaluated using metrics such as pass rate, coverage, and mutation score.
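
As a rough illustration of the metrics named above, the sketch below computes pass rate, coverage, and mutation score from raw counts using their common textbook definitions. The `TaskResult` fields and the `summarize` helper are hypothetical names chosen for this example; they are not part of any TestEvo-Bench API, and the benchmark's actual scoring may differ.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """Hypothetical per-task counts; field names are assumptions for illustration."""
    tests_passed: int
    tests_total: int
    lines_covered: int
    lines_total: int
    mutants_killed: int
    mutants_total: int


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def summarize(result: TaskResult) -> dict[str, float]:
    """Compute the three metrics under their usual definitions (assumed here)."""
    return {
        # Fraction of generated or updated tests that pass when executed.
        "pass_rate": ratio(result.tests_passed, result.tests_total),
        # Fraction of target lines executed by the tests (line coverage).
        "coverage": ratio(result.lines_covered, result.lines_total),
        # Fraction of injected mutants that the tests detect.
        "mutation_score": ratio(result.mutants_killed, result.mutants_total),
    }


if __name__ == "__main__":
    example = TaskResult(
        tests_passed=8, tests_total=10,
        lines_covered=45, lines_total=60,
        mutants_killed=12, mutants_total=20,
    )
    for name, value in summarize(example).items():
        print(f"{name}: {value:.2f}")
```

Keeping the three ratios separate, rather than folding them into one score, makes it visible when tests pass but exercise little code or fail to detect injected faults.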

Real-world • Execution-grounded • Live software evolution benchmark
