---
title: TestEvo-Bench
emoji: 🧪
colorFrom: indigo
colorTo: green
sdk: static
pinned: false
short_description: Live benchmark for test generation and test update.
thumbnail: >-
  https://cdn-uploads.huggingface.co/production/uploads/69fa059362e7e8f47d7c5aa2/14h690a494OmjiPpt6elN.png
---
# TestEvo-Bench

<div align="center">

## A Live Benchmark for Test Generation & Test Update

Evaluating how AI agents understand and adapt tests to real-world software evolution.

🌐 https://www.testevo-bench.com/

</div>

---
TestEvo-Bench is a live benchmark for evaluating AI software engineering agents on realistic test-evolution tasks mined from open-source repositories. Unlike traditional benchmarks that isolate tests from production changes, TestEvo-Bench models the real co-evolution of production code and test suites.

The benchmark contains two complementary tracks:

- **Test Generation** – generate new tests for newly introduced behavior
- **Test Update** – repair or adapt outdated tests after code changes

Each task is execution-grounded with a runnable environment and evaluated using metrics such as pass rate, coverage, and mutation score.
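As an illustration, each of these metrics reduces to a simple ratio over per-task execution results. The sketch below assumes a hypothetical per-task result schema (the field names are illustrative, not the benchmark's actual format):

```python
# Hypothetical per-task execution results; the schema is illustrative only.
results = [
    {"passed": True,  "lines_covered": 45, "lines_total": 50,
     "mutants_killed": 8, "mutants_total": 10},
    {"passed": False, "lines_covered": 30, "lines_total": 60,
     "mutants_killed": 3, "mutants_total": 10},
]

def pass_rate(results):
    """Fraction of tasks whose generated or updated tests pass."""
    return sum(r["passed"] for r in results) / len(results)

def coverage(results):
    """Aggregate line coverage across all tasks."""
    return sum(r["lines_covered"] for r in results) / sum(r["lines_total"] for r in results)

def mutation_score(results):
    """Fraction of seeded mutants killed by the tests."""
    return sum(r["mutants_killed"] for r in results) / sum(r["mutants_total"] for r in results)

print(pass_rate(results), coverage(results), mutation_score(results))
```

Pass rate rewards tests that run green, while coverage and mutation score guard against trivially passing tests that exercise little code or fail to detect injected faults.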
| ## Datasets | |
- Test Generation – https://huggingface.co/datasets/TestEvo-Bench/teb-generation
- Test Update – https://huggingface.co/datasets/TestEvo-Bench/teb-update
| ## Links | |
- 🌐 Website – https://www.testevo-bench.com/
- 🤗 Hugging Face Space – https://huggingface.co/spaces/TestEvo-Bench/
- 💻 Code – https://anonymous.4open.science/r/testevo-bench-1150/README.md
| --- | |
| <div align="center"> | |
Real-world • Execution-grounded • Live software evolution benchmark

</div>