Vulnerability Agent Bench
A benchmark of coding models on patching vulnerabilities
Cybersecurity: benchmarking and training data for vulnerability patching
XOR is a defensive cyber lab. We sell reinforcement learning environments and benchmark data to frontier labs and coding-tool companies. Their agents train against verifiable exploit targets and learn to find and fix vulnerabilities at production scale.
We test all current coding models on their ability to write correct patches for vulnerabilities. Results and methodology: https://www.xor.tech/resources/benchmarks