docs(evidence): add multi-agent eval, long-horizon trace, and constrained eval logs 9731ebe vx7sh committed 25 days ago
feat(eval): add benchmark harness and reward integrity artifacts ee2f27b vx7sh committed 26 days ago