Add post-pipeline quality checks + 4 eval scripts for capability parity
b264511 jtlevine Claude Opus 4.6 (1M context) committed on Apr 14