test(benchmarks): add black-box tests for evaluate_spider outputs and trace normalization d347376 Melika Kheirieh committed on Nov 3