See /app/occ/benchmarks/benchmark_code_real_llm.py