# Batch Validation: 5 Cascade-Only Instances

**Job:** [6a04d3a33308d79117b8f24c](https://huggingface.co/jobs/narcolepticchicken/6a04d3a33308d79117b8f24c)
**Status:** RUNNING
**Started:** 2026-05-13

## Instances

These are the cascade-only instances from the corrected report — solved by T1 or T2 but NOT by either T4 model:

1. `django__django-11815`
2. `django__django-13089`
3. `django__django-13807`
4. `django__django-14315` (single-instance test: T2 produced valid 3997ch diff)
5. `matplotlib__matplotlib-25224`

## Approach

- **T1:** Llama-3.1-8B-Instruct (25 turn max)
- **T2:** Llama-3.3-70B-Instruct (20 turn max)
- **Protocol:** File editing (`content`) + `git diff` external
- **Validation:** `git apply --check` on generated patch
- **Cost:** $0 inference (HF free) + ~$2-3 compute

## Expected Result

If the cascade thesis holds, we should see:

- 3-5 instances producing valid patches
- Mix of T1 and T2 solves
- Evidence that cheap models can solve instances that frontier models miss

## Check Logs

```bash
curl -s "https://huggingface.co/api/jobs/narcolepticchicken/6a04d3a33308d79117b8f24c/logs"
```
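
## Cascade Sketch

The approach described above (try T1 first, fall back to T2, and accept a patch only if `git apply --check` passes) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the job's actual code: `run_cascade`, `git_apply_check`, `summarize`, and the tier callables are hypothetical names, and the real job's agent loop and turn limits are not modeled here.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional, Sequence, Tuple

# Hypothetical type: a tier is (label, function mapping instance id -> patch text or None).
Tier = Tuple[str, Callable[[str], Optional[str]]]


@dataclass
class CascadeResult:
    instance_id: str
    solved_by: Optional[str]  # tier label such as "T1" or "T2", or None if no tier succeeded


def git_apply_check(patch: str, repo_dir: str) -> bool:
    """Return True if `git apply --check` accepts the patch in repo_dir.

    The patch is passed on stdin ("-"); nothing is written to the working tree.
    """
    proc = subprocess.run(
        ["git", "apply", "--check", "-"],
        input=patch,
        text=True,
        cwd=repo_dir,
        capture_output=True,
    )
    return proc.returncode == 0


def run_cascade(
    instance_id: str,
    tiers: Sequence[Tier],
    is_valid: Callable[[str], bool],
) -> CascadeResult:
    """Try each tier in order; stop at the first one whose patch validates."""
    for label, generate in tiers:
        patch = generate(instance_id)
        if patch and is_valid(patch):
            return CascadeResult(instance_id, label)
    return CascadeResult(instance_id, None)


def summarize(results: Iterable[CascadeResult]) -> Dict[str, int]:
    """Count solves per tier label, with failures grouped under 'unsolved'."""
    counts: Dict[str, int] = {}
    for result in results:
        key = result.solved_by or "unsolved"
        counts[key] = counts.get(key, 0) + 1
    return counts
```

In use, `is_valid` would typically wrap the validator, for example `lambda p: git_apply_check(p, repo_dir)`, where `repo_dir` is assumed to be a checkout of the repository at the instance's base commit. Passing the validator in as a callable keeps the cascade loop testable without a git checkout.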