Batch Validation — 5 Cascade-Only Instances
Job: 6a04d3a33308d79117b8f24c
Status: RUNNING
Started: 2026-05-13
Instances
These are the cascade-only instances from the corrected report — solved by T1 or T2 but NOT by either T4 model:
- django__django-11815
- django__django-13089
- django__django-13807
- django__django-14315 (single-instance test: T2 produced a valid 3,997-character diff)
- matplotlib__matplotlib-25224
Approach
- T1: Llama-3.1-8B-Instruct (max 25 turns)
- T2: Llama-3.3-70B-Instruct (max 20 turns)
- Protocol: file editing via <edit path='file'>content</edit> blocks, with the patch extracted externally by git diff
- Validation: git apply --check on the generated patch
- Cost: $0 for inference (free on Hugging Face) plus ~$2-3 of compute
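The edit-then-diff protocol and its validation step can be sketched in a few lines of shell. This is a minimal sketch under stated assumptions, not the job's actual harness: the throwaway repo, calc.py, and model.patch are hypothetical stand-ins for an instance checkout, and it assumes each <edit path='file'> block overwrites the whole target file.

```bash
#!/usr/bin/env bash
# Sketch only: hypothetical repo and file names; assumes <edit> replaces full file contents.
set -euo pipefail

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.email "sketch@example.com"
git config user.name "sketch"

# Base commit standing in for the instance's starting revision.
printf 'def add(a, b):\n    return a - b\n' > calc.py
git add calc.py
git -c commit.gpgsign=false commit -q -m "base"

# Simulate the harness writing the body of <edit path='calc.py'>...</edit>
# over the whole file (assumption: edits are full-file replacements).
printf 'def add(a, b):\n    return a + b\n' > calc.py

# Extract the patch externally with git diff (untracked model.patch is not included).
git diff > model.patch

# Restore the clean tree, then check the patch would apply without modifying files.
git checkout -q -- calc.py

if [ -s model.patch ] && git apply --check model.patch; then
  result="valid"
else
  result="invalid"
fi
echo "patch check: $result ($(wc -c < model.patch | tr -d ' ') bytes)"
```

Checking against a restored clean tree matters: git apply --check against the already-edited files would fail, because the changes are already present.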
Expected Result
If the cascade thesis holds, we should see:
- 3-5 instances producing valid patches
- Mix of T1 and T2 solves
- Evidence that cheap models can solve instances that frontier models miss
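The first two criteria can be checked mechanically once per-instance outcomes are collected. The function below is a hypothetical helper (check_cascade_thesis is not part of the job), and it assumes a whitespace-separated results table on stdin with one "instance_id tier status" row per instance, where tier is T1, T2, or none and status is valid or invalid; the job's real output format may differ.

```bash
#!/usr/bin/env bash
# Hypothetical helper: reads "instance_id tier status" rows from stdin.
check_cascade_thesis() {
  local valid=0 t1=0 t2=0 id tier status
  # The "|| [ -n ... ]" clause also handles a final row without a trailing newline.
  while read -r id tier status || [ -n "${id:-}" ]; do
    if [ -z "$id" ]; then continue; fi
    if [ "$status" = "valid" ]; then
      valid=$((valid + 1))
      case "$tier" in
        T1) t1=$((t1 + 1)) ;;
        T2) t2=$((t2 + 1)) ;;
      esac
    fi
  done
  echo "valid=$valid T1=$t1 T2=$t2"
  # Supported only if 3-5 patches are valid and both tiers contribute a solve.
  if [ "$valid" -ge 3 ] && [ "$valid" -le 5 ] && [ "$t1" -ge 1 ] && [ "$t2" -ge 1 ]; then
    echo "thesis: supported"
    return 0
  fi
  echo "thesis: not supported"
  return 1
}
```

The third criterion needs no separate check in this sketch: every instance in this batch was missed by both T4 models, so any valid patch is itself that evidence.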
Check Logs
curl -s "https://huggingface.co/api/jobs/narcolepticchicken/6a04d3a33308d79117b8f24c/logs"