Fix Terminal-Bench task_id to terminalbench_2 (24e2c73, verified), burtenshaw (HF Staff), committed 4 days ago
Add Terminal-Bench evaluation result (29.2%) (d0c63cd, verified), burtenshaw (HF Staff), committed 4 days ago