
Thomas Broadley

tbroadley


AI & ML interests

None yet

Recent Activity

updated a dataset 4 days ago

metr-evals/apps-with-input-validation

new activity 6 days ago

metr-evals/apps-with-input-validation:Unify verification scripts into verify.py, add AGENTS.md

new activity 6 days ago

metr-evals/apps-with-input-validation:Fix trailing empty strings in 94 train strs samples



updated a dataset 4 days ago

metr-evals/apps-with-input-validation

Preview • Updated 4 days ago • 191

New activity in metr-evals/apps-with-input-validation 6 days ago

Unify verification scripts into verify.py, add AGENTS.md

#12 opened 6 days ago by

Fix trailing empty strings in 94 train strs samples

#11 opened 6 days ago by

Fix trailing spaces in expected test outputs (89 samples across train and test)

#10 opened 6 days ago by

Fix trailing spaces in expected test outputs (95 samples across train and test)

#8 opened 6 days ago by

Re-apply fixes from PR #2 and PR #3 that were accidentally reverted by PR #5

#9 opened 6 days ago by

Fix trailing spaces in test output for sample 3341

#7 opened 6 days ago by

New activity in metr-evals/apps-with-input-validation 9 days ago

Remove counterexample sample (id=737) from train split

#6 opened 9 days ago by

New activity in metr-evals/apps-with-input-validation about 1 month ago

Fix expected outputs to match golden solutions

#5 opened about 1 month ago by

Fix expected outputs to match golden solutions

#4 opened about 1 month ago by

Remove 615 problems with ambiguous outputs

#3 opened about 1 month ago by

New activity in metr-evals/apps-with-input-validation about 2 months ago

Fix test set: whitespace issues, dependency errors, and Python 3.11+ compatibility

#2 opened about 2 months ago by

Fix whitespace discrepancies in expected test outputs

#1 opened about 2 months ago by

upvoted a paper 3 months ago

Measuring AI Ability to Complete Long Tasks

Paper • 2503.14499 • Published Mar 18, 2025 • 16

authored a paper 12 months ago

Measuring AI Ability to Complete Long Tasks

Paper • 2503.14499 • Published Mar 18, 2025 • 16