Thomas Broadley
tbroadley
tbroadley's activity
Updated a dataset 4 days ago: metr-evals/apps-with-input-validation
New activity in metr-evals/apps-with-input-validation (6 days ago)
#12: Unify verification scripts into verify.py, add AGENTS.md, opened 6 days ago by tbroadley
#11: Fix trailing empty strings in 94 train strs samples, opened 6 days ago by tbroadley
#10: Fix trailing spaces in expected test outputs (89 samples across train and test), opened 6 days ago by tbroadley
#8: Fix trailing spaces in expected test outputs (95 samples across train and test), opened 6 days ago by tbroadley
#9: Re-apply fixes from PR #2 and PR #3 that were accidentally reverted by PR #5, opened 6 days ago by tbroadley
#7: Fix trailing spaces in test output for sample 3341, opened 6 days ago by tbroadley
New activity in metr-evals/apps-with-input-validation (9 days ago)
#6: Remove counterexample sample (id=737) from train split, opened 9 days ago by tbroadley
New activity in metr-evals/apps-with-input-validation (about 1 month ago)
#5: Fix expected outputs to match golden solutions, opened about 1 month ago by tbroadley
#4: Fix expected outputs to match golden solutions, opened about 1 month ago by tbroadley
#3: Remove 615 problems with ambiguous outputs, opened about 1 month ago by tbroadley
New activity in metr-evals/apps-with-input-validation (about 2 months ago)
#2: Fix test set: whitespace issues, dependency errors, and Python 3.11+ compatibility, opened about 2 months ago by tbroadley
#1: Fix whitespace discrepancies in expected test outputs, opened about 2 months ago by tbroadley
Upvoted a paper 3 months ago: Measuring AI Ability to Complete Long Tasks (Paper 2503.14499, published Mar 18, 2025)
Authored a paper 12 months ago: Measuring AI Ability to Complete Long Tasks (Paper 2503.14499, published Mar 18, 2025)