Perry the Platypus PRO

AgPerry

AI & ML interests

None yet

Recent Activity

upvoted a paper 5 days ago

Dr-DCI: Scaling Direct Corpus Interaction via Dynamic Workspace Expansion

Paper • 2606.14885 • Published 11 days ago • 11

upvoted 2 papers 10 days ago

ScientistOne: Towards Human-Level Autonomous Research via Chain-of-Evidence

Paper • 2605.26340 • Published 29 days ago • 36

Where, What, Why, and Importance: Structured Defect Grounding for Text-to-Image Feedback

Paper • 2606.06113 • Published 19 days ago • 15

updated a dataset 12 days ago

TIGER-Lab/ClawBench

Viewer • Updated 12 days ago • 283 • 534

upvoted a paper 19 days ago

MIRA: Mid-training Rubric Anchoring for Source-Aware Data Selection

Paper • 2605.30288 • Published 25 days ago • 23

updated a Space 28 days ago

ClawBench Leaderboard

Can AI agents complete everyday online tasks?

updated 4 datasets 28 days ago

TIGER-Lab/ClawBenchV2Trace

Updated 28 days ago • 4.11k

NAIL-Group/ClawBenchV2Trace

Updated 28 days ago • 1.94k

NAIL-Group/ClawBenchV1Trace

Updated 28 days ago • 2.15k

NAIL-Group/ClawBench

Viewer • Updated 28 days ago • 153 • 287 • 2

commented a paper about 1 month ago

RewardHarness: Self-Evolving Agentic Post-Training

Paper • 2605.08703 • Published May 9 • 10

upvoted a paper about 1 month ago

RewardHarness: Self-Evolving Agentic Post-Training

Paper • 2605.08703 • Published May 9 • 10

New activity in huggingface/HuggingDiscussions about 1 month ago

[FEEDBACK] Daily Papers

#32 opened about 2 years ago by

submitted a paper to Daily Papers about 1 month ago

RewardHarness: Self-Evolving Agentic Post-Training

Paper • 2605.08703 • Published May 9 • 10

updated a collection about 1 month ago

ClawBench

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated May 12

published a Space about 1 month ago

ClawBench Leaderboard

Can AI agents complete everyday online tasks?

updated a collection about 1 month ago

ClawBench

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated May 12

updated a Space about 1 month ago

ClawBench Leaderboard

Live leaderboard for the ClawBench web-agent benchmark

updated 2 collections about 1 month ago

ClawBench

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated May 12

ClawBench — Browser Agent Benchmark Suite

Benchmark dataset (V1+V2), live leaderboard Space, and full V1 execution traces — everything you need to run, regrade, or compare on ClawBench. • 5 items • Updated May 12 • 1