A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks Paper • 2605.28556 • Published 7 days ago • 55
On the Scaling of PEFT: Towards Million Personal Models of Trillion Parameters Paper • 2606.02437 • Published 2 days ago • 88
Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs Paper • 2605.30611 • Published 6 days ago • 112
atul10/prompt_reverse_engineering_code_dataset_O2_arm_O2_advanced_custom_test Viewer • Updated Aug 4, 2025 • 30.1k • 24 • 1