Jeff

JiayuJeff

AI & ML interests

None yet

Recent Activity

updated a dataset 2 days ago

JiayuJeff/PlanBench-XL

Organizations

None yet

upvoted a paper about 15 hours ago

Trimming the Long-Tail of Visual World Modeling Evaluation

Paper • 2606.24256 • Published 8 days ago • 34

upvoted a paper about 23 hours ago

GBC: Gradient-Based Connections for Optimizing Multi-Agent Systems

Paper • 2606.28187 • Published 5 days ago • 10

upvoted 2 papers 7 days ago

BioInsight: Multi-Agent Orchestration for Interactive Biomedical Knowledge Discovery

Paper • 2606.20997 • Published 12 days ago • 3

PlanBench-XL: Evaluating Long-Horizon Planning of LLM Tool-Use Agents in Large-Scale Tool Ecosystems

Paper • 2606.22388 • Published 10 days ago • 95

upvoted a collection 7 days ago

awesome-agentic-benchmarks

Collection

3 items • Updated 8 days ago • 2

upvoted a paper 8 days ago

GeoBrowse: A Geolocation Benchmark for Agentic Tool Use with Expert-Annotated Reasoning Traces

Paper • 2604.04017 • Published Apr 5 • 8

upvoted a paper 25 days ago

Brick-Composer: Using MLLMs for Assembly with Diverse Bricks

Paper • 2606.05445 • Published 28 days ago • 8

upvoted a paper 26 days ago

AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints

Paper • 2606.05622 • Published 27 days ago • 44

upvoted a paper 28 days ago

Ψ-Bench: Evaluating Persona-Sensitive Influencing in Persuasive Dialogues

Paper • 2606.02754 • Published 29 days ago • 13

upvoted 3 papers about 1 month ago

AgentDoG 1.5: A Lightweight and Scalable Alignment Framework for AI Agent Safety and Security

Paper • 2605.29801 • Published May 28 • 144

Advancing Creative Physical Intelligence in Large Multimodal Models

Paper • 2605.26396 • Published May 25 • 21

Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use

Paper • 2605.14038 • Published May 13 • 15

upvoted a paper about 2 months ago

CreativityBench: Evaluating Agent Creative Reasoning via Affordance-Based Tool Repurposing

Paper • 2605.02910 • Published May 6 • 23

upvoted a paper 4 months ago

Code2Math: Can Your Code Agent Effectively Evolve Math Problems Through Exploration?

Paper • 2603.03202 • Published Mar 3 • 18

upvoted a collection 4 months ago

Qwen3.5

Collection

21 items • Updated Mar 9 • 1.7k

upvoted 5 papers 5 months ago
