xbench (xbench)

huxueyu

submitted 2 papers to Daily Papers 5 months ago

EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies

Paper • 2602.09514 • Published Feb 10 • 11

AgentIF-OneDay: A Task-level Instruction-Following Benchmark for General AI Agents in Daily Scenarios

Paper • 2601.20613 • Published Jan 28 • 10

Lucky2022

updated a dataset 5 months ago

xbench/AgentIF-OneDay

Viewer • Updated Jan 29 • 58 • 700 • 4

Lucky2022

published a dataset 5 months ago

xbench/AgentIF-OneDay

Viewer • Updated Jan 29 • 58 • 700 • 4

huxueyu

updated a dataset 5 months ago

xbench/AgentIF-OneDay

Viewer • Updated Jan 29 • 58 • 700 • 4

huxueyu

in xbench/AgentIF-OneDay 5 months ago

Update README.md

#8 opened 5 months ago by huxueyu

in xbench/AgentIF-OneDay 6 months ago

Update README.md

#7 opened 6 months ago by huxueyu

Create README.md

#6 opened 6 months ago by huxueyu

Delete README.md

#5 opened 6 months ago by huxueyu

Upload data.jsonl

#4 opened 6 months ago by huxueyu

Upload 132 files

#3 opened 6 months ago by huxueyu

Upload 132 files

#2 opened 6 months ago by huxueyu

Upload data.jsonl

#1 opened 6 months ago by huxueyu

Lucky2022

authored a paper 7 months ago

Virtual Width Networks

Paper • 2511.11238 • Published Nov 14, 2025 • 39

lyangpku

published a dataset 8 months ago

xbench/DeepSearch-2510

Viewer • Updated Oct 24, 2025 • 100 • 1.33k • 3

lyangpku

updated a dataset 8 months ago

xbench/DeepSearch-2510

Viewer • Updated Oct 24, 2025 • 100 • 1.33k • 3

Lucky2022

authored a paper about 1 year ago

xbench: Tracking Agents Productivity Scaling with Profession-Aligned Real-World Evaluations

Paper • 2506.13651 • Published Jun 16, 2025 • 8

lyangpku

updated 2 datasets about 1 year ago

xbench/ScienceQA

Viewer • Updated Jun 18, 2025 • 100 • 37 • 8

xbench/DeepSearch

Viewer • Updated Jun 18, 2025 • 100 • 2.07k • 12

lyangpku

published a dataset about 1 year ago

xbench/DeepSearch

Viewer • Updated Jun 18, 2025 • 100 • 2.07k • 12

Team members 5
