DrafterBench: Benchmarking Large Language Models for Tasks Automation in Civil Engineering Paper • 2507.11527 • Published Jul 15, 2025 • 35
Article The Agent Era Is Here: A Comprehensive Survey of Large Language Model Agents Apr 8, 2025 • 3
TRAJECT-Bench: A Trajectory-Aware Benchmark for Evaluating Agentic Tool Use Paper • 2510.04550 • Published Oct 6, 2025 • 1
ToolDial: Multi-turn Dialogue Generation Method for Tool-Augmented Language Models Paper • 2503.00564 • Published Mar 1, 2025 • 2
ToolRM Collection ToolRM: Towards Agentic Tool-Use Reward Modeling • 6 items • Updated 12 days ago • 4
One Model to Critique Them All: Rewarding Agentic Tool-Use via Efficient Reasoning Paper • 2510.26167 • Published Oct 30, 2025 • 2
OrchDAG: Complex Tool Orchestration in Multi-Turn Interactions with Plan DAGs Paper • 2510.24663 • Published Oct 28, 2025 • 1
The Landscape of Emerging AI Agent Architectures for Reasoning, Planning, and Tool Calling: A Survey Paper • 2404.11584 • Published Apr 17, 2024 • 1
LoopTool: Closing the Data-Training Loop for Robust LLM Tool Calls Paper • 2511.09148 • Published Nov 12, 2025 • 18
ToolRM: Outcome Reward Models for Tool-Calling Large Language Models Paper • 2509.11963 • Published Sep 15, 2025 • 4
Article Introducing Trackio: A Lightweight Experiment Tracking Library from Hugging Face Jul 29, 2025 • 211