NL2Repo-Bench: Towards Long-Horizon Repository Generation Evaluation of Coding Agents Paper • 2512.12730 • Published Dec 14, 2025 • 44
Finch: Benchmarking Finance & Accounting across Spreadsheet-Centric Enterprise Workflows Paper • 2512.13168 • Published Dec 15, 2025 • 50
WebOperator: Action-Aware Tree Search for Autonomous Agents in Web Environment Paper • 2512.12692 • Published Dec 14, 2025 • 14
Nemotron-Pre-Training-Datasets Collection Large scale pre-training datasets used in the Nemotron family of models. • 11 items • Updated 6 days ago • 94