PatRe: A Full-Stage Office Action and Rebuttal Generation Benchmark for Patent Examination Paper • 2605.03571 • Published 2 days ago • 5
InteractWeb-Bench: Can Multimodal Agent Escape Blind Execution in Interactive Website Generation? Paper • 2604.27419 • Published 7 days ago • 12
Beyond Quantity: Trajectory Diversity Scaling for Code Agents Paper • 2602.03219 • Published Feb 3 • 2
FlowPIE: Test-Time Scientific Idea Evolution with Flow-Guided Literature Exploration Paper • 2603.29557 • Published Mar 31 • 17
IPBench: Benchmarking the Knowledge of Large Language Models in Intellectual Property Paper • 2504.15524 • Published Apr 22, 2025 • 3
SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20, 2025 • 110
AutoPatent: A Multi-Agent Framework for Automatic Patent Generation Paper • 2412.09796 • Published Dec 13, 2024 • 3
IPEval: A Bilingual Intellectual Property Agency Consultation Evaluation Benchmark for Large Language Models Paper • 2406.12386 • Published Jun 18, 2024 • 1