MMSkills: Towards Multimodal Skills for General Visual Agents Paper • 2605.13527 • Published May 14 • 122
Turing Test on Screen: A Benchmark for Mobile GUI Agent Humanization Paper • 2604.09574 • Published Feb 24 • 30
CoreCodeBench Collection dataset for CoreCodeBench: A Configurable Multi-Scenario Repository-Level Benchmark • 2 items • Updated May 16, 2025