Ge Zhang

zhangysk

AI & ML interests

None yet

Recent Activity

upvoted a paper 16 days ago

Workflow-GYM: Towards Long-Horizon Evaluation of Computer-use Agentic tasks in Real-World Professional Fields

Paper • 2606.11042 • Published 17 days ago • 21

updated a collection about 1 month ago

OProver

Collection

9 items • Updated May 19 • 3

upvoted a paper about 1 month ago

OProver: A Unified Framework for Agentic Formal Theorem Proving

Paper • 2605.17283 • Published May 17 • 31

upvoted 2 papers 3 months ago

InCoder-32B: Code Foundation Model for Industrial Scenarios

Paper • 2603.16790 • Published Mar 17 • 312

Understanding by Reconstruction: Reversing the Software Development Process for LLM Pretraining

Paper • 2603.11103 • Published Mar 11 • 9

authored 4 papers 3 months ago

VeRA: Verified Reasoning Data Augmentation at Scale

Paper • 2602.13217 • Published Jan 23

Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization

Paper • 2602.22675 • Published Feb 26 • 23

$OneMillion-Bench: How Far are Language Agents from Human Experts?

Paper • 2603.07980 • Published Mar 9 • 27

Understanding by Reconstruction: Reversing the Software Development Process for LLM Pretraining

Paper • 2603.11103 • Published Mar 11 • 9

upvoted a paper 4 months ago

Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization

Paper • 2602.22675 • Published Feb 26 • 23

upvoted a paper 5 months ago

VideoWorld 2: Learning Transferable Knowledge from Real-world Videos

Paper • 2602.10102 • Published Feb 10 • 14

liked a dataset 5 months ago

m-a-p/Retrieval-Infused-Reasoning-Sandbox

Viewer • Updated Feb 3 • 300 • 44 • 3

authored 3 papers 5 months ago

BABE: Biology Arena BEnchmark

Paper • 2602.05857 • Published Feb 5 • 10

Context Forcing: Consistent Autoregressive Video Generation with Long Context

Paper • 2602.06028 • Published Feb 5 • 36

Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities

Paper • 2601.21937 • Published Jan 29 • 20

upvoted 5 papers 5 months ago