ProgramBench

university

https://programbench.com

Recent Activity

john-b-yang authored a paper about 1 month ago

SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?

john-b-yang authored a paper about 1 month ago

OpenThoughts: Data Recipes for Reasoning Models

john-b-yang authored a paper about 1 month ago

LongCodeBench: Evaluating Coding LLMs at 1M Context Windows

Organization Card

ProgramBench: Can Language Models Rebuild Programs From Scratch?

Given only a compiled binary and its documentation, AI agents must architect and implement a complete codebase that reproduces the original program's behavior. ProgramBench evaluates this capability across 200 real-world open-source projects spanning Rust, Go, C, C++, Haskell, and Java.
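To make "reproduces the original program's behavior" concrete, here is a minimal sketch of a behavioral comparison between a reference program and a rebuilt candidate. It is not ProgramBench's actual harness: the helper names `run` and `behaviors_match`, the choice to compare only exit code and stdout, and the stand-in commands are all assumptions made for illustration.

```python
import subprocess
import sys


def run(cmd, stdin_text):
    """Run a command with the given stdin and return (exit code, stdout)."""
    proc = subprocess.run(
        cmd, input=stdin_text, capture_output=True, text=True, timeout=10
    )
    return proc.returncode, proc.stdout


def behaviors_match(reference_cmd, candidate_cmd, inputs):
    """Hypothetical check: both programs agree on every test input."""
    return all(
        run(reference_cmd, text) == run(candidate_cmd, text) for text in inputs
    )


# Stand-ins for a reference binary and two rebuilt candidates (assumed for
# illustration; a real setup would point at actual executables).
reference = [sys.executable, "-c",
             "import sys; print(sys.stdin.read().upper(), end='')"]
candidate = [sys.executable, "-c",
             "import sys; sys.stdout.write(sys.stdin.read().upper())"]
broken = [sys.executable, "-c",
          "import sys; sys.stdout.write(sys.stdin.read())"]

if __name__ == "__main__":
    tests = ["abc", "Hello\n", ""]
    print(behaviors_match(reference, candidate, tests))
    print(behaviors_match(reference, broken, tests))
```

The candidate is implemented differently from the reference but agrees on every input, while the broken variant diverges on any input containing lowercase letters. A real evaluation would likely also need to account for command-line arguments, stderr, and files written to disk.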

Models: 0

Datasets: 1 (programbench/ProgramBench-Tests)

Team members 5
