Built upon FPBench, the framework includes 721 programming tasks across three difficulty levels. It focuses on three mainstream FP languages (Haskell, OCaml, and Scala) and utilizes Java as an imperative baseline for comparative analysis.
The dataset is organized into two primary components:
- `LeetCodeProblem/`: This directory contains the core problem specifications and testing infrastructure.
  - `meta.json`: A comprehensive file for each problem containing the core problem data and private tests.
  - Test templates: `Main.hs` (Haskell), `main.ml` (OCaml), and `MySuite.scala` (Scala).
- `LLMsGeneratedCode/`: This directory stores outputs from GPT-3.5-turbo, GPT-4o, and GPT-5, categorized into three distinct refinement stages:
| Stage | Description |
|---|---|
| CodeGenerated | Initial zero-shot code outputs. |
| BaselineRepair | Code refined from the previously generated code (post-compiler feedback attempts). |
| InstructionRepair | Code optimized using static analysis feedback and hand-crafted idiomatic instructions. |
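As a rough illustration of how the three stages map onto directories, the sketch below builds the per-stage paths for one model and counts the files found under each. The `LLMsGeneratedCode/[Model_Name]/[Stage]/` nesting is the one this README describes; the helper names `stage_dirs` and `count_files_per_stage` are hypothetical, and because the file layout inside each stage directory is not documented here, the count simply recurses over whatever files are present.

```python
from pathlib import Path

# Stage directory names, in pipeline order, as listed in the table above.
STAGES = ("CodeGenerated", "BaselineRepair", "InstructionRepair")


def stage_dirs(root, model):
    """Return {stage: Path} for one model under LLMsGeneratedCode/ (hypothetical helper)."""
    base = Path(root) / "LLMsGeneratedCode" / model
    return {stage: base / stage for stage in STAGES}


def count_files_per_stage(root, model):
    """Count regular files under each stage directory; a missing directory counts as 0."""
    counts = {}
    for stage, path in stage_dirs(root, model).items():
        if path.is_dir():
            # The layout inside a stage directory is an unknown, so recurse.
            counts[stage] = sum(1 for p in path.rglob("*") if p.is_file())
        else:
            counts[stage] = 0
    return counts
```

Calling `count_files_per_stage(".", "gpt-4o")` from the dataset root would give a quick check that all three stages are populated for that model, assuming the model directory is named as in the tree in this README.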
FPEval utilizes industry-standard tools to provide feedback during the InstructionRepair stage, ensuring the generated code adheres to idiomatic patterns:
- Haskell: HLint (style/idioms) and GHC warnings (correctness).
- OCaml: dune (build validation) and ocamlformat (formatting).
- Scala: Scalastyle (functional style enforcement).
- Java: Checkstyle and PMD (standard imperative quality checks).

.
├── LeetCodeProblem/
│   ├── [Problem_NAME]/
│   │   ├── meta.json            # Core problem data & private tests
│   │   ├── Main.hs              # Haskell test template
│   │   ├── main.ml              # OCaml test template
│   │   └── MySuite.scala        # Scala test template
├── LLMsGeneratedCode/
│   ├── [Model_Name]/            # gpt-3.5-turbo, gpt-4o, gpt-5
│   │   ├── CodeGenerated/       # Initial raw outputs
│   │   ├── BaselineRepair/      # Post-compiler feedback attempts
│   │   └── InstructionRepair/   # Post-static analysis refinement
└── README.md
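
The layout above can be traversed with a few lines of Python. The following is a minimal sketch, not part of the dataset's own tooling: `scan_problems` and `load_meta` are hypothetical helper names, and since the schema of `meta.json` is not described in this README, the loader returns the parsed JSON as-is without assuming any field names.

```python
import json
from pathlib import Path

# The four per-problem files named in the directory tree above.
EXPECTED_FILES = ("meta.json", "Main.hs", "main.ml", "MySuite.scala")


def scan_problems(root):
    """Yield (problem_name, missing_files) for each directory under LeetCodeProblem/."""
    problems_dir = Path(root) / "LeetCodeProblem"
    for problem in sorted(p for p in problems_dir.iterdir() if p.is_dir()):
        missing = [name for name in EXPECTED_FILES
                   if not (problem / name).is_file()]
        yield problem.name, missing


def load_meta(root, problem_name):
    """Parse one problem's meta.json; the schema is not assumed here."""
    path = Path(root) / "LeetCodeProblem" / problem_name / "meta.json"
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Iterating over `scan_problems(".")` from the dataset root and printing any entry with a non-empty `missing` list is one way to sanity-check a local copy before running the test templates.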