---
title: README
emoji: π»
colorFrom: pink
colorTo: blue
sdk: static
pinned: false
---
# FPEval: A Holistic Evaluation Framework for Functional Programming
Built on FPBench, the framework comprises 721 programming tasks spanning three difficulty levels. It targets three mainstream functional programming (FP) languages (Haskell, OCaml, and Scala) and uses Java as an imperative baseline for comparison.
## Dataset Structure
The dataset is organized into two primary components:
### 1. `LeetCodeProblem/`
This directory contains the core problem specifications and testing infrastructure.
- **Data Collection Period:** 2021 to March 2025.
- `meta.json`: a per-problem file containing:
  - Problem descriptions and I/O constraints.
  - Public test cases.
  - **Private Test Cases:** high-coverage inputs/outputs synthetically generated using GPT-4o.
- **Test Templates:** pre-configured templates that standardize evaluation:
  - `Main.hs` (Haskell)
  - `main.ml` (OCaml)
  - `MySuite.scala` (Scala)
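As a minimal sketch of consuming this directory, the Python snippet below walks `LeetCodeProblem/`, parses each `meta.json`, and reports which test templates are present. It relies only on the file names listed above. The internal field names of `meta.json` are not documented here, so the helper returns the parsed JSON untouched. `load_problems` and `TEMPLATES` are hypothetical names, not part of FPEval.

```python
import json
from pathlib import Path

# Template file names as listed in this README; each sits next to meta.json.
TEMPLATES = {"Haskell": "Main.hs", "OCaml": "main.ml", "Scala": "MySuite.scala"}


def load_problems(root):
    """Yield (problem_name, meta, languages_with_templates) per problem directory.

    `meta` is the parsed meta.json as-is; no particular field names are
    assumed, so inspect its keys before relying on them.
    """
    for problem_dir in sorted(Path(root).iterdir()):
        meta_path = problem_dir / "meta.json"
        if not meta_path.is_file():
            continue  # skip stray files or incomplete problem directories
        with meta_path.open(encoding="utf-8") as f:
            meta = json.load(f)
        # Keep only the languages whose template file actually exists.
        available = [lang for lang, name in TEMPLATES.items()
                     if (problem_dir / name).is_file()]
        yield problem_dir.name, meta, available
```

For example, `for name, meta, langs in load_problems("LeetCodeProblem"): print(name, langs)` lists each problem together with the languages it ships templates for.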
### 2. `LLMsGeneratedCode/`
This directory stores outputs from GPT-3.5-turbo, GPT-4o, and GPT-5, categorized into three distinct refinement stages:
| Stage | Description |
|---|---|
| `CodeGenerated` | Initial zero-shot code outputs. |
| `BaselineRepair` | Code refined from the previously generated code using compiler feedback. |
| `InstructionRepair` | Code optimized using static analysis feedback and hand-crafted idiomatic instructions. |
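The three stages can be modeled as an ordered enumeration whose values double as directory names. This is an illustrative sketch. `Stage`, `stage_dir`, and `MODELS` are hypothetical, and it assumes the per-model directories are named exactly `gpt-3.5-turbo`, `gpt-4o`, and `gpt-5`.

```python
from enum import Enum
from pathlib import Path

# Model directory names, assumed to match the names listed in this README.
MODELS = ("gpt-3.5-turbo", "gpt-4o", "gpt-5")


class Stage(Enum):
    """Refinement stages; each value is the directory name under a model."""
    CODE_GENERATED = "CodeGenerated"          # initial zero-shot output
    BASELINE_REPAIR = "BaselineRepair"        # refined using compiler feedback
    INSTRUCTION_REPAIR = "InstructionRepair"  # static analysis + idiom instructions


def stage_dir(root, model, stage):
    """Return the directory that holds `model`'s outputs for `stage`."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    return Path(root) / model / stage.value
```

Iterating over `Stage` yields the stages in the order shown in the table. This is convenient when comparing one model's output across refinement rounds.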
## Static Analysis & Repair Tools
FPEval uses standard static analysis tools to provide feedback during the `InstructionRepair` stage, steering the generated code toward idiomatic patterns:
- **Haskell:** `HLint` (style/idioms) and `GHC` warnings (correctness).
- **OCaml:** `dune` (build validation) and `ocamlformat` (formatting).
- **Scala:** `Scalastyle` (functional style enforcement).
- **Java:** `Checkstyle` and `PMD` (standard imperative quality checks).
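A hypothetical helper that picks the feedback tools for a generated source file could look like the following. The tool table restates the list above. The extension-to-language mapping and the `tools_for` function are illustrative assumptions, not FPEval's actual dispatch logic, and the sketch does not invoke any tool.

```python
from pathlib import Path

# Feedback tools per language, restating the list in this README.
TOOLS = {
    "Haskell": ("HLint", "GHC warnings"),
    "OCaml": ("dune", "ocamlformat"),
    "Scala": ("Scalastyle",),
    "Java": ("Checkstyle", "PMD"),
}
# Assumed mapping from file extension to language (illustrative only).
EXTENSIONS = {".hs": "Haskell", ".ml": "OCaml", ".scala": "Scala", ".java": "Java"}


def tools_for(filename):
    """Return (language, tools) for a source file; raise ValueError otherwise."""
    suffix = Path(filename).suffix
    try:
        language = EXTENSIONS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {filename!r}") from None
    return language, TOOLS[language]
```

For example, `tools_for("Main.hs")` returns `("Haskell", ("HLint", "GHC warnings"))`.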
## Directory Layout
```
.
├── LeetCodeProblem/
│   ├── [Problem_NAME]/
│   │   ├── meta.json            # Core problem data & private tests
│   │   ├── Main.hs              # Haskell test template
│   │   ├── main.ml              # OCaml test template
│   │   └── MySuite.scala        # Scala test template
├── LLMsGeneratedCode/
│   ├── [Model_Name]/            # gpt-3.5-turbo, gpt-4o, gpt-5
│   │   ├── CodeGenerated/       # Initial raw outputs
│   │   ├── BaselineRepair/      # Post-compiler feedback attempts
│   │   └── InstructionRepair/   # Post-static analysis refinement
└── README.md
```
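For a quick overview of a checkout, a small script can count how many files each model produced at each stage. This sketch assumes the layout above. It counts files recursively because the structure inside each stage directory is not described here. `count_outputs` is a hypothetical helper.

```python
from collections import Counter
from pathlib import Path

# Stage directory names as listed in the layout above.
STAGES = ("CodeGenerated", "BaselineRepair", "InstructionRepair")


def count_outputs(root):
    """Map (model, stage) to the number of files under root/<model>/<stage>/.

    Stages whose directory is missing are left out of the result;
    existing but empty stage directories are recorded with a count of 0.
    """
    counts = Counter()
    for model_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for stage in STAGES:
            stage_path = model_dir / stage
            if stage_path.is_dir():
                # rglob("*") walks all nested entries; count regular files only.
                counts[(model_dir.name, stage)] = sum(
                    1 for f in stage_path.rglob("*") if f.is_file())
    return counts
```

Running `count_outputs("LLMsGeneratedCode")` from the repository root gives a per-model, per-stage tally. This makes it easy to spot stages with missing outputs.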