---
title: README
emoji: 💻
colorFrom: pink
colorTo: blue
sdk: static
pinned: false
---

# FPEval: A Holistic Evaluation Framework for Functional Programming

Built upon **FPBench**, the framework includes **721 programming tasks** across three difficulty levels. It focuses on three mainstream FP languages (**Haskell, OCaml, and Scala**) and uses **Java** as an imperative baseline for comparative analysis.

---

## 📊 Dataset Structure

The dataset is organized into two primary components:

### 1. `LeetCodeProblem/`

This directory contains the core problem specifications and testing infrastructure.

* **Data Collection Period:** 2021 – March 2025.
* **`meta.json`**: One file per problem containing:
  * Problem descriptions and I/O constraints.
  * Public test cases.
  * **Private test cases**: High-coverage inputs/outputs synthetically generated using **GPT-4o**.
* **Test Templates**: Pre-configured templates for evaluation:
  * `Main.hs` (Haskell)
  * `main.ml` (OCaml)
  * `MySuite.scala` (Scala)

### 2. `LLMsGeneratedCode/`

This directory stores outputs from **GPT-3.5-turbo**, **GPT-4o**, and **GPT-5**, categorized into three distinct refinement stages:

| Stage | Description |
| :--- | :--- |
| **`CodeGenerated`** | Initial zero-shot code outputs. |
| **`BaselineRepair`** | Code refined from the previously generated code. |
| **`InstructionRepair`** | Code optimized using static analysis feedback and hand-crafted idiomatic instructions. |

---

## 🛠 Static Analysis & Repair Tools

FPEval uses industry-standard tools to provide feedback during the `InstructionRepair` stage, so that the generated code adheres to idiomatic patterns:

* **Haskell**: `HLint` (style/idioms) and `GHC` warnings (correctness).
* **OCaml**: `dune` (build validation) and `ocamlformat` (formatting).
* **Scala**: `Scalastyle` (functional style enforcement).
* **Java**: `Checkstyle` and `PMD` (standard imperative quality checks).

---

## 📂 Directory Layout

```text
.
├── LeetCodeProblem/
│   ├── [Problem_NAME]/
│   │   ├── meta.json          # Core problem data & private tests
│   │   ├── Main.hs            # Haskell test template
│   │   ├── main.ml            # OCaml test template
│   │   └── MySuite.scala      # Scala test template
├── LLMsGeneratedCode/
│   ├── [Model_Name]/          # gpt-3.5-turbo, gpt-4o, gpt-5
│   │   ├── CodeGenerated/     # Initial raw outputs
│   │   ├── BaselineRepair/    # Post-compiler feedback attempts
│   │   └── InstructionRepair/ # Post-static analysis refinement
└── README.md
```
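
The layout above can be walked with a short script. The following is a minimal sketch, assuming the repository is checked out locally and that each `meta.json` is valid JSON. The helper names (`list_problems`, `load_meta`, `missing_templates`, `stage_dirs`) are hypothetical and not part of FPEval, and the script makes no assumption about the keys stored inside `meta.json`.

```python
import json
from pathlib import Path

# Directory and file names taken from the layout described in this README.
STAGES = ("CodeGenerated", "BaselineRepair", "InstructionRepair")
TEMPLATES = ("Main.hs", "main.ml", "MySuite.scala")


def list_problems(root):
    """Return sorted names of problem directories that contain a meta.json."""
    base = Path(root) / "LeetCodeProblem"
    if not base.is_dir():
        return []
    return sorted(p.name for p in base.iterdir() if (p / "meta.json").is_file())


def load_meta(root, problem):
    """Parse meta.json for one problem; the schema is left to the caller."""
    path = Path(root) / "LeetCodeProblem" / problem / "meta.json"
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def missing_templates(root, problem):
    """List the test templates that are absent from a problem directory."""
    directory = Path(root) / "LeetCodeProblem" / problem
    return [name for name in TEMPLATES if not (directory / name).is_file()]


def stage_dirs(root, model):
    """Map each refinement stage present for a model to its directory path."""
    base = Path(root) / "LLMsGeneratedCode" / model
    return {stage: base / stage for stage in STAGES if (base / stage).is_dir()}


if __name__ == "__main__":
    # Assumption: the script is run from the repository root.
    problems = list_problems(".")
    print(f"{len(problems)} problems found")
    for name in problems[:3]:
        print(name, "missing templates:", missing_templates(".", name))
```

`stage_dirs` only reports stages that actually exist on disk, so a model directory with a partial set of stages does not raise an error.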