# FPEval: A Holistic Evaluation Framework for Functional Programming

Built upon **FPBench**, the framework includes **721 programming tasks** across three difficulty levels. It focuses on three mainstream FP languages (**Haskell, OCaml, and Scala**) and utilizes **Java** as an imperative baseline for comparative analysis.

---

## Dataset Structure

The dataset is organized into two primary components:

### 1. `LeetCodeProblem/`

This directory contains the core problem specifications and testing infrastructure.

* **Data Collection Period:** 2021 – March 2025.
* **`meta.json`**: A comprehensive file for each problem containing:
  * Problem descriptions and I/O constraints.
  * Public test cases.
  * **Private Test Cases**: High-coverage inputs/outputs synthetically generated using **GPT-4o**.
* **Test Templates**: Pre-configured templates to ensure seamless evaluation:
  * `Main.hs` (Haskell)
  * `main.ml` (OCaml)
  * `MySuite.scala` (Scala)
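
As a rough illustration of how one problem directory could be inspected, the Python sketch below loads `meta.json` and reports which of the three test templates are present. The helper name `load_problem` is hypothetical and not part of FPEval, and the sketch makes no assumption about the keys inside `meta.json`; it simply returns whatever the file contains.

```python
import json
from pathlib import Path

# Template file names as listed above, keyed by language.
TEMPLATES = {
    "Haskell": "Main.hs",
    "OCaml": "main.ml",
    "Scala": "MySuite.scala",
}


def load_problem(problem_dir):
    """Load one problem directory (hypothetical helper, not part of FPEval).

    Returns the parsed meta.json and the languages whose test template exists.
    """
    problem_dir = Path(problem_dir)
    with open(problem_dir / "meta.json", encoding="utf-8") as fh:
        meta = json.load(fh)
    available = [
        lang for lang, name in TEMPLATES.items()
        if (problem_dir / name).is_file()
    ]
    return meta, available
```

Calling `load_problem` on a directory under `LeetCodeProblem/` and then inspecting the returned object is one way to discover the actual schema before writing any code that depends on specific keys.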
### 2. `LLMsGeneratedCode/`

This directory stores outputs from **GPT-3.5-turbo**, **GPT-4o**, and **GPT-5**, categorized into three refinement stages:

| Stage | Description |
| :--- | :--- |
| **`CodeGenerated`** | Initial zero-shot code outputs. |
| **`BaselineRepair`** | Code repaired from the previous attempt, using compiler feedback. |
| **`InstructionRepair`** | Code refined using static analysis feedback and hand-crafted idiomatic instructions. |
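
To make the stage layout concrete, here is a minimal Python sketch that builds the path to each stage directory for each model. The model and stage directory names are taken from the directory layout in this README; the function name `stage_dirs` and its `root` argument are assumptions for illustration only.

```python
from pathlib import Path

# Model and stage directory names as given in this README.
MODELS = ["gpt-3.5-turbo", "gpt-4o", "gpt-5"]
STAGES = ["CodeGenerated", "BaselineRepair", "InstructionRepair"]


def stage_dirs(root="."):
    """Return {(model, stage): path} for every pair (hypothetical helper)."""
    base = Path(root) / "LLMsGeneratedCode"
    return {
        (model, stage): base / model / stage
        for model in MODELS
        for stage in STAGES
    }
```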

---

## Static Analysis & Repair Tools

FPEval uses widely adopted tools to provide feedback during the `InstructionRepair` stage, steering the generated code toward idiomatic patterns:

* **Haskell**: `HLint` (style/idioms) and `GHC` warnings (correctness).
* **OCaml**: `dune` (build validation) and `ocamlformat` (formatting).
* **Scala**: `Scalastyle` (functional style enforcement).
* **Java**: `Checkstyle` and `PMD` (standard imperative quality checks).

---

## Directory Layout

```text
.
├── LeetCodeProblem/
│   ├── [Problem_NAME]/
│   │   ├── meta.json          # Core problem data & private tests
│   │   ├── Main.hs            # Haskell test template
│   │   ├── main.ml            # OCaml test template
│   │   └── MySuite.scala      # Scala test template
├── LLMsGeneratedCode/
│   ├── [Model_Name]/          # gpt-3.5-turbo, gpt-4o, gpt-5
│   │   ├── CodeGenerated/     # Initial raw outputs
│   │   ├── BaselineRepair/    # Post-compiler feedback attempts
│   │   └── InstructionRepair/ # Post-static analysis refinement
└── README.md
```