Spaces: FPEvalRepoPublic / README

DataRepo committed on Jan 19

Commit 68ef960 (verified) · 1 Parent(s): 47ccf08

Update README.md

Files changed (1)

README.md +59 -0

@@ -8,3 +8,62 @@ pinned: false
 ---
 Edit this `README.md` markdown file to author your organization card.

+# FPEval: A Holistic Evaluation Framework for Functional Programming
+Built on **FPBench**, the framework includes **721 programming tasks** across three difficulty levels. It covers three mainstream FP languages (**Haskell, OCaml, and Scala**) and uses **Java** as an imperative baseline for comparison.
+---
+## Dataset Structure
+The dataset is organized into two primary components:
+### 1. `LeetCodeProblem/`
+This directory contains the core problem specifications and testing infrastructure.
+* **Data Collection Period:** 2021 – March 2025.
+* **`meta.json`**: One file per problem, containing:
+    * Problem descriptions and I/O constraints.
+    * Public test cases.
+    * **Private Test Cases**: High-coverage inputs/outputs synthetically generated using **GPT-4o**.
+* **Test Templates**: Pre-configured test templates, one per FP language:
+    * `Main.hs` (Haskell)
+    * `main.ml` (OCaml)
+    * `MySuite.scala` (Scala)
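The per-problem files listed above can be checked with a short script. This is a minimal sketch, not part of the FPEval tooling: it assumes each problem lives in its own subdirectory of `LeetCodeProblem/`, and the helper name `missing_files` is hypothetical. Only the four file names come from this README.

```python
from pathlib import Path

# File names taken from the README; the per-problem directory layout is an assumption.
EXPECTED_FILES = ("meta.json", "Main.hs", "main.ml", "MySuite.scala")


def missing_files(root):
    """Map each problem directory name to the expected files it lacks.

    `root` is the dataset root, i.e. the directory containing LeetCodeProblem/.
    Problems with all four files present are left out of the result.
    """
    problems_dir = Path(root) / "LeetCodeProblem"
    report = {}
    for problem in sorted(p for p in problems_dir.iterdir() if p.is_dir()):
        missing = [name for name in EXPECTED_FILES
                   if not (problem / name).is_file()]
        if missing:
            report[problem.name] = missing
    return report
```

Calling `missing_files(".")` from the dataset root returns an empty dictionary when every problem directory is complete.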
+### 2. `LLMsGeneratedCode/`
+This directory stores outputs from **GPT-3.5-turbo**, **GPT-4o**, and **GPT-5**, categorized into three distinct refinement stages:
+| Stage | Description |
+| :--- | :--- |
+| **`CodeGenerated`** | Initial zero-shot code outputs. |
+| **`BaselineRepair`** | Code repaired from the initial attempt, using compiler feedback. |
+| **`InstructionRepair`** | Code optimized using static analysis feedback and hand-crafted idiomatic instructions. |
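The model and stage names above map onto directory paths. The sketch below shows one way to build such a path. It assumes the `LLMsGeneratedCode/[Model_Name]/[Stage]/` nesting and the lower-case model directory names given in the directory layout; the helper `stage_dir` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Directory names as listed in this README; exact on-disk spelling is an assumption.
MODELS = ("gpt-3.5-turbo", "gpt-4o", "gpt-5")
STAGES = ("CodeGenerated", "BaselineRepair", "InstructionRepair")


def stage_dir(root, model, stage):
    """Return the directory holding one model's outputs for one stage."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return Path(root) / "LLMsGeneratedCode" / model / stage
```

For example, `stage_dir("data", "gpt-4o", "BaselineRepair")` yields `data/LLMsGeneratedCode/gpt-4o/BaselineRepair`. Rejecting unknown names early keeps a typo from silently pointing at a directory that does not exist.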
+---
+## Static Analysis & Repair Tools
+FPEval uses widely adopted static analysis tools to provide feedback during the `InstructionRepair` stage, steering the generated code toward idiomatic patterns:
+* **Haskell**: `HLint` (style/idioms) and `GHC` warnings (correctness).
+* **OCaml**: `dune` (build validation) and `ocamlformat` (formatting).
+* **Scala**: `Scalastyle` (functional style enforcement).
+* **Java**: `Checkstyle` and `PMD` (standard imperative quality checks).
+---
+## 📂 Directory Layout
+```text
+.
+├── LeetCodeProblem/
+│   ├── [Problem_NAME]/
+│   │   ├── meta.json             # Core problem data & private tests
+│   │   ├── Main.hs               # Haskell test template
+│   │   ├── main.ml               # OCaml test template
+│   │   └── MySuite.scala         # Scala test template
+├── LLMsGeneratedCode/
+│   ├── [Model_Name]/             # gpt-3.5-turbo, gpt-4o, gpt-5
+│   │   ├── CodeGenerated/        # Initial raw outputs
+│   │   ├── BaselineRepair/       # Post-compiler feedback attempts
+│   │   └── InstructionRepair/    # Post-static analysis refinement
+└── README.md
+```