edm-research/PROBE-bucket / README.md


	---
	license: cc-by-sa-4.0
	language:
	- en
	tags:
	- code
	pretty_name: PROBE
	size_categories:
	- 1K<n<10K
	---

	# PROBE Dataset

	The dataset is provided as a single JSONL file: `dataset.jsonl`.

	---

	## Dataset Structure

	Each line in the file corresponds to one programming problem and contains a JSON object with the following fields:

	`problem_id`: A unique identifier for the problem.

	`difficulty_level`: The difficulty of the task, on a scale from 0 (easiest) to 3 (hardest). Difficulty was estimated from the cyclomatic complexity, logical lines of code (LLOC), and Halstead effort of the Python reference solutions.

	`prompt`: The natural-language description of the programming task.

	`unit_tests`: A list of unit test specifications for the problem. Each unit test is an object with the following fields:
	- `number`: the unit test identifier.
	- `input`: the input provided to the program.
	- `output`: the expected output for the given input.

	`references`: A list of reference solutions for the problem. Each reference solution is an object with the following fields:
	- `language`: the programming language of the solution (e.g., Python, C++, Java, C, Rust).
	- `id`: an identifier for the reference solution.
	- `code`: the source code of a correct solution to the problem.

	---

	## Dataset Statistics

	- Total problems: 1,651
	- Reference solutions per problem:
	  - Python, C++: 3–250
	  - Java, C: 0–250
	  - Rust: 0–180
	- Unit tests per problem: 6–131
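The figures above can be recomputed from the file. A sketch, assuming `problems` is a non-empty list of parsed problem objects (each line of `dataset.jsonl` passed through `json.loads`) and that the `language` values are spelled exactly as in the list above; `summarize` is a hypothetical helper.

```python
from collections import Counter

# Language labels as written in this card; the exact strings stored in
# the file are an assumption.
LANGUAGES = ["Python", "C++", "Java", "C", "Rust"]


def summarize(problems):
    """Return the problem count, per-language (min, max) reference counts,
    and the (min, max) number of unit tests per problem.

    Expects a non-empty list of problem dicts.
    """
    per_language = {lang: [] for lang in LANGUAGES}
    test_counts = []
    for problem in problems:
        counts = Counter(ref["language"] for ref in problem["references"])
        for lang in LANGUAGES:
            per_language[lang].append(counts[lang])  # Counter yields 0 if absent
        test_counts.append(len(problem["unit_tests"]))
    return {
        "total": len(problems),
        "references": {lang: (min(c), max(c)) for lang, c in per_language.items()},
        "unit_tests": (min(test_counts), max(test_counts)),
    }
```

Counting absent languages as 0 matches the card's lower bound of 0 for Java, C, and Rust.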

	---

	## Data Sources

	This dataset is based on the [Project CodeNet](https://github.com/IBM/Project_CodeNet) dataset, which contains problems from two online judge platforms: Aizu and AtCoder.

	- Prompts:
	Extracted from the HTML files containing problem descriptions and organized into a structured format:

	```
	Problem Description:
	Input Format:
	Output Format:
	Constraints:
	```


	- Reference solutions:
	Filtered to keep only correct solutions. For each problem, a random subset of at most 250 reference solutions was selected.

	- Unit tests:
	Most unit tests were obtained directly from the online judge platforms. Additional tests were generated using the available reference solutions to ensure coverage.
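Because the prompts follow a fixed four-part layout, they can be split back into their sections. A minimal sketch, assuming each header appears at the start of a line, spelled exactly as in the template above and followed by a colon; `split_prompt` is a hypothetical helper.

```python
import re

SECTIONS = ["Problem Description", "Input Format", "Output Format", "Constraints"]
# Matches a section header at the start of a line, plus any spaces after the colon.
HEADER = re.compile(
    r"^(" + "|".join(map(re.escape, SECTIONS)) + r"):[ \t]*", re.MULTILINE
)


def split_prompt(prompt):
    """Split a structured prompt into {section name: section text}."""
    matches = list(HEADER.finditer(prompt))
    sections = {}
    for i, match in enumerate(matches):
        # Each section runs until the next header or the end of the prompt.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(prompt)
        sections[match.group(1)] = prompt[match.end():end].strip()
    return sections
```

A prompt without any of these headers yields an empty dict, which makes malformed prompts easy to detect.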

	---

	## Generated Code

	The zip file `generated_code.zip` contains LLM-generated solutions for these problems.

	The solutions were generated by six different models:

	- GPT-4.1-mini
	- Gemini-2.0-Flash
	- Deepseek-Coder-v2 (16b)
	- Qwen2.5-Coder (14b)
	- Qwen2.5-Coder (7/14b)

	Each model generated five independent solutions for each problem.

	The process was repeated for five programming languages (Python, C++, Java, C, Rust).

	When a solution was incorrect, the model was given feedback (for up to two iterations) and asked to provide a new solution. The files are named as follows:

	- `solution.{extension}` (`py`, `cpp`, `java`, `c`, `rs`): the first solution generated by the model, before any feedback;
	- `solution_0.{extension}`, `solution_1.{extension}`: the first and second solutions generated after feedback.
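A sketch for pairing each initial solution with its post-feedback revisions inside `generated_code.zip`. The card does not describe the directory layout of the archive, so this assumes, hypothetically, that all attempts of one generation share a parent directory; `attempts_by_directory` and `feedback_rounds` are illustrative names.

```python
import posixpath
import zipfile
from collections import defaultdict

# File extensions listed in the card for the five languages.
EXTENSIONS = {"py", "cpp", "java", "c", "rs"}
# Attempt order: initial solution, then the first and second post-feedback solutions.
ATTEMPT_ORDER = {"solution": 0, "solution_0": 1, "solution_1": 2}


def attempts_by_directory(zip_source):
    """Group solution files by parent directory, ordered by attempt.

    `zip_source` is a path or a binary file object. Assumes (hypothetically)
    that all attempts of one generation share a directory.
    """
    grouped = defaultdict(list)
    with zipfile.ZipFile(zip_source) as zf:
        for name in zf.namelist():
            directory, filename = posixpath.split(name)
            stem, _, ext = filename.rpartition(".")
            if stem in ATTEMPT_ORDER and ext in EXTENSIONS:
                grouped[directory].append(filename)
    return {
        directory: sorted(files, key=lambda f: ATTEMPT_ORDER[f.rpartition(".")[0]])
        for directory, files in grouped.items()
    }


def feedback_rounds(attempts):
    """Feedback iterations used: 0 means no post-feedback solution exists."""
    return len(attempts) - 1
```

Calling `attempts_by_directory("generated_code.zip")` would then give, per directory, the ordered list of attempts; the last entry is the model's final answer.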

	---

	## Intended Use

	This dataset is intended for research on, and evaluation of, large language models (LLMs) for text-to-code generation.

	The combination of extensive unit tests and multiple reference implementations per problem enables thorough functional-correctness evaluation as well as comparison against human-written solutions. Reference solutions are provided in five programming languages, allowing cross-language analysis and benchmarking of multilingual code generation capabilities.
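A minimal functional-correctness check, assuming (as is usual for Aizu and AtCoder tasks) that programs read each unit test's `input` from standard input and print the answer to standard output, and that outputs are compared after trimming trailing whitespace. The helpers `normalize` and `run_unit_tests` are hypothetical; problems that accept floating-point answers within a tolerance would need a looser comparison.

```python
import subprocess
import sys


def normalize(text):
    """Drop trailing whitespace per line and leading/trailing blank lines."""
    return "\n".join(line.rstrip() for line in text.strip().splitlines())


def run_unit_tests(command, unit_tests, timeout=10.0):
    """Run `command` once per unit test, feeding `input` on stdin.

    Returns the fraction of tests whose stdout matches `output`.
    """
    passed = 0
    for test in unit_tests:
        try:
            result = subprocess.run(
                command,
                input=test["input"],
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            continue  # a timeout counts as a failure
        if result.returncode == 0 and normalize(result.stdout) == normalize(test["output"]):
            passed += 1
    return passed / len(unit_tests) if unit_tests else 0.0
```

For a Python solution, `command` would be `[sys.executable, "solution.py"]`; for C, C++, Java, or Rust, the solution would first need to be compiled and `command` would point at the resulting program.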

	The dataset supports:

	- Functional correctness evaluation using extensive unit testing.
	- Similarity analysis to human-written implementations, supporting metrics such as syntactic, semantic, or structural similarity.
	- Code quality assessment, both for comparing different models and for evaluating generated code relative to high-quality human reference implementations.
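As one lightweight, purely syntactic baseline for the similarity analysis listed above, a generated solution can be compared with each human reference using token-level sequence matching from Python's `difflib`. This is an illustrative choice, not a metric prescribed by the dataset, and the tokenizer is deliberately crude.

```python
import difflib
import re

# Crude tokenizer: runs of word characters, or any single other non-space character.
TOKEN = re.compile(r"\w+|[^\w\s]")


def token_similarity(a, b):
    """Similarity in [0, 1] between the token sequences of two source strings."""
    return difflib.SequenceMatcher(None, TOKEN.findall(a), TOKEN.findall(b)).ratio()


def closest_reference(generated, references):
    """Return (best score, index) of the most similar reference solution.

    Expects a non-empty list of reference sources.
    """
    scores = [token_similarity(generated, ref) for ref in references]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best
```

Tokenizing before matching makes the score insensitive to spacing differences such as `a+b` versus `a + b`; semantic or structural similarity would require heavier tooling.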

Xet Storage Details

Size: 3.9 kB
Xet hash: 24f109e39a659293960f0043adb95e748c67692b77a08f1932faaf3172152e0e
