dzungpham/graphcodebert-code-classification

	---
	license: mit
	metrics:
	- accuracy
	- f1
	- precision
	- recall
	base_model:
	- microsoft/unixcoder-base
	library_name: transformers
	tags:
	- detection
	- AI-generated
	- transformers
	- bert
	---

	## Task Overview

	The rapid advancement of generative models has made it increasingly challenging to distinguish machine-generated code from human-written code, particularly across different programming languages, domains, and generation techniques.

	SemEval-2026 Task 13 focuses on developing systems capable of detecting machine-generated code under diverse conditions. The evaluation emphasizes generalization to unseen programming languages, generator families, and application scenarios.

	The task is divided into three subtasks.

	---

	### Subtask A: Binary Machine-Generated Code Detection

	Goal:
	Given a code snippet, determine whether it is:

	- Fully human-written, or
	- Fully machine-generated

	- Training Languages: C++, Python, Java
	- Training Domain: Algorithmic (e.g., LeetCode-style problems)

	Evaluation Settings:

	| Setting | Language | Domain |
	|---------------------------------------|--------------------|----------------------|
	| (i) Seen Languages & Seen Domains | C++, Python, Java | Algorithmic |
	| (ii) Unseen Languages & Seen Domains | Go, PHP, C#, C, JS | Algorithmic |
	| (iii) Seen Languages & Unseen Domains | C++, Python, Java | Research, Production |
	| (iv) Unseen Languages & Domains | Go, PHP, C#, C, JS | Research, Production |
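
The four settings reduce to two yes/no questions: is the language one of the training languages, and is the domain the training domain? A minimal sketch of that bucketing follows. The function name `evaluation_setting` and the lowercase spellings are illustrative assumptions, and since the documented data fields do not include a domain, the `domain` argument is a hypothetical input.

```python
# Illustrative helper, not part of the official task code.
SEEN_LANGUAGES = {"c++", "python", "java"}  # training languages for Subtask A
SEEN_DOMAINS = {"algorithmic"}              # training domain for Subtask A


def evaluation_setting(language: str, domain: str) -> str:
    """Return the setting label ("i" to "iv") for a language/domain pair."""
    seen_lang = language.strip().lower() in SEEN_LANGUAGES
    seen_dom = domain.strip().lower() in SEEN_DOMAINS
    if seen_lang and seen_dom:
        return "i"
    if not seen_lang and seen_dom:
        return "ii"
    if seen_lang and not seen_dom:
        return "iii"
    return "iv"


print(evaluation_setting("Go", "Algorithmic"))
```

Grouping predictions by this label is one way to report a separate score per setting rather than a single aggregate.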

	Dataset Size:
	- Train: 500,000 samples (238,000 human-written, 262,000 machine-generated)
	- Validation: 100,000 samples

	Data Format:
	Each record in the dataset has the following fields:
	- `code`: The code snippet
	- `label`: Binary label (0 for human-written, 1 for machine-generated)
	- `language`: Programming language of the snippet

	Label mappings are provided in `task_A/label_to_id.json` and `task_A/id_to_label.json`.
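
A minimal sketch of checking a record against the documented fields. The example record is made up, and the mapping dictionary is written inline on the assumption that it mirrors the 0/1 convention stated above; the actual label strings in the JSON files may differ.

```python
# Assumed mapping that follows the documented 0/1 convention; the real
# task_A/id_to_label.json may use different label strings.
ID_TO_LABEL = {0: "human", 1: "machine"}
LABEL_TO_ID = {name: idx for idx, name in ID_TO_LABEL.items()}

REQUIRED_FIELDS = ("code", "label", "language")


def validate_record(record: dict) -> dict:
    """Check that a record has the documented fields and a binary label."""
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if record["label"] not in ID_TO_LABEL:
        raise ValueError(f"label must be 0 or 1, got {record['label']!r}")
    return record


# Made-up example record, for illustration only.
example = validate_record(
    {"code": "print('hello')", "label": 0, "language": "Python"}
)
print(ID_TO_LABEL[example["label"]])
```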

	Evaluation Metric:
	The primary metric for Subtask A is the macro-averaged F1 score, which gives both classes equal weight regardless of how many samples each contains.
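
For reference, macro F1 computes an F1 score for each class separately and takes their unweighted mean. The pure-Python sketch below shows the arithmetic on made-up labels; the function names are illustrative, and the official scorer may be implemented differently.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 for one class, treating `cls` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # Convention used here: F1 is 0.0 when the class never appears
    # in either the gold labels or the predictions.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Made-up labels: one human-written sample is misclassified as machine-generated.
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1]
score = macro_f1(y_true, y_pred)
print(f"{score:.4f}")  # 0.8286 (class 0: 0.8, class 1: 6/7)
```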

	Submission Format:
	Participants must submit a `.csv` file containing:
	- `id`: Unique identifier for each code snippet
	- `label`: Predicted label (0 or 1)

	A sample submission file is available in the `task_A/` directory.
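
A minimal sketch of producing a file with those two columns using Python's standard `csv` module. The ids and predictions are made up, and the column order is an assumption; the sample submission file remains the authoritative reference for the exact layout.

```python
import csv
import io


def write_submission(rows, out):
    """Write (id, label) pairs as CSV with an `id,label` header row."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "label"])
    for sample_id, label in rows:
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label!r}")
        writer.writerow([sample_id, label])


# Made-up predictions, written to an in-memory buffer for illustration.
buffer = io.StringIO()
write_submission([(101, 0), (102, 1)], buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

To write to disk instead, pass a file opened with `open("submission.csv", "w", newline="")` as `out`.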

	Baseline Models:
	Baseline implementations are provided in the `baselines/` directory, including starter code and pre-trained checkpoints for models such as GraphCodeBERT and UniXcoder.
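
One common way to start from a general-purpose checkpoint is to attach a two-class classification head with the `transformers` library. The sketch below is written under assumptions and is not the code in `baselines/`: the checkpoint name `microsoft/graphcodebert-base`, the `max_length` of 512, and the helper names are illustrative choices. The classification head is randomly initialised, so predictions are meaningless until the model is fine-tuned on the task data.

```python
def load_baseline(model_name: str = "microsoft/graphcodebert-base"):
    """Load a tokenizer and a 2-class sequence classifier (head untrained)."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, num_labels=2
    )
    return tokenizer, model


def predict_label(code: str, tokenizer, model) -> int:
    """Return 0 (human-written) or 1 (machine-generated) for one snippet."""
    import torch

    # Truncate long snippets to the assumed maximum input length.
    inputs = tokenizer(code, truncation=True, max_length=512, return_tensors="pt")
    model.eval()
    with torch.no_grad():
        logits = model(**inputs).logits
    return int(logits.argmax(dim=-1).item())
```

Calling `load_baseline()` downloads the checkpoint from the Hugging Face Hub, so it needs network access and the `transformers` and `torch` packages installed.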

	Restrictions:
	- No external training data may be used; only the provided datasets are allowed.
	- Specialized AI-generated code detectors are not permitted. General-purpose code models (e.g., CodeBERT, StarCoder) are allowed.