dzungpham/graphcodebert-code-classification

	---
	license: mit
	datasets:
	- DaniilOr/SemEval-2026-Task13
	metrics:
	- accuracy
	- f1
	- precision
	- recall
	base_model:
	- microsoft/unixcoder-base
	library_name: transformers
	tags:
	- detection
	- AI-generated
	- transformers
	- bert
	---
	## 🔍 Task Overview

	The rise of generative models has made it increasingly difficult to distinguish machine-generated code from human-written code, especially across different programming languages, domains, and generation techniques.

	SemEval-2026 Task 13 challenges participants to build systems that detect machine-generated code under diverse conditions. Systems are evaluated on how well they generalize to unseen languages, generator families, and code application scenarios.

	The task consists of three subtasks:

	---

	### Subtask A: Binary Machine-Generated Code Detection

	Goal:
	Given a code snippet, predict whether it is:

	- (i) Fully human-written, or
	- (ii) Fully machine-generated

	Training Languages: `C++`, `Python`, `Java`
	Training Domain: `Algorithmic` (e.g., Leetcode-style problems)

	Evaluation Settings:

	| Setting | Language | Domain |
	|---------------------------------------|--------------------|----------------------|
	| (i) Seen Languages & Seen Domains | C++, Python, Java | Algorithmic |
	| (ii) Unseen Languages & Seen Domains | Go, PHP, C#, C, JS | Algorithmic |
	| (iii) Seen Languages & Unseen Domains | C++, Python, Java | Research, Production |
	| (iv) Unseen Languages & Domains | Go, PHP, C#, C, JS | Research, Production |
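
The four settings in the table follow from two yes/no questions: was the language seen in training, and was the domain seen in training. A minimal sketch of that mapping is shown here. The function name `evaluation_setting` and the exact spelling of the language and domain strings are assumptions made for illustration and are not part of the task's tooling.

```python
# Seen values come from the "Training Languages" and "Training Domain" lines above.
SEEN_LANGUAGES = {"C++", "Python", "Java"}
SEEN_DOMAINS = {"Algorithmic"}


def evaluation_setting(language: str, domain: str) -> str:
    """Return the setting number (i to iv) for a language/domain pair.

    Hypothetical helper: the string spellings are assumed, so normalize
    your own metadata before calling it.
    """
    seen_language = language in SEEN_LANGUAGES
    seen_domain = domain in SEEN_DOMAINS
    if seen_language and seen_domain:
        return "i"
    if not seen_language and seen_domain:
        return "ii"
    if seen_language and not seen_domain:
        return "iii"
    return "iv"
```

Grouping validation predictions by this value is one way to see whether a model degrades more on unseen languages or on unseen domains.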

	Dataset Size:
	- Train: 500K samples (238K human-written, 262K machine-generated)
	- Validation: 100K samples

	Data Format:
	Each dataset contains the following fields:
	- `code`: The code snippet
	- `label`: The binary label (0 for human-written, 1 for machine-generated)
	- `language`: The programming language of the snippet

	Label mappings are provided in `task_A/label_to_id.json` and `task_A/id_to_label.json`.
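
Before training, it can help to check that every record has the three fields listed above and a label of 0 or 1. The following is a small sketch that works on plain dictionaries. The helper name `validate_record` is hypothetical, and how the records are loaded (for example from the dataset files) is left out because it depends on the file layout.

```python
# Field names and label values are taken from the Data Format description.
REQUIRED_FIELDS = ("code", "label", "language")
VALID_LABELS = {0, 1}  # 0 = human-written, 1 = machine-generated


def validate_record(record: dict) -> list[str]:
    """Return a list of problems found in one record (empty if it looks fine)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if field not in record:
            problems.append(f"missing field: {field}")
    if "label" in record and record["label"] not in VALID_LABELS:
        problems.append(f"unexpected label: {record['label']!r}")
    if "code" in record and not isinstance(record["code"], str):
        problems.append("code is not a string")
    return problems
```

For instance, `validate_record({"code": "print(1)", "label": 0, "language": "Python"})` returns an empty list.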

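	Evaluation Metric:
	The primary evaluation metric for Subtask A is the macro F1-score: the unweighted mean of the per-class F1 scores. Because the human-written and machine-generated classes count equally regardless of how many samples each has, a system cannot score well by favoring one class.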
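
To make the metric concrete, here is a dependency-free sketch of macro F1 for the two labels. It is an illustration of the standard definition, not the official scorer, whose edge-case handling (for example a class with no predictions) may differ.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 for one class, treating `cls` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # Assumption: a class with no true or predicted samples scores 0.0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# A classifier that always predicts 0 on a 4:1 split is 80% accurate,
# but its macro F1 is only (8/9 + 0) / 2 = 4/9.
y_true = [0, 0, 0, 0, 1]
y_pred = [0, 0, 0, 0, 0]
score = macro_f1(y_true, y_pred)
```

The example shows why accuracy alone can be misleading here: ignoring one class entirely drags the macro average down sharply.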

	Submission Format:
	Participants must submit a `.csv` file with the following columns:
	- `id`: Unique identifier for each code snippet
	- `label`: Predicted label (0 or 1)

	A sample submission file is available in the `task_A/` folder.
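
A submission file with the two required columns can be written with Python's standard `csv` module. This is a minimal sketch: the function name `write_submission` and the output path are placeholders, and the sample file in `task_A/` remains the authoritative reference for the exact layout.

```python
import csv


def write_submission(path, predictions):
    """Write (id, label) pairs to a CSV file with an `id,label` header.

    `predictions` is any iterable of (id, label) pairs; labels must be 0 or 1.
    """
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "label"])
        for sample_id, label in predictions:
            if label not in (0, 1):
                raise ValueError(f"label for id {sample_id!r} must be 0 or 1, got {label!r}")
            writer.writerow([sample_id, label])
```

Calling `write_submission("submission.csv", [(1, 0), (2, 1)])` produces a header row followed by one row per snippet.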

	Baseline Models:
	Baseline implementations for Subtask A are provided in the `baselines/` directory. These include starter code and pre-trained checkpoints for models such as GraphCodeBERT and UniXcoder.
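
As a rough illustration of what such a baseline involves, the sketch below loads a general-purpose code model with a two-class head through the `transformers` library and classifies one snippet. It is not the code from the `baselines/` directory. The default checkpoint name `microsoft/graphcodebert-base`, the 512-token limit, and the helper names are assumptions, and the classification head of a base checkpoint is randomly initialized until it is fine-tuned on the training data.

```python
# Label meanings follow the Data Format section of this card.
ID_TO_LABEL = {0: "human-written", 1: "machine-generated"}


def pick_label(scores):
    """Return the index of the largest score (argmax over class logits)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def load_baseline(checkpoint="microsoft/graphcodebert-base"):
    """Load a tokenizer and a 2-class sequence classifier (assumed checkpoint)."""
    # Imported lazily so the helpers above work without transformers installed.
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
    model.eval()
    return tokenizer, model


def classify(code, tokenizer, model, max_length=512):
    """Predict 0 or 1 for a single code snippet."""
    import torch

    inputs = tokenizer(code, truncation=True, max_length=max_length, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return pick_label(logits[0].tolist())
```

With a fine-tuned checkpoint, `ID_TO_LABEL[classify(snippet, *load_baseline(path))]` would give a readable prediction, where `path` is a placeholder for wherever that checkpoint lives.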

	Restrictions:
	- No external training data: Use only the provided datasets.
	- No specialized AI-generated code detectors: General-purpose code models (e.g., CodeBERT, StarCoder) are allowed.