---
license: mit
metrics:
- accuracy
- f1
- precision
- recall
base_model:
- microsoft/unixcoder-base
library_name: transformers
tags:
- detection
- AI-generated
- transformers
- bert
---

## Task Overview

The rapid advancement of generative models has made it increasingly challenging to distinguish machine-generated code from human-written code, particularly across different programming languages, domains, and generation techniques.

SemEval-2026 Task 13 focuses on developing systems capable of detecting machine-generated code under diverse conditions. The evaluation emphasizes generalization to unseen programming languages, generator families, and application scenarios.

The task is divided into three subtasks.

---

### Subtask A: Binary Machine-Generated Code Detection

**Goal:**  
Given a code snippet, determine whether it is:

- Fully human-written, or  
- Fully machine-generated

**Training Languages:** C++, Python, Java  
**Training Domain:** Algorithmic (e.g., LeetCode-style problems)

**Evaluation Settings:**

| Setting                                      | Language            | Domain               |
|----------------------------------------------|---------------------|----------------------|
| (i) Seen Languages & Seen Domains            | C++, Python, Java   | Algorithmic          |
| (ii) Unseen Languages & Seen Domains         | Go, PHP, C#, C, JS  | Algorithmic          |
| (iii) Seen Languages & Unseen Domains        | C++, Python, Java   | Research, Production |
| (iv) Unseen Languages & Unseen Domains       | Go, PHP, C#, C, JS  | Research, Production |
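
The four settings reduce to two yes/no questions: was the language seen in training, and was the domain seen in training? The sketch below illustrates that mapping. The function name `evaluation_setting` and the idea of passing a domain string are assumptions for illustration only; the dataset fields listed in this card do not include a domain column.

```python
# Languages and domain seen during training, taken from the table above.
SEEN_LANGUAGES = {"C++", "Python", "Java"}
SEEN_DOMAINS = {"Algorithmic"}


def evaluation_setting(language: str, domain: str) -> str:
    """Return the setting label ("i" to "iv") for a language/domain pair.

    Hypothetical helper: not part of the official task tooling.
    """
    seen_language = language in SEEN_LANGUAGES
    seen_domain = domain in SEEN_DOMAINS
    if seen_language and seen_domain:
        return "i"
    if seen_domain:
        return "ii"   # unseen language, seen domain
    if seen_language:
        return "iii"  # seen language, unseen domain
    return "iv"       # both unseen


print(evaluation_setting("Go", "Algorithmic"))
```

Grouping predictions by this label is one way to check how a detector degrades as it moves away from the training distribution.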

**Dataset Size:** 
- Train: 500,000 samples (238,000 human-written, 262,000 machine-generated)
- Validation: 100,000 samples

**Data Format:**  
Each sample includes the following fields:
- `code`: The code snippet  
- `label`: Binary label (0 for human-written, 1 for machine-generated)  
- `language`: Programming language of the snippet  

Label mappings are provided in `task_A/label_to_id.json` and `task_A/id_to_label.json`.
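
As a quick sanity check on the format described above, a record can be validated before training. This is a minimal sketch: `validate_sample` is a hypothetical helper, and it assumes each record has already been loaded as a Python dictionary. It does not read the label-mapping JSON files, whose exact contents are not shown here.

```python
# Fields listed in the data format description above.
REQUIRED_FIELDS = ("code", "label", "language")


def validate_sample(sample: dict) -> None:
    """Raise ValueError if a record lacks a required field or has a bad label.

    Hypothetical helper for illustration, not part of the official tooling.
    """
    missing = [field for field in REQUIRED_FIELDS if field not in sample]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if sample["label"] not in (0, 1):
        # 0 = human-written, 1 = machine-generated
        raise ValueError(f"label must be 0 or 1, got {sample['label']!r}")


validate_sample({"code": "print('hi')", "label": 0, "language": "Python"})
```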

**Evaluation Metric:**  
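The primary metric for Subtask A is the macro-averaged F1-score: F1 is computed separately for each class and the two values are averaged with equal weight, so performance on both classes counts equally regardless of class balance.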
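
To make the metric concrete, here is a minimal pure-Python sketch of macro F1 for the two labels. The function names are illustrative, and returning 0.0 for a class with no true or predicted instances is an assumption; the official scorer may treat that edge case differently.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 for one class, written as 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denominator = 2 * tp + fp + fn
    # Assumption: score 0.0 when the class never appears in labels or predictions.
    return 2 * tp / denominator if denominator else 0.0


def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

In this example class 0 scores 2/3 and class 1 scores 0.8, so the macro average is about 0.7333.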

**Submission Format:**  
Participants must submit a `.csv` file containing:
- `id`: Unique identifier for each code snippet  
- `label`: Predicted label (0 or 1)  

A sample submission file is available in the `task_A/` directory.
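
A submission file with these two columns can be produced with Python's standard `csv` module. This is a sketch under assumptions: the helper name `write_submission` is hypothetical, and the column order `id,label` is inferred from the list above, so the sample submission file remains the authoritative reference.

```python
import csv
import io


def write_submission(predictions, out):
    """Write (id, label) pairs as CSV with an `id,label` header row.

    `predictions` is an iterable of (sample_id, label) tuples and `out` is
    any writable text stream. Hypothetical helper for illustration.
    """
    writer = csv.writer(out)
    writer.writerow(["id", "label"])
    for sample_id, label in predictions:
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label!r}")
        writer.writerow([sample_id, label])


# Demonstration with an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
write_submission([(1, 0), (2, 1)], buffer)
print(buffer.getvalue())
```

When writing to disk, open the file with `newline=""` (for example `open("submission.csv", "w", newline="")`) so the `csv` module controls line endings.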

**Baseline Models:**  
Baseline implementations are provided in the `baselines/` directory, including starter code and pre-trained checkpoints for models such as GraphCodeBERT and UniXcoder.

**Restrictions:**
- No external training data may be used; only the provided datasets are allowed.  
- Specialized AI-generated code detectors are not permitted. General-purpose code models (e.g., CodeBERT, StarCoder) are allowed.