Task Overview
The rise of generative models has made it increasingly difficult to distinguish machine-generated code from human-written code, especially across different programming languages, domains, and generation techniques.
SemEval-2026 Task 13 challenges participants to build systems that can detect machine-generated code under diverse conditions by evaluating generalization to unseen languages, generator families, and code application scenarios.
The task consists of three subtasks:
Subtask A: Binary Machine-Generated Code Detection
Goal:
Given a code snippet, predict whether it is:
- (i) Fully human-written, or
- (ii) Fully machine-generated
Training Languages: C++, Python, Java
Training Domain: Algorithmic (e.g., Leetcode-style problems)
Evaluation Settings:
| Setting | Language | Domain |
|---|---|---|
| (i) Seen Languages & Seen Domains | C++, Python, Java | Algorithmic |
| (ii) Unseen Languages & Seen Domains | Go, PHP, C#, C, JS | Algorithmic |
| (iii) Seen Languages & Unseen Domains | C++, Python, Java | Research, Production |
| (iv) Unseen Languages & Domains | Go, PHP, C#, C, JS | Research, Production |
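For a per-setting error analysis, each evaluation sample can be bucketed into one of the four settings in the table above. The following is a minimal sketch, not part of the official task code. It assumes a domain string is known for each sample, which the documented data fields do not include. The function name `evaluation_setting` and the lowercase spellings of languages and domains are illustrative choices.

```python
# Languages and domain seen during training, per the task description.
SEEN_LANGUAGES = {"c++", "python", "java"}
SEEN_DOMAINS = {"algorithmic"}


def evaluation_setting(language: str, domain: str) -> str:
    """Return the setting label "(i)" to "(iv)" for a (language, domain) pair.

    Assumption: `domain` is available for the sample; it is not one of the
    documented dataset fields.
    """
    seen_lang = language.strip().lower() in SEEN_LANGUAGES
    seen_dom = domain.strip().lower() in SEEN_DOMAINS
    if seen_lang and seen_dom:
        return "(i)"    # seen language, seen domain
    if seen_dom:
        return "(ii)"   # unseen language, seen domain
    if seen_lang:
        return "(iii)"  # seen language, unseen domain
    return "(iv)"       # unseen language, unseen domain
```

Grouping validation predictions by this label makes it easier to see whether errors come from unseen languages, unseen domains, or both.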
Dataset Size:
- Train - 500K samples (238K Human-Written | 262K Machine-Generated)
- Validation - 100K samples
Data Format
Each dataset contains the following fields:
- code: The code snippet
- label: The binary label (0 for human-written, 1 for machine-generated)
- language: The programming language of the snippet
Label mappings are provided in task_A/label_to_id.json and task_A/id_to_label.json.
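Evaluation Metric
The primary evaluation metric for Subtask A is the macro F1-score: the unweighted mean of the per-class F1 scores. Because each class contributes equally to the average, a system cannot score well by favoring the more frequent class.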
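To make the metric concrete, here is a small pure-Python sketch of macro F1 for the two labels. It is an illustration only, not the official scorer. The helper names are hypothetical, and returning 0.0 for a class with no true or predicted samples is an assumption about edge-case handling.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int], cls: int) -> float:
    """F1 for one class, treating `cls` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # Assumption: a class with no true or predicted samples scores 0.0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int],
             classes: Sequence[int] = (0, 1)) -> float:
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


if __name__ == "__main__":
    gold = [0, 0, 1, 1, 1]
    pred = [0, 1, 1, 1, 0]
    # Class 0 F1 = 0.5, class 1 F1 = 2/3, so macro F1 = 7/12.
    print(round(macro_f1(gold, pred), 4))  # 0.5833
```

In practice, `sklearn.metrics.f1_score(y_true, y_pred, average="macro")` computes the same quantity if scikit-learn is installed.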
Submission Format
Participants must submit a .csv file with the following columns:
- id: Unique identifier for each code snippet
- label: Predicted label (0 or 1)
A sample submission file is available in the task_A/ folder.
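A submission file with the two columns described above could be written as in the sketch below. The function name `write_submission` and the output file name are hypothetical, and the exact type of `id` is an assumption. The sample file in task_A/ remains the authoritative reference for the format.

```python
import csv
from typing import Iterable, Tuple, Union


def write_submission(rows: Iterable[Tuple[Union[str, int], int]], path: str) -> None:
    """Write (id, label) pairs to a CSV file with an `id,label` header."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "label"])
        for sample_id, label in rows:
            # Subtask A is binary: 0 = human-written, 1 = machine-generated.
            if label not in (0, 1):
                raise ValueError(f"label for id {sample_id!r} must be 0 or 1, got {label!r}")
            writer.writerow([sample_id, label])


if __name__ == "__main__":
    # Hypothetical ids and predictions, for illustration only.
    write_submission([(0, 1), (1, 0), (2, 1)], "submission.csv")
```

Validating labels at write time catches probabilities or string labels before the file is uploaded.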
Baseline Models
Baseline implementations for Subtask A are provided in the baselines/ directory. These include starter code and pre-trained checkpoints for models such as GraphCodeBERT and UniXcoder.
Restrictions
- No external training data: Use only the provided datasets.
- No specialized AI-generated code detectors: General-purpose code models (e.g., CodeBERT, StarCoder) are allowed.
Model tree for dzungpham/SLA-SemEval-challenge
Base model
microsoft/unixcoder-base