🔍 Task Overview

The rise of generative models has made it increasingly difficult to distinguish machine-generated code from human-written code, especially across different programming languages, domains, and generation techniques.

SemEval-2026 Task 13 challenges participants to build systems that can detect machine-generated code under diverse conditions by evaluating generalization to unseen languages, generator families, and code application scenarios.

The task consists of three subtasks:

Subtask A: Binary Machine-Generated Code Detection

Goal:
Given a code snippet, predict whether it is:

(i) Fully human-written, or
(ii) Fully machine-generated

Training Languages: C++, Python, Java
Training Domain: Algorithmic (e.g., Leetcode-style problems)

Evaluation Settings:

Setting	Language	Domain
(i) Seen Languages & Seen Domains	C++, Python, Java	Algorithmic
(ii) Unseen Languages & Seen Domains	Go, PHP, C#, C, JS	Algorithmic
(iii) Seen Languages & Unseen Domains	C++, Python, Java	Research, Production
(iv) Unseen Languages & Unseen Domains	Go, PHP, C#, C, JS	Research, Production
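
The four settings in the table above can be expressed as a small lookup, which is handy when breaking down validation results by setting. This is only a sketch: the function name and the domain argument are hypothetical, since the listed data fields include a language but not a domain, so the domain would have to come from wherever you track it.

```python
# Languages and domains copied from the evaluation settings table.
SEEN_LANGUAGES = {"C++", "Python", "Java"}
UNSEEN_LANGUAGES = {"Go", "PHP", "C#", "C", "JS"}
SEEN_DOMAINS = {"Algorithmic"}
UNSEEN_DOMAINS = {"Research", "Production"}


def evaluation_setting(language, domain):
    """Return "i", "ii", "iii", or "iv" for a (language, domain) pair.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if language in SEEN_LANGUAGES:
        language_seen = True
    elif language in UNSEEN_LANGUAGES:
        language_seen = False
    else:
        raise ValueError(f"language not listed in the table: {language!r}")

    if domain in SEEN_DOMAINS:
        domain_seen = True
    elif domain in UNSEEN_DOMAINS:
        domain_seen = False
    else:
        raise ValueError(f"domain not listed in the table: {domain!r}")

    if language_seen and domain_seen:
        return "i"
    if not language_seen and domain_seen:
        return "ii"
    if language_seen and not domain_seen:
        return "iii"
    return "iv"
```

For example, evaluation_setting("Go", "Algorithmic") falls under setting (ii), while evaluation_setting("Java", "Research") falls under setting (iii).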

Dataset Size:

Train - 500K samples (238K Human-Written | 262K Machine-Generated)
Validation - 100K samples

Data Format:
Each dataset contains the following fields:

code: The code snippet
label: The binary label (0 for human-written, 1 for machine-generated)
language: The programming language of the snippet

Label mappings are provided in task_A/label_to_id.json and task_A/id_to_label.json.
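
A quick sanity check on the three fields might look like the sketch below. The sample records are made up for illustration, the helper names are hypothetical, and the on-disk file format of the datasets is not assumed here; the records are plain dictionaries. The contents of the label mapping files are not reproduced either.

```python
REQUIRED_FIELDS = ("code", "label", "language")
VALID_LABELS = {0, 1}  # 0 = human-written, 1 = machine-generated


def validate_record(record):
    """Raise ValueError if a record lacks a field or has an invalid label."""
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if record["label"] not in VALID_LABELS:
        raise ValueError(f"unexpected label: {record['label']!r}")
    return record


def label_counts(records):
    """Count how many validated records carry each label."""
    counts = {0: 0, 1: 0}
    for record in records:
        counts[validate_record(record)["label"]] += 1
    return counts


# Made-up records, for illustration only.
sample_records = [
    {"code": "print('hello')", "label": 0, "language": "Python"},
    {"code": "int main() { return 0; }", "label": 1, "language": "C++"},
]
counts = label_counts(sample_records)
```

Running the same count over the full training split is a cheap way to confirm the class balance before training.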

Evaluation Metric:
The primary evaluation metric for Subtask A is the macro F1-score: the unweighted mean of the per-class F1 scores. Both classes therefore count equally, regardless of how many samples each one has.
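
To make the metric concrete, here is a minimal pure-Python sketch of macro F1 for the two labels. It is not the official scorer. The function names are illustrative, and returning 0.0 for a class with no true or predicted samples is an assumption about the zero-division convention.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 for one class, computed as 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denominator = 2 * tp + fp + fn
    # Assumption: score 0.0 when the class never appears in either list.
    return 2 * tp / denominator if denominator else 0.0


def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    scores = [f1_for_class(y_true, y_pred, cls) for cls in classes]
    return sum(scores) / len(scores)


# Tiny worked example: class 0 gets F1 = 2/3, class 1 gets F1 = 4/5.
example_score = macro_f1([0, 0, 1, 1], [0, 1, 1, 1])
```

In the worked example the two per-class scores are 2/3 and 4/5, so the macro F1 is their mean, roughly 0.733.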

Submission Format:
Participants must submit a .csv file with the following columns:

id: Unique identifier for each code snippet
label: Predicted label (0 or 1)

A sample submission file is available in the task_A/ folder.
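
A submission with these two columns could be written with the standard csv module, as sketched below. The header row and column order are assumptions; check them against the sample submission file. The function name and the example ids are hypothetical.

```python
import csv
import io


def write_submission(predictions, fileobj):
    """Write (id, label) pairs as CSV rows under an "id,label" header.

    Assumption: the expected header and column order are "id,label".
    """
    writer = csv.writer(fileobj)
    writer.writerow(["id", "label"])
    for snippet_id, label in predictions:
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label!r}")
        writer.writerow([snippet_id, label])


# Example with made-up ids, written to an in-memory buffer.
buffer = io.StringIO()
write_submission([(101, 0), (102, 1)], buffer)
submission_lines = buffer.getvalue().splitlines()
```

To write a real file, pass a handle opened with open(path, "w", newline="") instead of the in-memory buffer.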

Baseline Models:
Baseline implementations for Subtask A are provided in the baselines/ directory. These include starter code and pre-trained checkpoints for models such as GraphCodeBERT and UniXcoder.

Restrictions:

No external training data: Use only the provided datasets.
No specialized AI-generated code detectors: Do not use models built specifically to detect AI-generated code. General-purpose code models (e.g., CodeBERT, StarCoder) are allowed.


Model tree for dzungpham/SLA-SemEval-challenge

Base model: microsoft/unixcoder-base
Finetuned: this model