πŸ” Task Overview

The rise of generative models has made it increasingly difficult to distinguish machine-generated code from human-written code, especially across different programming languages, domains, and generation techniques.

SemEval-2026 Task 13 challenges participants to build systems that can detect machine-generated code under diverse conditions by evaluating generalization to unseen languages, generator families, and code application scenarios.

The task consists of three subtasks:


Subtask A: Binary Machine-Generated Code Detection

Goal:
Given a code snippet, predict whether it is:

  • (i) Fully human-written, or
  • (ii) Fully machine-generated

Training Languages: C++, Python, Java
Training Domain: Algorithmic (e.g., LeetCode-style problems)

Evaluation Settings:

| Setting | Language | Domain |
| --- | --- | --- |
| (i) Seen Languages & Seen Domains | C++, Python, Java | Algorithmic |
| (ii) Unseen Languages & Seen Domains | Go, PHP, C#, C, JS | Algorithmic |
| (iii) Seen Languages & Unseen Domains | C++, Python, Java | Research, Production |
| (iv) Unseen Languages & Domains | Go, PHP, C#, C, JS | Research, Production |
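
The four settings above amount to a two-by-two split on language and domain. A minimal sketch of that mapping follows, useful for example when slicing validation results per setting. The function name and the exact language and domain strings are assumptions for illustration; the released data lists only `code`, `label`, and `language` fields, so the domain would have to come from elsewhere.

```python
# Language and domain groups as listed in the evaluation settings.
# The exact string spellings used in the dataset are an assumption.
SEEN_LANGUAGES = {"C++", "Python", "Java"}
UNSEEN_LANGUAGES = {"Go", "PHP", "C#", "C", "JS"}
SEEN_DOMAINS = {"Algorithmic"}
UNSEEN_DOMAINS = {"Research", "Production"}


def evaluation_setting(language: str, domain: str) -> str:
    """Return 'i', 'ii', 'iii', or 'iv' for a (language, domain) pair.

    Hypothetical helper, not part of the task's official tooling.
    """
    if language in SEEN_LANGUAGES:
        language_seen = True
    elif language in UNSEEN_LANGUAGES:
        language_seen = False
    else:
        raise ValueError(f"language not covered by the task: {language!r}")

    if domain in SEEN_DOMAINS:
        domain_seen = True
    elif domain in UNSEEN_DOMAINS:
        domain_seen = False
    else:
        raise ValueError(f"domain not covered by the task: {domain!r}")

    if language_seen and domain_seen:
        return "i"
    if not language_seen and domain_seen:
        return "ii"
    if language_seen and not domain_seen:
        return "iii"
    return "iv"
```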

Dataset Size:

  • Train - 500K samples (238K Human-Written | 262K Machine-Generated)
  • Validation - 100K samples

Data Format

Each dataset contains the following fields:

  • code: The code snippet
  • label: The binary label (0 for human-written, 1 for machine-generated)
  • language: The programming language of the snippet

Label mappings are provided in task_A/label_to_id.json and task_A/id_to_label.json.
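
A quick sanity check on loaded records can catch schema problems before training. The sketch below assumes each sample has already been read into a Python dict; the on-disk file format is not stated here, and `validate_record` is a hypothetical helper rather than part of the task's tooling.

```python
# Field names taken from the data format description above.
REQUIRED_FIELDS = ("code", "label", "language")


def validate_record(record: dict) -> None:
    """Raise if a sample does not match the documented Subtask A schema."""
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if not isinstance(record["code"], str):
        raise TypeError("'code' must be a string")
    if not isinstance(record["language"], str):
        raise TypeError("'language' must be a string")
    # 0 = human-written, 1 = machine-generated
    if record["label"] not in (0, 1):
        raise ValueError(f"'label' must be 0 or 1, got {record['label']!r}")


# Usage with a made-up sample:
validate_record({"code": "print('hi')", "label": 0, "language": "Python"})
```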

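Evaluation Metric

The primary evaluation metric for Subtask A is macro F1-score. Macro F1 averages the per-class F1 scores with equal weight, so a system must perform well on both the human-written and the machine-generated class to score well.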
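
To make the metric concrete, here is a minimal pure-Python sketch of macro F1 for the two labels. It is an illustration, not the official scorer; in particular, returning 0.0 for a class with no true or predicted instances is an assumed convention.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 for one class, computed as 2*TP / (2*TP + FP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denominator = 2 * tp + fp + fn
    # Assumed convention: a class that never appears scores 0.0.
    return 2 * tp / denominator if denominator else 0.0


def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Made-up example: class 0 has F1 = 2/3, class 1 has F1 = 8/9,
# so the macro average is 7/9.
y_true = [0, 0, 1, 1, 1, 1]
y_pred = [0, 1, 1, 1, 1, 1]
score = macro_f1(y_true, y_pred)
```

In this example accuracy is 5/6, yet macro F1 is lower at 7/9 because the weaker class 0 score counts as much as the class 1 score.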

Submission Format

Participants must submit a .csv file with the following columns:

  • id: Unique identifier for each code snippet
  • label: Predicted label (0 or 1)

A sample submission file is available in the task_A/ folder.
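
A minimal sketch of writing predictions in this two-column layout, using only the standard library. The function name and the output file name are assumptions; the sample submission file in task_A/ remains the authority on exact formatting.

```python
import csv


def write_submission(predictions, out):
    """Write (id, label) pairs to an open text file as id,label rows."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id", "label"])
    for sample_id, label in predictions:
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label!r}")
        writer.writerow([sample_id, label])


# Usage (file name is hypothetical):
# with open("submission.csv", "w", newline="") as f:
#     write_submission([(0, 1), (1, 0)], f)
```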

Baseline Models

Baseline implementations for Subtask A are provided in the baselines/ directory. These include starter code and pre-trained checkpoints for models such as GraphCodeBERT and UniXcoder.

Restrictions

  • No external training data: Use only the provided datasets.
  • No specialized AI-generated code detectors: General-purpose code models (e.g., CodeBERT, StarCoder) are allowed.