---
license: mit
datasets:
- DaniilOr/SemEval-2026-Task13
metrics:
- accuracy
- f1
- precision
- recall
base_model:
- microsoft/unixcoder-base
library_name: transformers
tags:
- detection
- AI-generated
- transformers
- bert
---

## 🔍 Task Overview

The rise of generative models has made it increasingly difficult to distinguish machine-generated code from human-written code, especially across different programming languages, domains, and generation techniques.

**SemEval-2026 Task 13** challenges participants to build systems that can **detect machine-generated code** under diverse conditions by evaluating generalization to unseen languages, generator families, and code application scenarios.

The task consists of **three subtasks**:

---

### Subtask A: Binary Machine-Generated Code Detection

**Goal:**
Given a code snippet, predict whether it is:

- **(i)** Fully **human-written**, or
- **(ii)** Fully **machine-generated**

**Training Languages:** `C++`, `Python`, `Java`
**Training Domain:** `Algorithmic` (e.g., Leetcode-style problems)

**Evaluation Settings:**

| Setting                               | Language           | Domain               |
|---------------------------------------|--------------------|----------------------|
| (i) Seen Languages & Seen Domains     | C++, Python, Java  | Algorithmic          |
| (ii) Unseen Languages & Seen Domains  | Go, PHP, C#, C, JS | Algorithmic          |
| (iii) Seen Languages & Unseen Domains | C++, Python, Java  | Research, Production |
| (iv) Unseen Languages & Domains       | Go, PHP, C#, C, JS | Research, Production |

**Dataset Size**:

- Train - 500K samples (238K Human-Written | 262K Machine-Generated)
- Validation - 100K samples

**Data Format**

Each dataset contains the following fields:

- `code`: The code snippet
- `label`: The binary label (0 for human-written, 1 for machine-generated)
- `language`: The programming language of the snippet

Label mappings are provided in `task_A/label_to_id.json` and `task_A/id_to_label.json`.
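
To make the data format concrete, here is a minimal sketch of checking a record against the three fields listed above and tallying labels. The helper names (`validate_record`, `label_counts`, `is_seen_language`), the example records, and the exact strings stored in the `language` field are assumptions for illustration; they are not taken from the dataset or the task repository.

```python
from collections import Counter

# Fields and label values as described in the Data Format section.
REQUIRED_FIELDS = ("code", "label", "language")
VALID_LABELS = (0, 1)  # 0 = human-written, 1 = machine-generated

# Training languages from the task description. The exact spelling used in
# the `language` field is an assumption; adjust to match the real data.
SEEN_LANGUAGES = {"C++", "Python", "Java"}


def validate_record(record):
    """Raise ValueError if a record lacks a required field or has a bad label."""
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if record["label"] not in VALID_LABELS:
        raise ValueError(f"unexpected label: {record['label']!r}")
    return record


def label_counts(records):
    """Count how many validated records carry each label."""
    return Counter(validate_record(r)["label"] for r in records)


def is_seen_language(record):
    """True if the snippet's language is one of the training languages."""
    return record["language"] in SEEN_LANGUAGES


# Hypothetical records, invented purely to illustrate the schema.
examples = [
    {"code": "print('hi')", "label": 0, "language": "Python"},
    {"code": "int main() { return 0; }", "label": 1, "language": "C++"},
    {"code": "package main", "label": 1, "language": "Go"},
]

print(label_counts(examples))
print([is_seen_language(r) for r in examples])
```

Splitting validation results by seen and unseen languages in this way gives a rough local proxy for evaluation settings (i) and (ii); the domain settings cannot be reproduced from these three fields alone.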

**Evaluation Metric**

The primary evaluation metric for Subtask A is **Macro F1-score**. This metric ensures balanced performance across both classes.

**Submission Format**

Participants must submit a `.csv` file with the following columns:

- `id`: Unique identifier for each code snippet
- `label`: Predicted label (0 or 1)

A sample submission file is available in the `task_A/` folder.

**Baseline Models**

Baseline implementations for Subtask A are provided in the `baselines/` directory. These include starter code and pre-trained checkpoints for models such as GraphCodeBERT and UniXcoder.

**Restrictions**

- **No external training data**: Use only the provided datasets.
- **No specialized AI-generated code detectors**: General-purpose code models (e.g., CodeBERT, StarCoder) are allowed.
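
As a rough illustration of the metric and submission format described above, the sketch below computes Macro F1 as the unweighted mean of the per-class F1 scores and writes predictions to a CSV with `id` and `label` columns. The function names and the zero-score convention for a class with no true or predicted samples are assumptions; this is not the official scorer, and the sample file in `task_A/` remains the reference for the exact layout.

```python
import csv
import io


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 scores over the given labels."""
    per_class = []
    for cls in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # Assumption: a class with no true or predicted samples scores 0.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


def write_submission(ids, preds, fileobj):
    """Write a header row followed by one `id,label` row per prediction."""
    writer = csv.writer(fileobj)
    writer.writerow(["id", "label"])
    for sample_id, pred in zip(ids, preds):
        writer.writerow([sample_id, int(pred)])


# Toy labels, invented for illustration only.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
print(round(macro_f1(y_true, y_pred), 4))

# Write to an in-memory buffer; pass a real file opened with newline="" instead.
buffer = io.StringIO()
write_submission(["a", "b", "c", "d"], y_pred, buffer)
print(buffer.getvalue())
```

Because both classes contribute equally to the mean, a detector that predicts only the majority class is penalized even though the training split is roughly balanced (238K vs. 262K).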