dzungpham committed on
Commit b586b98 · verified · 1 Parent(s): 28b4bc7

Create README.md

  1. README.md +76 -0
README.md ADDED
@@ -0,0 +1,76 @@
+ ---
+ license: mit
+ datasets:
+ - DaniilOr/SemEval-2026-Task13
+ metrics:
+ - accuracy
+ - f1
+ - precision
+ - recall
+ base_model:
+ - microsoft/unixcoder-base
+ library_name: transformers
+ tags:
+ - detection
+ - AI-generated
+ - transformers
+ - bert
+ ---
+ ## 🔍 Task Overview
+
+ The rise of generative models has made it increasingly difficult to distinguish machine-generated code from human-written code, especially across different programming languages, domains, and generation techniques.
+
+ **SemEval-2026 Task 13** challenges participants to build systems that **detect machine-generated code** under diverse conditions, and it evaluates how well those systems generalize to unseen languages, generator families, and code application scenarios.
+
+ The task consists of **three subtasks**:
+
+ ---
+
+ ### Subtask A: Binary Machine-Generated Code Detection
+
+ **Goal:**
+ Given a code snippet, predict whether it is:
+
+ - **(i)** Fully **human-written**, or
+ - **(ii)** Fully **machine-generated**
+
+ **Training Languages:** `C++`, `Python`, `Java`
+ **Training Domain:** `Algorithmic` (e.g., LeetCode-style problems)
+
+ **Evaluation Settings:**
+
+ | Setting                                | Language           | Domain               |
+ |----------------------------------------|--------------------|----------------------|
+ | (i) Seen Languages & Seen Domains      | C++, Python, Java  | Algorithmic          |
+ | (ii) Unseen Languages & Seen Domains   | Go, PHP, C#, C, JS | Algorithmic          |
+ | (iii) Seen Languages & Unseen Domains  | C++, Python, Java  | Research, Production |
+ | (iv) Unseen Languages & Unseen Domains | Go, PHP, C#, C, JS | Research, Production |
+
+ **Dataset Size**
+ - Train: 500K samples (238K human-written, 262K machine-generated)
+ - Validation: 100K samples
+
+ **Data Format**
+ Each sample contains the following fields:
+ - `code`: The code snippet
+ - `label`: The binary label (0 for human-written, 1 for machine-generated)
+ - `language`: The programming language of the snippet
+
+ Label mappings are provided in `task_A/label_to_id.json` and `task_A/id_to_label.json`.
+
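As a rough illustration of the three fields listed above, the sketch below validates a few records and counts them per language and label. The `ID_TO_LABEL` dictionary, the helper names, and the toy records are all assumptions made for this example; the real label strings live in `task_A/id_to_label.json` and may differ.

```python
from collections import Counter

# Assumed mapping that mirrors the field description (0 = human-written,
# 1 = machine-generated). The actual label strings stored in
# task_A/id_to_label.json may be spelled differently.
ID_TO_LABEL = {0: "human-written", 1: "machine-generated"}

REQUIRED_FIELDS = ("code", "label", "language")


def validate_record(record):
    """Raise ValueError if a record lacks a field or has an unknown label."""
    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if record["label"] not in ID_TO_LABEL:
        raise ValueError(f"unexpected label: {record['label']!r}")


def label_counts_by_language(records):
    """Count samples per (language, label name) pair."""
    counts = Counter()
    for record in records:
        validate_record(record)
        counts[(record["language"], ID_TO_LABEL[record["label"]])] += 1
    return counts


# Toy records invented for illustration; they are not taken from the dataset.
toy_records = [
    {"code": "print('hi')", "label": 0, "language": "Python"},
    {"code": "int main() { return 0; }", "label": 1, "language": "C++"},
    {"code": "x = 1", "label": 1, "language": "Python"},
]
counts = label_counts_by_language(toy_records)
```

A per-language count like this is one quick way to check class balance before training, since the evaluation settings include languages that never appear in the training split.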
+ **Evaluation Metric**
+ The primary evaluation metric for Subtask A is the **Macro F1-score**: the unweighted mean of the per-class F1 scores, so the human-written and machine-generated classes count equally regardless of how many samples each has.
+
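To make the metric concrete, here is a minimal pure-Python sketch of macro F1 for the two labels. It is not the official scorer: the function names are hypothetical, and returning 0.0 for a class with no true or predicted samples is an assumption that the task's own evaluation script may handle differently.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 for one class, computed as 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    # Assumption: a class that never occurs and is never predicted scores 0.0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy labels invented for illustration.
y_true = [0, 0, 0, 1]
y_pred = [0, 0, 1, 1]
score = macro_f1(y_true, y_pred)  # class 0 F1 = 0.8, class 1 F1 = 2/3
```

In practice, scikit-learn's `f1_score(y_true, y_pred, average="macro")` computes the same quantity and is the more common choice.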
+ **Submission Format**
+ Participants must submit a `.csv` file with the following columns:
+ - `id`: Unique identifier for each code snippet
+ - `label`: Predicted label (0 or 1)
+
+ A sample submission file is available in the `task_A/` folder.
+
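A submission with these two columns could be written along the following lines. This is a sketch under assumptions: the `write_submission` helper and the example ids are hypothetical, and the header row and column order are inferred from the column list above, so the sample file in `task_A/` remains the authoritative reference.

```python
import csv
import io


def write_submission(rows, handle):
    """Write (id, label) pairs as CSV with an `id,label` header row."""
    writer = csv.writer(handle)
    writer.writerow(["id", "label"])
    for sample_id, label in rows:
        # Only the two binary labels described for Subtask A are accepted.
        if label not in (0, 1):
            raise ValueError(f"label must be 0 or 1, got {label!r}")
        writer.writerow([sample_id, label])


# Hypothetical ids and predictions, written to memory for demonstration.
buffer = io.StringIO()
write_submission([(101, 0), (102, 1)], buffer)
csv_text = buffer.getvalue()
```

When writing to disk instead, open the file with `open(path, "w", newline="")` so that the `csv` module controls line endings.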
+ **Baseline Models**
+ Baseline implementations for Subtask A are provided in the `baselines/` directory. These include starter code and pre-trained checkpoints for models such as GraphCodeBERT and UniXcoder.
+
+ **Restrictions**
+ - **No external training data**: Use only the provided datasets.
+ - **No specialized AI-generated code detectors**: General-purpose code models (e.g., CodeBERT, StarCoder) are allowed.