dzungpham committed on
Commit d2240b9 · verified · 1 Parent(s): b00f41c

Update README.md

Files changed (1)
  1. README.md +38 -35
README.md CHANGED
@@ -1,3 +1,4 @@
+ ```
  ---
  license: mit
  metrics:
@@ -14,61 +15,63 @@ tags:
  - transformers
  - bert
  ---
- ## 🔍 Task Overview

- The rise of generative models has made it increasingly difficult to distinguish machine-generated code from human-written code — especially across different programming languages, domains, and generation techniques.
+ ## Task Overview

- **SemEval-2026 Task 13** challenges participants to build systems that can **detect machine-generated code** under diverse conditions by evaluating generalization to unseen languages, generator families, and code application scenarios.
+ The rapid advancement of generative models has made it increasingly challenging to distinguish machine-generated code from human-written code, particularly across different programming languages, domains, and generation techniques.

- The task consists of **three subtasks**:
+ SemEval-2026 Task 13 focuses on developing systems capable of detecting machine-generated code under diverse conditions. The evaluation emphasizes generalization to unseen programming languages, generator families, and application scenarios.
+
+ The task is divided into three subtasks.

  ---

  ### Subtask A: Binary Machine-Generated Code Detection

  **Goal:**
- Given a code snippet, predict whether it is:
+ Given a code snippet, determine whether it is:

- - **(i)** Fully **human-written**, or
- - **(ii)** Fully **machine-generated**
+ - Fully human-written, or
+ - Fully machine-generated

- **Training Languages:** `C++`, `Python`, `Java`
- **Training Domain:** `Algorithmic` (e.g., Leetcode-style problems)
+ **Training Languages:** C++, Python, Java
+ **Training Domain:** Algorithmic (e.g., LeetCode-style problems)

  **Evaluation Settings:**

- | Setting | Language | Domain |
- |--------------------------------------|-------------------------|------------------------|
- | (i) Seen Languages & Seen Domains | C++, Python, Java | Algorithmic |
- | (ii) Unseen Languages & Seen Domains | Go, PHP, C#, C, JS | Algorithmic |
- | (iii) Seen Languages & Unseen Domains| C++, Python, Java | Research, Production |
- | (iv) Unseen Languages & Domains | Go, PHP, C#, C, JS | Research, Production |
+ | Setting | Language | Domain |
+ |--------------------------------------|-------------------------|----------------------|
+ | (i) Seen Languages & Seen Domains | C++, Python, Java | Algorithmic |
+ | (ii) Unseen Languages & Seen Domains | Go, PHP, C#, C, JS | Algorithmic |
+ | (iii) Seen Languages & Unseen Domains| C++, Python, Java | Research, Production |
+ | (iv) Unseen Languages & Domains | Go, PHP, C#, C, JS | Research, Production |

- **Dataset Size**:
- - Train - 500K samples (238K Human-Written | 262K Machine-Generated)
- - Validation - 100K samples
+ **Dataset Size:**
+ - Train: 500,000 samples (238,000 human-written, 262,000 machine-generated)
+ - Validation: 100,000 samples

- **Data Format**
- Each dataset contains the following fields:
- - `code`: The code snippet
- - `label`: The binary label (0 for human-written, 1 for machine-generated)
- - `language`: The programming language of the snippet
+ **Data Format:**
+ Each dataset includes the following fields:
+ - `code`: The code snippet
+ - `label`: Binary label (0 for human-written, 1 for machine-generated)
+ - `language`: Programming language of the snippet

  Label mappings are provided in `task_A/label_to_id.json` and `task_A/id_to_label.json`.

- **Evaluation Metric**
- The primary evaluation metric for Subtask A is **Macro F1-score**. This metric ensures balanced performance across both classes.
+ **Evaluation Metric:**
+ The primary metric for Subtask A is Macro F1-score, ensuring balanced performance across both classes.

- **Submission Format**
- Participants must submit a `.csv` file with the following columns:
- - `id`: Unique identifier for each code snippet
- - `label`: Predicted label (0 or 1)
+ **Submission Format:**
+ Participants must submit a `.csv` file containing:
+ - `id`: Unique identifier for each code snippet
+ - `label`: Predicted label (0 or 1)

- A sample submission file is available in the `task_A/` folder.
+ A sample submission file is available in the `task_A/` directory.

- **Baseline Models**
- Baseline implementations for Subtask A are provided in the `baselines/` directory. These include starter code and pre-trained checkpoints for models such as GraphCodeBERT and UniXcoder.
+ **Baseline Models:**
+ Baseline implementations are provided in the `baselines/` directory, including starter code and pre-trained checkpoints for models such as GraphCodeBERT and UniXcoder.

- **Restrictions**
- - **No external training data**: Use only the provided datasets.
- - **No specialized AI-generated code detectors**: General-purpose code models (e.g., CodeBERT, StarCoder) are allowed.
+ **Restrictions:**
+ - No external training data may be used; only the provided datasets are allowed.
+ - Specialized AI-generated code detectors are not permitted. General-purpose code models (e.g., CodeBERT, StarCoder) are allowed.
+ ```
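
The Macro F1 metric and the two-column submission file described in the README can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the organizers' scoring script. The function names `macro_f1` and `write_submission` are hypothetical, and scoring a class with no true or predicted samples as 0 is an assumed convention.

```python
import csv
import io


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Return the unweighted mean of the per-class F1 scores."""
    scores = []
    for cls in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # Assumed convention: a class that never appears scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def write_submission(ids, preds, handle):
    """Write the `id` and `label` columns the README asks for."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["id", "label"])
    for sample_id, label in zip(ids, preds):
        writer.writerow([sample_id, int(label)])


if __name__ == "__main__":
    # Toy labels: 0 = human-written, 1 = machine-generated.
    y_true = [0, 0, 1, 1]
    y_pred = [0, 1, 1, 1]
    print(round(macro_f1(y_true, y_pred), 4))

    buffer = io.StringIO()
    write_submission(range(len(y_pred)), y_pred, buffer)
    print(buffer.getvalue())
```

Because each class contributes equally to the mean, a detector that favors one label is penalized even on a mildly imbalanced split such as the 238,000 / 262,000 training set.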