dzungpham/graphcodebert-code-classification

@@ -1,3 +1,4 @@
 ---
 license: mit
 metrics:
@@ -14,61 +15,63 @@ tags:
 - transformers
 - bert
 ---
-## 🔍 Task Overview
-The rise of generative models has made it increasingly difficult to distinguish machine-generated code from human-written code — especially across different programming languages, domains, and generation techniques.
-**SemEval-2026 Task 13** challenges participants to build systems that can **detect machine-generated code** under diverse conditions by evaluating generalization to unseen languages, generator families, and code application scenarios.
-The task consists of **three subtasks**:
 ---
 ### Subtask A: Binary Machine-Generated Code Detection
 **Goal:**
-Given a code snippet, predict whether it is:
-- **(i)** Fully **human-written**, or
-- **(ii)** Fully **machine-generated**
-**Training Languages:** `C++`, `Python`, `Java`
-**Training Domain:** `Algorithmic` (e.g., Leetcode-style problems)
 **Evaluation Settings:**
-| Setting                              | Language                | Domain                 |
-|--------------------------------------|-------------------------|------------------------|
-| (i) Seen Languages & Seen Domains    | C++, Python, Java       | Algorithmic            |
-| (ii) Unseen Languages & Seen Domains | Go, PHP, C#, C, JS      | Algorithmic            |
-| (iii) Seen Languages & Unseen Domains| C++, Python, Java       | Research, Production   |
-| (iv) Unseen Languages & Domains      | Go, PHP, C#, C, JS      | Research, Production   |
-**Dataset Size**:
-- Train - 500K samples (238K Human-Written | 262K Machine-Generated)
-- Validation - 100K samples
-**Data Format**
-Each dataset contains the following fields:
-- `code`: The code snippet
-- `label`: The binary label (0 for human-written, 1 for machine-generated)
-- `language`: The programming language of the snippet
 Label mappings are provided in `task_A/label_to_id.json` and `task_A/id_to_label.json`.
-**Evaluation Metric**
-The primary evaluation metric for Subtask A is **Macro F1-score**. This metric ensures balanced performance across both classes.
-**Submission Format**
-Participants must submit a `.csv` file with the following columns:
-- `id`: Unique identifier for each code snippet
-- `label`: Predicted label (0 or 1)
-A sample submission file is available in the `task_A/` folder.
-**Baseline Models**
-Baseline implementations for Subtask A are provided in the `baselines/` directory. These include starter code and pre-trained checkpoints for models such as GraphCodeBERT and UniXcoder.
-**Restrictions**
-- **No external training data**: Use only the provided datasets.
-- **No specialized AI-generated code detectors**: General-purpose code models (e.g., CodeBERT, StarCoder) are allowed.

+```
 ---
 license: mit
 metrics:
 - transformers
 - bert
 ---
+## Task Overview
+The rapid advancement of generative models has made it increasingly challenging to distinguish machine-generated code from human-written code, particularly across different programming languages, domains, and generation techniques.
+SemEval-2026 Task 13 focuses on developing systems capable of detecting machine-generated code under diverse conditions. The evaluation emphasizes generalization to unseen programming languages, generator families, and application scenarios.
+The task is divided into three subtasks.
 ---
 ### Subtask A: Binary Machine-Generated Code Detection
 **Goal:**
+Given a code snippet, determine whether it is:
+- Fully human-written, or
+- Fully machine-generated
+**Training Languages:** C++, Python, Java
+**Training Domain:** Algorithmic (e.g., LeetCode-style problems)
 **Evaluation Settings:**
+| Setting                              | Language                | Domain               |
+|--------------------------------------|-------------------------|----------------------|
+| (i) Seen Languages & Seen Domains    | C++, Python, Java       | Algorithmic          |
+| (ii) Unseen Languages & Seen Domains | Go, PHP, C#, C, JS      | Algorithmic          |
+| (iii) Seen Languages & Unseen Domains| C++, Python, Java       | Research, Production |
+| (iv) Unseen Languages & Domains      | Go, PHP, C#, C, JS      | Research, Production |
+**Dataset Size:**
+- Train: 500,000 samples (238,000 human-written, 262,000 machine-generated)
+- Validation: 100,000 samples
+**Data Format:**
+Each dataset includes the following fields:
+- `code`: The code snippet
+- `label`: Binary label (0 for human-written, 1 for machine-generated)
+- `language`: Programming language of the snippet
 Label mappings are provided in `task_A/label_to_id.json` and `task_A/id_to_label.json`.
+**Evaluation Metric:**
+The primary metric for Subtask A is Macro F1-score, ensuring balanced performance across both classes.
+**Submission Format:**
+Participants must submit a `.csv` file containing:
+- `id`: Unique identifier for each code snippet
+- `label`: Predicted label (0 or 1)
+A sample submission file is available in the `task_A/` directory.
+**Baseline Models:**
+Baseline implementations are provided in the `baselines/` directory, including starter code and pre-trained checkpoints for models such as GraphCodeBERT and UniXcoder.
+**Restrictions:**
+- No external training data may be used; only the provided datasets are allowed.
+- Specialized AI-generated code detectors are not permitted. General-purpose code models (e.g., CodeBERT, StarCoder) are allowed.
+```
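
The Evaluation Metric and Submission Format sections in the added text can be illustrated with a short sketch. It is a minimal example, not the official scorer: the helper names `macro_f1` and `write_submission` are hypothetical, and the only details taken from the README are the binary labels (0 for human-written, 1 for machine-generated) and the `id` and `label` CSV columns.

```python
import csv
import io


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores (hypothetical helper)."""
    scores = []
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); treated as 0.0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def write_submission(ids, preds, fh):
    """Write predictions as CSV with the `id` and `label` columns (hypothetical helper)."""
    writer = csv.writer(fh)
    writer.writerow(["id", "label"])
    for sample_id, pred in zip(ids, preds):
        writer.writerow([sample_id, int(pred)])


if __name__ == "__main__":
    # Toy data for illustration only, not drawn from the task dataset.
    y_true = [0, 0, 1, 1]
    y_pred = [0, 1, 1, 1]
    print(f"Macro F1: {macro_f1(y_true, y_pred):.4f}")

    buf = io.StringIO()
    write_submission([10, 11, 12, 13], y_pred, buf)
    print(buf.getvalue())
```

Because each class contributes equally to the average regardless of its size, the slight imbalance in the training split (238,000 vs. 262,000) does not let the majority class dominate the score.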