How to use `dzungpham/graphcodebert-code-classification` with Transformers:

```python
# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("dzungpham/graphcodebert-code-classification", dtype="auto")
```

| Field | Value |
|---|---|
| License | `mit` |
| Datasets | `DaniilOr/SemEval-2026-Task13` |
| Metrics | accuracy, F1, precision, recall |
| Base model | `microsoft/unixcoder-base` |
| Library | `transformers` |
| Tags | detection, AI-generated, transformers, bert |

## 🔍 Task Overview

The rise of generative models has made it increasingly difficult to distinguish machine-generated code from human-written code, especially across different programming languages, domains, and generation techniques.

**SemEval-2026 Task 13** challenges participants to build systems that can **detect machine-generated code** under diverse conditions. The evaluation tests how well these systems generalize to unseen languages, generator families, and code application scenarios.

The task consists of **three subtasks**:

---

### Subtask A: Binary Machine-Generated Code Detection

**Goal:** Given a code snippet, predict whether it is:

- **(i)** Fully **human-written**, or
- **(ii)** Fully **machine-generated**

**Training Languages:** `C++`, `Python`, `Java`

**Training Domain:** `Algorithmic` (e.g., LeetCode-style problems)

**Evaluation Settings:**

| Setting | Language | Domain |
|---|---|---|
| (i) Seen Languages & Seen Domains | C++, Python, Java | Algorithmic |
| (ii) Unseen Languages & Seen Domains | Go, PHP, C#, C, JS | Algorithmic |
| (iii) Seen Languages & Unseen Domains | C++, Python, Java | Research, Production |
| (iv) Unseen Languages & Unseen Domains | Go, PHP, C#, C, JS | Research, Production |
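
The four settings follow from two independent questions: is the snippet's language one of the training languages, and is its domain the training domain? The sketch below assigns a sample to a setting. It assumes language and domain names are spelled exactly as in the table; the function `evaluation_setting` and the constant names are hypothetical illustrations, not part of the task's tooling.

```python
# Hypothetical helper: map a (language, domain) pair to the Subtask A
# evaluation setting, using the training languages and domain listed above.
TRAIN_LANGUAGES = {"C++", "Python", "Java"}
TRAIN_DOMAIN = "Algorithmic"


def evaluation_setting(language: str, domain: str) -> str:
    """Return the roman-numeral setting label from the evaluation table."""
    seen_language = language in TRAIN_LANGUAGES
    seen_domain = domain == TRAIN_DOMAIN
    if seen_language and seen_domain:
        return "(i)"
    if not seen_language and seen_domain:
        return "(ii)"
    if seen_language and not seen_domain:
        return "(iii)"
    return "(iv)"


print(evaluation_setting("Go", "Algorithmic"))   # (ii)
print(evaluation_setting("Java", "Production"))  # (iii)
```

Grouping validation results this way makes it easy to report a score per setting and see whether errors concentrate in unseen languages, unseen domains, or both.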

**Dataset Size:**

- Train: 500K samples (238K human-written, 262K machine-generated)
- Validation: 100K samples

**Data Format**

Each dataset contains the following fields:

- `code`: the code snippet
- `label`: the binary label (0 for human-written, 1 for machine-generated)
- `language`: the programming language of the snippet

Label mappings are provided in `task_A/label_to_id.json` and `task_A/id_to_label.json`.
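
Before training, it can help to check that every record has the three fields and a binary label, and to look at the class balance per language. The sketch below assumes each row is a plain mapping with the documented fields (for example, rows yielded when iterating over a Hugging Face `datasets` split); `summarize`, `LABEL_NAMES`, and the toy records are hypothetical and only mirror the 0/1 convention stated above, not the contents of the JSON mapping files.

```python
from collections import Counter
from typing import Iterable, Mapping

# Label convention documented above: 0 = human-written, 1 = machine-generated.
LABEL_NAMES = {0: "human-written", 1: "machine-generated"}
REQUIRED_FIELDS = ("code", "label", "language")


def summarize(records: Iterable[Mapping]) -> Counter:
    """Count samples per (language, label name), validating each record."""
    counts: Counter = Counter()
    for i, record in enumerate(records):
        missing = [f for f in REQUIRED_FIELDS if f not in record]
        if missing:
            raise ValueError(f"record {i} is missing fields: {missing}")
        if record["label"] not in LABEL_NAMES:
            raise ValueError(f"record {i} has non-binary label {record['label']!r}")
        counts[(record["language"], LABEL_NAMES[record["label"]])] += 1
    return counts


# Toy records in the documented format (not real task data).
toy = [
    {"code": "print('hi')", "label": 0, "language": "Python"},
    {"code": "def f(x):\n    return x + 1", "label": 1, "language": "Python"},
    {"code": "int main() { return 0; }", "label": 1, "language": "C++"},
]
print(summarize(toy))
```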
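
**Evaluation Metric**

The primary evaluation metric for Subtask A is **macro F1-score**: the F1-score is computed separately for each class, and the two values are averaged with equal weight. A system therefore has to perform well on both human-written and machine-generated code to score highly.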
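
To make the metric concrete, here is a minimal pure-Python sketch of binary macro F1. It should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")` in ordinary cases, but the official scorer may handle edge cases differently, so treat this as illustrative. The usage example scales the training split's class counts (238K human-written, 262K machine-generated) down to 238 and 262 and scores a hypothetical constant predictor that always answers "machine-generated".

```python
def f1_for_class(y_true, y_pred, positive):
    """F1-score treating `positive` as the positive class (0.0 when tp == 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1-scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in labels) / len(labels)


# Constant "always machine-generated" predictor on the scaled train balance.
y_true = [0] * 238 + [1] * 262
y_pred = [1] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.3f} macro_f1={macro_f1(y_true, y_pred):.3f}")
# accuracy=0.524 macro_f1=0.344
```

The constant predictor reaches about 52% accuracy but gets an F1 of 0 on the human-written class, which halves its macro F1. This is the behavior the metric is chosen for.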

**Submission Format**

Participants must submit a `.csv` file with the following columns:

- `id`: unique identifier for each code snippet
- `label`: predicted label (0 or 1)

A sample submission file is available in the `task_A/` folder.
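
A minimal sketch of writing predictions in this format with Python's standard `csv` module is shown below. The function `write_submission` and the example ids are hypothetical; check the sample file in `task_A/` for the exact id format and column order before submitting.

```python
import csv
import io


def write_submission(predictions, out):
    """Write (id, label) pairs as CSV with an `id,label` header row."""
    writer = csv.writer(out)
    writer.writerow(["id", "label"])
    for snippet_id, label in predictions:
        # Subtask A labels are binary: 0 = human-written, 1 = machine-generated.
        if label not in (0, 1):
            raise ValueError(f"label for id {snippet_id!r} must be 0 or 1, got {label!r}")
        writer.writerow([snippet_id, label])


# Write to an in-memory buffer to show the resulting text.
buffer = io.StringIO()
write_submission([(101, 0), (102, 1)], buffer)
print(buffer.getvalue())
```

To write a real file, open it with `open("submission.csv", "w", newline="")` (the filename here is only an example). The `csv` module documentation recommends `newline=""` so the writer controls line endings itself.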

**Baseline Models**

Baseline implementations for Subtask A are provided in the `baselines/` directory. They include starter code and pre-trained checkpoints for models such as GraphCodeBERT and UniXcoder.

**Restrictions**

- **No external training data**: use only the provided datasets.
- **No specialized AI-generated code detectors**: existing detectors built specifically for machine-generated code are not allowed, but general-purpose code models (e.g., CodeBERT, StarCoder) are.