# Experiment 1, Latin Square 2: CCT5 & COME on MCMD-NT

This repository contains the artifacts for **Latin Square 2 of Experiment 1**, a **reproduction of the original experiment** by Wu et al. (2025) on the **MCMD-NT dataset** using the DNN-based commit message generation baselines **CCT5** and **COME**.

***

## Models

### CCT5
CCT5 is a code-change-oriented pre-trained model built on the **T5 architecture** and initialized from **CodeT5** weights. It is further specialized through pre-training on **CodeChangeNet**, a commit-diff dataset containing roughly 40 GB of diff and commit message pairs (~1.5M pairs). It was released at ESEC/FSE 2023.

- Base: `T5-base` → `CodeT5` → `CCT5`
- Pre-training data: CodeChangeNet (40 GB, 1.5M diff/commit pairs)
- For MCMD-NT: reused the MCMD-trained checkpoint released by the original authors (the same checkpoint as for MCMD, since MCMD-NT shares the same languages and structure)

### COME
COME (Commit Message Generation with Modification Embedding) is a hybrid DNN approach that combines:
- A **fine-tuned CodeT5** component for natural language generation
- **Modification embedding** to represent code changes as numerical vectors
- An **SVM-based decision algorithm** to select between generated and retrieved candidate messages

It does not perform additional large-scale pre-training on top of CodeT5. It was released at ISSTA 2023.

- For MCMD-NT: reused the language-specific MCMD-trained checkpoints released by the original COME authors (one per language)

***

## Dataset

**MCMD-NT**: part of MCMD-New, made up of newer commits from repositories that are also present in the original MCMD dataset.

| Property | Details |
|----------|---------|
| Languages | Java, C++, C#, Python, JavaScript |
| Repositories | 367 repositories shared with the MCMD dataset |
| Total commits | 229,492 |
| Date range | January 1st, 2022 onwards (newer than MCMD) |
| Split | 80% train / 10% validation / 10% test |
| Authors | Wu et al. (2025) |

MCMD-NT was constructed to reduce the risk of **data leakage**: it uses newer commits from the same repositories as MCMD to test how well models generalize to more recent data without introducing new programming languages.

***

## Repository Structure

Each run folder corresponds to a **programming language** evaluated in this Latin Square:

```
experiment1_ls2/
├── run_java/
│   ├── checkpoint/      # CCT5 and COME checkpoints (reused MCMD-trained, Java)
│   ├── predictions/     # Generated commit messages on MCMD-NT Java test set
│   └── metrics/         # BLEU, METEOR, ROUGE-L, CIDEr scores
├── run_cpp/
│   ├── checkpoint/
│   ├── predictions/
│   └── metrics/
├── run_csharp/
│   ├── checkpoint/
│   ├── predictions/
│   └── metrics/
├── run_python/
│   ├── checkpoint/
│   ├── predictions/
│   └── metrics/
└── run_javascript/
    ├── checkpoint/
    ├── predictions/
    └── metrics/
```

### `checkpoint/`
Contains the model checkpoint files for CCT5 and COME. These are the **same checkpoints used for MCMD** (LS1), reused here since MCMD-NT shares the same languages and format as MCMD.

### `predictions/`
Contains the commit messages generated by each model on the MCMD-NT test set for the corresponding language, stored as `.txt` files with one prediction per line, aligned line by line with the reference messages.
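
A minimal sketch of how such line-aligned files could be read, assuming plain UTF-8 text with one message per line. The function name `load_aligned` and the file names in the usage comment are hypothetical and are not taken from this repository.

```python
def read_messages(path):
    """Read a text file and return one message per line, without newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def load_aligned(pred_path, ref_path):
    """Pair each prediction with the reference on the same line number.

    Raises ValueError if the two files have different line counts,
    because the pairing would then be misaligned.
    """
    predictions = read_messages(pred_path)
    references = read_messages(ref_path)
    if len(predictions) != len(references):
        raise ValueError(
            f"{len(predictions)} predictions vs {len(references)} references"
        )
    return list(zip(predictions, references))


# Hypothetical usage (placeholder file names, not actual files in this repository):
# pairs = load_aligned("run_java/predictions/cct5.txt", "java_test_references.txt")
```

Failing loudly on a length mismatch is a deliberate choice here: a silently truncated pairing would shift every later prediction against the wrong reference and distort all metric scores.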

### `metrics/`
Contains the evaluation metric scores computed for each model-language combination. Metrics are calculated by comparing predictions against the reference messages in the MCMD-NT test set.
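
To illustrate what comparing a prediction against its reference involves, the sketch below computes a sentence-level ROUGE-L score from the longest common subsequence (LCS) of whitespace-separated tokens. It is a simplified illustration only: it uses a plain F1 and naive tokenization, which are assumptions, and it is not the scoring script behind the numbers in this repository. Established evaluation scripts may tokenize differently and weight recall differently.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction, reference):
    """Plain F1 over LCS-based precision and recall (simplified ROUGE-L)."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# The LCS here has 5 tokens: precision = 5/5, recall = 5/6, F1 = 10/11.
score = rouge_l_f1("fix null pointer in parser",
                   "fix null pointer exception in parser")
print(round(score, 4))  # 0.9091
```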

***

## Evaluation Metrics

| Metric | Description |
|--------|-------------|
| **BLEU** | Bilingual Evaluation Understudy: measures n-gram precision between generated and reference messages |
| **METEOR** | Metric for Evaluation of Translation with Explicit Ordering: extends BLEU with recall, stemming, and synonym matching |
| **ROUGE-L** | Recall-Oriented Understudy for Gisting Evaluation (LCS variant): measures longest common subsequence overlap |
| **CIDEr** | Consensus-based Image Description Evaluation: TF-IDF-weighted n-gram similarity against reference messages |

### Reported Results (Original Paper: Wu et al., 2025)

| Language | Model | BLEU | METEOR | ROUGE-L | CIDEr |
|----------|-------|------|--------|---------|-------|
| Java | CCT5 | 22.15 | 19.05 | 30.18 | 1.48 |
| Java | COME | 31.46 | 26.41 | 39.53 | 2.41 |
| C++ | CCT5 | 16.94 | 13.15 | 23.52 | 0.86 |
| C++ | COME | 25.60 | 20.47 | 31.68 | 1.74 |
| C# | CCT5 | 15.26 | 13.22 | 21.27 | 0.79 |
| C# | COME | 28.83 | 25.02 | 34.90 | 1.95 |
| Python | CCT5 | 19.02 | 16.12 | 30.47 | 0.98 |
| Python | COME | 25.95 | 22.55 | 36.78 | 1.75 |
| JavaScript | CCT5 | 24.72 | 21.66 | 34.42 | 1.73 |
| JavaScript | COME | 31.30 | 27.06 | 39.77 | 2.41 |
| **Average** | **CCT5** | **19.62** | **16.64** | **27.97** | **1.17** |
| **Average** | **COME** | **28.63** | **24.30** | **36.53** | **2.05** |
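
The **Average** rows are consistent with an unweighted mean over the five languages, which the short check below recomputes from the per-language rows of the table. The dictionary layout and the name `macro_average` are illustrative assumptions. Treating the average as unweighted is inferred from the numbers rather than stated in the source.

```python
# Per-language scores copied from the table: (BLEU, METEOR, ROUGE-L, CIDEr).
REPORTED = {
    "CCT5": {
        "Java": (22.15, 19.05, 30.18, 1.48),
        "C++": (16.94, 13.15, 23.52, 0.86),
        "C#": (15.26, 13.22, 21.27, 0.79),
        "Python": (19.02, 16.12, 30.47, 0.98),
        "JavaScript": (24.72, 21.66, 34.42, 1.73),
    },
    "COME": {
        "Java": (31.46, 26.41, 39.53, 2.41),
        "C++": (25.60, 20.47, 31.68, 1.74),
        "C#": (28.83, 25.02, 34.90, 1.95),
        "Python": (25.95, 22.55, 36.78, 1.75),
        "JavaScript": (31.30, 27.06, 39.77, 2.41),
    },
}


def macro_average(per_language):
    """Unweighted mean of each metric across languages, rounded to 2 decimals."""
    rows = list(per_language.values())
    return tuple(
        round(sum(row[i] for row in rows) / len(rows), 2)
        for i in range(len(rows[0]))
    )


print(macro_average(REPORTED["CCT5"]))  # (19.62, 16.64, 27.97, 1.17)
print(macro_average(REPORTED["COME"]))  # (28.63, 24.3, 36.53, 2.05)
```

The same function can be applied to reproduced scores, so that reproduced and reported averages are computed the same way before they are compared.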

***

## Notes

- Checkpoints were **reused** from the original authors' repositories; no retraining was performed for MCMD-NT.
- No random seeds or repeated runs were documented in the original experiment.
- Results in this repository correspond to the **reproduction attempt** of the original reported values.
- MCMD-NT shares the same five languages as MCMD but contains newer commits, enabling a temporal generalization assessment.
- Discrepancies between reproduced and reported results are documented in the thesis.

***

## References

- Wu et al. (2025). *An Empirical Study on Commit Message Generation with Large Language Models via In-Context Learning.* arXiv:2502.18904.