Commit 9bbe903 by clouds125 · verified · 1 parent: cd3528a

Create README.md

Files changed (1): README.md (+124 -0, added)
# Experiment 1 – Latin Square 2: CCT5 & COME on MCMD-NT

This repository contains the artifacts for **Latin Square 2 of Experiment 1**, a **reproduction of the original experiment** by Wu et al. (2025) on the **MCMD-NT dataset** using the DNN-based commit message generation baselines **CCT5** and **COME**.

***

## Models

### CCT5
CCT5 is a code-change-oriented pre-trained model built on the **T5 architecture** and initialized from **CodeT5** weights. It is further pre-trained on **CodeChangeNet**, a dataset of roughly 1.5M diff and commit message pairs (about 40GB). It was published at ESEC/FSE 2023.

- Base: `T5-base` → `CodeT5` → `CCT5`
- Pre-training data: CodeChangeNet (40GB, 1.5M diff/commit pairs)
- For MCMD-NT: reused the MCMD-trained checkpoint released by the original authors (the same checkpoint as for MCMD, since MCMD-NT shares the same languages and structure)

### COME
COME (Commit Message Generation with Modification Embedding) is a hybrid DNN approach that combines:
- A **fine-tuned CodeT5** component for natural language generation
- **Modification embedding** to represent code changes as numerical vectors
- An **SVM-based decision algorithm** to select between generated and retrieved candidate messages

It does not perform additional large-scale pre-training on top of CodeT5. It was published at ISSTA 2023.

- For MCMD-NT: reused the language-specific MCMD-trained checkpoints released by the original COME authors (one per language)

***

## Dataset

**MCMD-NT** is part of MCMD-New and contains newer commits from repositories that are also present in the original MCMD dataset.

| Property | Details |
|----------|---------|
| Languages | Java, C++, C#, Python, JavaScript |
| Repositories | 367 repositories shared with the MCMD dataset |
| Total commits | 229,492 |
| Date range | January 1, 2022 onwards (newer than MCMD) |
| Split | 80% train / 10% validation / 10% test |
| Authors | Wu et al. (2025) |

MCMD-NT was constructed to reduce the risk of **data leakage**: it uses newer commits from the same repositories as MCMD to test how well models generalize to more recent data, without introducing new programming languages.

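
The temporal cutoff described above can be illustrated with a small filter. This is a minimal sketch, not the dataset authors' construction script: the record layout, the `sha` values, and the helper name `is_on_or_after_cutoff` are hypothetical, and only the January 1, 2022 cutoff comes from the table.

```python
from datetime import date

# Cutoff taken from the "Date range" row of the dataset table.
CUTOFF = date(2022, 1, 1)


def is_on_or_after_cutoff(commit_date, cutoff=CUTOFF):
    """Return True if a commit is dated on or after the cutoff."""
    return commit_date >= cutoff


# Hypothetical commit records, for illustration only.
commits = [
    {"sha": "a1", "date": date(2021, 12, 31)},
    {"sha": "b2", "date": date(2022, 1, 1)},
    {"sha": "c3", "date": date(2023, 6, 15)},
]

# Keep only commits that are new enough to count as "newer than MCMD".
newer = [c["sha"] for c in commits if is_on_or_after_cutoff(c["date"])]
print(newer)
```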
***

## Repository Structure

Each run folder corresponds to one **programming language** evaluated in this Latin Square:

```
experiment1_ls2/
├── run_java/
│   ├── checkpoint/     # CCT5 and COME checkpoints (reused MCMD-trained, Java)
│   ├── predictions/    # Generated commit messages on MCMD-NT Java test set
│   └── metrics/        # BLEU, METEOR, ROUGE-L, CIDEr scores
├── run_cpp/
│   ├── checkpoint/
│   ├── predictions/
│   └── metrics/
├── run_csharp/
│   ├── checkpoint/
│   ├── predictions/
│   └── metrics/
├── run_python/
│   ├── checkpoint/
│   ├── predictions/
│   └── metrics/
└── run_javascript/
    ├── checkpoint/
    ├── predictions/
    └── metrics/
```

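
To check a local copy against this layout, a short script along the following lines may help. It is a sketch that assumes the root folder is named `experiment1_ls2` exactly as in the tree; the helper names `expected_dirs` and `missing_dirs` are illustrative and not part of the repository.

```python
from pathlib import Path

# Folder names taken from the directory tree above.
LANGUAGES = ["java", "cpp", "csharp", "python", "javascript"]
SUBFOLDERS = ["checkpoint", "predictions", "metrics"]


def expected_dirs(root):
    """List every run_<language>/<subfolder> path the layout calls for."""
    root = Path(root)
    return [root / f"run_{lang}" / sub for lang in LANGUAGES for sub in SUBFOLDERS]


def missing_dirs(root):
    """Return the expected directories that do not exist under root."""
    return [path for path in expected_dirs(root) if not path.is_dir()]


if __name__ == "__main__":
    for path in missing_dirs("experiment1_ls2"):
        print(f"missing: {path}")
```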
### `checkpoint/`
Contains the model checkpoint files for CCT5 and COME. These are the **same checkpoints used for MCMD** (LS1), reused here because MCMD-NT shares the same languages and format as MCMD.

### `predictions/`
Contains the commit messages generated by each model on the MCMD-NT test set for the corresponding language, stored as `.txt` files with one prediction per line, aligned line by line with the reference messages.

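
Because predictions and references are aligned by line, they can be paired by position. The sketch below assumes UTF-8 text files with exactly one message per line; the function names and any file paths passed to `load_pairs` are hypothetical, since the exact file names are not stated here.

```python
from pathlib import Path


def split_records(text):
    """Split file contents into one record per line, ignoring one trailing newline."""
    if text.endswith("\n"):
        text = text[:-1]
    return text.split("\n") if text else []


def pair_lines(pred_text, ref_text):
    """Pair prediction i with reference i; fail loudly if the counts differ."""
    preds = split_records(pred_text)
    refs = split_records(ref_text)
    if len(preds) != len(refs):
        raise ValueError(f"{len(preds)} predictions vs {len(refs)} references")
    return list(zip(preds, refs))


def load_pairs(pred_path, ref_path):
    """Read both files as UTF-8 and return aligned (prediction, reference) pairs."""
    return pair_lines(
        Path(pred_path).read_text(encoding="utf-8"),
        Path(ref_path).read_text(encoding="utf-8"),
    )
```

Raising an error on a length mismatch is a deliberate choice in this sketch: a silently truncated `zip` would shift every later pair and distort the scores.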
### `metrics/`
Contains the computed evaluation metric scores for each model-language combination. Metrics are calculated by comparing predictions against the reference messages in the MCMD-NT test set.

***

## Evaluation Metrics

| Metric | Description |
|--------|-------------|
| **BLEU** | Bilingual Evaluation Understudy – measures n-gram precision of generated messages against reference messages |
| **METEOR** | Metric for Evaluation of Translation with Explicit Ordering – combines unigram precision and recall with stemming and synonym matching |
| **ROUGE-L** | Recall-Oriented Understudy for Gisting Evaluation (LCS variant) – measures longest common subsequence overlap |
| **CIDEr** | Consensus-based Image Description Evaluation – TF-IDF-weighted n-gram similarity against reference messages |

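
To make the ROUGE-L row concrete, here is a simplified sentence-level version. It is a sketch under stated assumptions, not the evaluation script used in this experiment: it tokenizes on whitespace and uses a balanced F1, whereas real implementations may differ in tokenization and in how they weight precision against recall.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction, reference):
    """Simplified ROUGE-L: balanced F1 over LCS-based precision and recall."""
    pred_tokens = prediction.split()
    ref_tokens = reference.split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    lcs = lcs_length(pred_tokens, ref_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(pred_tokens)
    recall = lcs / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# The LCS here is "fix null pointer" (3 tokens), so precision = recall = 3/4.
print(rouge_l_f1("fix null pointer bug", "fix the null pointer"))
```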
### Reported Results (Original Paper – Wu et al., 2025)

| Language | Model | BLEU | METEOR | ROUGE-L | CIDEr |
|----------|-------|------|--------|---------|-------|
| Java | CCT5 | 22.15 | 19.05 | 30.18 | 1.48 |
| Java | COME | 31.46 | 26.41 | 39.53 | 2.41 |
| C++ | CCT5 | 16.94 | 13.15 | 23.52 | 0.86 |
| C++ | COME | 25.60 | 20.47 | 31.68 | 1.74 |
| C# | CCT5 | 15.26 | 13.22 | 21.27 | 0.79 |
| C# | COME | 28.83 | 25.02 | 34.90 | 1.95 |
| Python | CCT5 | 19.02 | 16.12 | 30.47 | 0.98 |
| Python | COME | 25.95 | 22.55 | 36.78 | 1.75 |
| JavaScript | CCT5 | 24.72 | 21.66 | 34.42 | 1.73 |
| JavaScript | COME | 31.30 | 27.06 | 39.77 | 2.41 |
| **Average** | **CCT5** | **19.62** | **16.64** | **27.97** | **1.17** |
| **Average** | **COME** | **28.63** | **24.30** | **36.53** | **2.05** |

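
The Average rows are consistent with an unweighted mean over the five languages, rounded to two decimals. The following sketch recomputes them from the per-language rows of the table; the dictionary layout and the function name `average_scores` are illustrative only.

```python
# Per-language scores copied from the table: (BLEU, METEOR, ROUGE-L, CIDEr).
REPORTED = {
    "CCT5": {
        "Java": (22.15, 19.05, 30.18, 1.48),
        "C++": (16.94, 13.15, 23.52, 0.86),
        "C#": (15.26, 13.22, 21.27, 0.79),
        "Python": (19.02, 16.12, 30.47, 0.98),
        "JavaScript": (24.72, 21.66, 34.42, 1.73),
    },
    "COME": {
        "Java": (31.46, 26.41, 39.53, 2.41),
        "C++": (25.60, 20.47, 31.68, 1.74),
        "C#": (28.83, 25.02, 34.90, 1.95),
        "Python": (25.95, 22.55, 36.78, 1.75),
        "JavaScript": (31.30, 27.06, 39.77, 2.41),
    },
}


def average_scores(per_language):
    """Unweighted mean of each metric column, rounded to two decimals."""
    columns = zip(*per_language.values())
    return tuple(round(sum(col) / len(col), 2) for col in columns)


for model, scores in REPORTED.items():
    print(model, average_scores(scores))
```

The same function can be applied to reproduced scores to compare averages on equal terms with the reported ones.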
***

## Notes

- Checkpoints were **reused** from the original authors' repositories; no retraining was performed for MCMD-NT.
- No random seeds or repeated runs were documented in the original experiment.
- Results in this repository come from the **reproduction attempt** of the originally reported values.
- MCMD-NT shares the same five languages as MCMD but contains newer commits, enabling a temporal generalization assessment.
- Discrepancies between reproduced and reported results are documented in the thesis.

***

## References

- Wu et al. (2025). *An Empirical Study on Commit Message Generation with Large Language Models via In-Context Learning.* arXiv:2502.18904.