clouds125 committed on
Commit 9e5a4a7 · verified · 1 Parent(s): 61b50c6

Create README.md

  1. README.md +141 -0
README.md ADDED
@@ -0,0 +1,141 @@
+ # Experiment 2 – Latin Square 2: CCT5 & COME on MCMD-NL (Redesigned)
+
+ This repository contains the artifacts for **Latin Square 2 of Experiment 2**, the **redesigned and reimplemented experiment** that evaluates the DNN-based commit message generation baselines **CCT5** and **COME** on the **MCMD-NL dataset**.
+ The models were retrained for each language on the MCMD-NL dataset and then evaluated with the BLEU, METEOR, ROUGE-L, and CIDEr metrics.
+
+ ***
+
+ ## Models
+
+ ### CCT5
+ CCT5 is a code-change-oriented pre-trained model built on the **T5 architecture**, initialized from **CodeT5** weights and further pre-trained on **CodeChangeNet** (~40GB, 1.5M diff/commit pairs). It was published at ESEC/FSE 2023.
+
+ - Architecture: Encoder-decoder Transformer (`T5-base` → `CodeT5` → `CCT5`)
+ - Pre-training corpus: CodeChangeNet (code diffs paired with commit messages)
+ - For MCMD-NL: **new checkpoint trained by fine-tuning the pre-trained CCT5 model on the MCMD-NL training set**, then evaluated on the MCMD-NL test set
+
+ ### COME
+ COME (Commit Message Generation with Modification Embedding) is a hybrid DNN system built on top of CodeT5 with three core components:
+ - **Modification embedding**: converts code changes into numerical vectors capturing code evolution
+ - **Fine-tuned CodeT5**: generates candidate commit messages from the embedded representation
+ - **SVM-based decision algorithm**: selects between the generated and retrieved candidate messages
+
+ It was published at ISSTA 2023 and does not include additional large-scale pre-training beyond CodeT5.
+
+ - For MCMD-NL: **new checkpoint trained by fine-tuning the pre-trained COME model on the MCMD-NL training set**, then evaluated on the MCMD-NL test set
+
+ ***
+
+ ## Dataset
+
+ **MCMD-NL** is the part of MCMD-New that contains commits from repositories written in programming languages **not present** in the original MCMD dataset.
+
+ | Property | Details |
+ |----------|---------|
+ | Languages | PHP, R, TypeScript, Swift, Objective-C |
+ | Repositories | 329 new repositories (not in MCMD) |
+ | Total commits | 135,699 |
+ | Date range | January 1st, 2022 onwards |
+ | Split | 80% train / 10% validation / 10% test |
+ | Authors | Wu et al. (2025) |
+
+ MCMD-NL was constructed to test model generalization to **entirely new programming languages**, which requires full fine-tuning from the pre-trained model checkpoints rather than reuse of existing MCMD-trained weights.
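As an illustration of the 80/10/10 split listed in the table, the following is a minimal sketch. It assumes a seeded random shuffle; the README does not state how the MCMD-NL authors actually partitioned the commits, so the function name `split_80_10_10`, the default seed, and the shuffling strategy are all hypothetical.

```python
import random


def split_80_10_10(items, seed=42):
    """Shuffle a copy of `items` with a fixed seed and cut it 80/10/10.

    Assumption: a random split. The real MCMD-NL split may have been
    produced differently (for example per repository or by date).
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_total = len(shuffled)
    n_train = n_total * 8 // 10   # integer arithmetic avoids float rounding
    n_valid = n_total // 10
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


# Toy data standing in for commit records.
train, valid, test = split_80_10_10(range(1000))
print(len(train), len(valid), len(test))
```

Fixing the seed makes the partition repeatable across runs, which is the property the redesigned protocol cares about.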
+
+ ***
+
+ ## Repository Structure
+
+ Each run folder corresponds to a **programming language** evaluated in this Latin Square. Both CCT5 and COME were fine-tuned on MCMD-NL and evaluated independently for each language.
+
+ ```
+ experiment2_ls2/
+ ├── run_php/
+ │   ├── checkpoint/    # CCT5 and COME checkpoints fine-tuned on MCMD-NL (PHP)
+ │   ├── predictions/   # Generated commit messages on MCMD-NL PHP test set
+ │   └── metrics/       # BLEU, METEOR, ROUGE-L, CIDEr scores
+ ├── run_r/
+ │   ├── checkpoint/
+ │   ├── predictions/
+ │   └── metrics/
+ ├── run_typescript/
+ │   ├── checkpoint/
+ │   ├── predictions/
+ │   └── metrics/
+ ├── run_swift/
+ │   ├── checkpoint/
+ │   ├── predictions/
+ │   └── metrics/
+ └── run_objectivec/
+     ├── checkpoint/
+     ├── predictions/
+     └── metrics/
+ ```
+
+ ### `checkpoint/`
+ Contains the model checkpoint files produced by fine-tuning CCT5 and COME on the MCMD-NL training set for the corresponding language. These are **newly trained checkpoints**, not reused from prior work. The best checkpoint, as selected on the validation set, is stored here.
+
+ ### `predictions/`
+ Contains the commit messages generated by each model on the MCMD-NL test set for the corresponding language. Files are stored as `.txt` with one prediction per line, aligned line by line with the reference messages in the test set.
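Because predictions and references are aligned by line, a consumer of these files can pair them by index. The sketch below shows one way to do that and to fail early if the files are out of sync. The file names `predictions.txt` and `references.txt` and the helper names are hypothetical; the actual file names in each `predictions/` folder may differ.

```python
import tempfile
from pathlib import Path


def read_lines(path):
    """Read a text file as a list of lines, keeping empty lines so indices stay aligned."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def load_aligned(pred_path, ref_path):
    """Return (prediction, reference) pairs, or raise if the line counts differ."""
    preds = read_lines(pred_path)
    refs = read_lines(ref_path)
    if len(preds) != len(refs):
        raise ValueError(f"{len(preds)} predictions but {len(refs)} references")
    return list(zip(preds, refs))


# Tiny demo with temporary files standing in for the real artifacts.
with tempfile.TemporaryDirectory() as tmp:
    pred_file = Path(tmp) / "predictions.txt"  # hypothetical file name
    ref_file = Path(tmp) / "references.txt"    # hypothetical file name
    pred_file.write_text("fix typo in readme\nadd unit tests\n", encoding="utf-8")
    ref_file.write_text("fix readme typo\nadd tests for parser\n", encoding="utf-8")
    pairs = load_aligned(pred_file, ref_file)

print(pairs)
```

Empty predictions are kept as empty strings rather than dropped, since removing them would shift every later line out of alignment.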
+
+ ### `metrics/`
+ Contains the evaluation metric scores computed by comparing the predictions against the MCMD-NL test set reference messages. Each file records BLEU, METEOR, ROUGE-L, and CIDEr scores per model and language under the redesigned evaluation protocol.
+
+ ***
+
+ ## Evaluation Metrics
+
+ | Metric | Description |
+ |--------|-------------|
+ | **BLEU** | Bilingual Evaluation Understudy: measures n-gram precision between generated and reference messages |
+ | **METEOR** | Metric for Evaluation of Translation with Explicit ORdering: combines unigram precision and recall, with stemming and WordNet synonym matching |
+ | **ROUGE-L** | Recall-Oriented Understudy for Gisting Evaluation (LCS variant): measures longest common subsequence overlap |
+ | **CIDEr** | Consensus-based Image Description Evaluation: TF-IDF-weighted n-gram similarity against reference messages |
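To make the ROUGE-L row concrete, here is a minimal sentence-level sketch based on the longest common subsequence (LCS). It is not the evaluation script used in this experiment: whitespace tokenization, lowercasing, and the balanced F1 combination are simplifying assumptions, and published scripts may weight recall differently.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """Sentence-level ROUGE-L as the harmonic mean of LCS precision and recall."""
    cand = candidate.lower().split()  # assumption: plain whitespace tokens
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1("fix null check in parser", "fix parser null check")
print(round(score, 4))
```

In this toy pair the LCS is `fix null check` (3 tokens), so precision is 3/5 and recall is 3/4.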
+
+ ### Reported Results (Original Paper – Wu et al., 2025)
+
+ | Language | Model | BLEU | METEOR | ROUGE-L | CIDEr |
+ |----------|-------|------|--------|---------|-------|
+ | PHP | CCT5 | 31.96 | 27.31 | 37.99 | 2.26 |
+ | PHP | COME | 34.68 | 30.51 | 40.27 | 2.59 |
+ | R | CCT5 | 33.02 | 28.92 | 37.17 | 2.19 |
+ | R | COME | 35.56 | 31.99 | 38.06 | 2.66 |
+ | TypeScript | CCT5 | 32.33 | 27.92 | 43.62 | 2.24 |
+ | TypeScript | COME | 35.72 | 30.97 | 47.38 | 2.61 |
+ | Swift | CCT5 | 29.29 | 24.58 | 37.09 | 1.98 |
+ | Swift | COME | 31.72 | 27.54 | 39.32 | 2.36 |
+ | Objective-C | CCT5 | 28.57 | 24.62 | 31.63 | 1.68 |
+ | Objective-C | COME | 33.43 | 29.44 | 38.32 | 2.17 |
+ | **Average** | **CCT5** | **31.02** | **26.67** | **37.50** | **2.06** |
+ | **Average** | **COME** | **34.22** | **30.09** | **40.67** | **2.47** |
+
+ These values serve as the reference for comparison with the results produced under the redesigned protocol.
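When comparing against the Average rows, it can help to recompute the unweighted mean over the five languages. The sketch below does this from the rounded per-language values in the table. The recomputed means agree with the Average rows to within about 0.02; the small last-digit gaps are presumably because the paper averaged unrounded scores, which is an assumption. The names `REPORTED` and `macro_average` are illustrative only.

```python
from statistics import mean

METRICS = ("BLEU", "METEOR", "ROUGE-L", "CIDEr")

# Per-language scores copied from the table above, in METRICS order.
REPORTED = {
    "CCT5": {
        "PHP": (31.96, 27.31, 37.99, 2.26),
        "R": (33.02, 28.92, 37.17, 2.19),
        "TypeScript": (32.33, 27.92, 43.62, 2.24),
        "Swift": (29.29, 24.58, 37.09, 1.98),
        "Objective-C": (28.57, 24.62, 31.63, 1.68),
    },
    "COME": {
        "PHP": (34.68, 30.51, 40.27, 2.59),
        "R": (35.56, 31.99, 38.06, 2.66),
        "TypeScript": (35.72, 30.97, 47.38, 2.61),
        "Swift": (31.72, 27.54, 39.32, 2.36),
        "Objective-C": (33.43, 29.44, 38.32, 2.17),
    },
}


def macro_average(per_language):
    """Unweighted mean of each metric across languages."""
    columns = zip(*per_language.values())
    return {name: mean(column) for name, column in zip(METRICS, columns)}


averages = {model: macro_average(scores) for model, scores in REPORTED.items()}
for model, avg in averages.items():
    print(model, {name: round(value, 2) for name, value in avg.items()})
```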
112
+
113
+ ***
114
+
115
+ ## Methodological Differences from Experiment 1
116
+
117
+ This experiment was redesigned to address the validity and reproducibility concerns identified during the Experiment 1 reproduction phase:
118
+
119
+ - **Explicit random seed documentation** for all fine-tuning runs
120
+ - **Fully documented fine-tuning procedure**: hyperparameters, batch size, learning rate, number of epochs, and hardware specifications
121
+ - **Best checkpoint selection criteria** explicitly defined using the validation set
122
+ - **Controlled evaluation procedure** with clearly specified evaluation script versions
123
+ - **Full documentation of execution conditions** (hardware, software versions, environment)
124
+ - **Explicit treatment of validity threats** including language-specific variability and training randomness
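One way to capture several of the items above in a single place is a small per-run manifest written next to the checkpoint. The following is a sketch only: the field names, the `build_run_manifest` helper, and every value shown (seed, batch size, learning rate, epochs) are placeholders, not the settings used in this experiment. It seeds only Python's built-in RNG; an actual fine-tuning run would also need to seed its deep learning framework.

```python
import json
import platform
import random


def build_run_manifest(model, language, seed, hyperparameters):
    """Seed Python's RNG and collect the run settings in one serializable record."""
    random.seed(seed)  # a real run would also seed the DL framework it uses
    return {
        "model": model,
        "language": language,
        "seed": seed,
        "hyperparameters": dict(hyperparameters),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
    }


# All values below are placeholders for illustration.
manifest = build_run_manifest(
    model="CCT5",
    language="php",
    seed=1234,
    hyperparameters={"batch_size": 32, "learning_rate": 5e-5, "epochs": 10},
)
print(json.dumps(manifest, indent=2, sort_keys=True))
```

Storing the record as JSON keeps it diffable, so two runs can be compared field by field when a result fails to reproduce.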
+
+ ***
+
+ ## Important Notes
+
+ - The fine-tuning procedure for MCMD-NL is **not a reuse of existing checkpoints**; both models were trained from their pre-trained weights on the MCMD-NL training partition.
+ - The original paper does not clarify whether a single multilingual checkpoint or separate per-language checkpoints were trained for MCMD-NL; this ambiguity is addressed and documented in the thesis.
+ - MCMD-NL scores are generally **higher than MCMD scores** across all metrics, likely due to the different commit style distributions across the new languages.
+
+ ***
+
+ ## References
+
+ - Wu et al. (2025). *An Empirical Study on Commit Message Generation with Large Language Models via In-Context Learning.* arXiv:2502.18904.
+ - Lin et al. (2023). *CCT5: A Code-Change-Oriented Pre-Trained Model.* ESEC/FSE 2023.
+ - He et al. (2023). *COME: Commit Message Generation with Modification Embedding.* ISSTA 2023.
+ - Vegas & Elbaum (2023). *Pitfalls in Experiments with DNN4SE.* ESEC/FSE 2023.