clouds125 committed on
Commit 5977264 · verified · 1 Parent(s): 9d1d203

Update README.md

Files changed (1)
  1. README.md +31 -26
README.md CHANGED
@@ -1,27 +1,27 @@
- # Experiment 1 – Latin Square 1: CCT5 & COME on MCMD

- This repository contains the artifacts for **Latin Square 1 of Experiment 1**, which corresponds to the **reproduction of the original experiment** by Wu et al. (2025) on the **MCMD dataset** using the DNN-based commit message generation baselines **CCT5** and **COME**.

  ***

  ## Models

  ### CCT5
- CCT5 is a code-change-oriented pre-trained model built on top of the **T5 architecture**, initialized from **CodeT5** weights. It is further specialized through pre-training on **CodeChangeNet**, a commit-diff dataset containing roughly 40GB of diff and commit message pairs (~1.5M pairs). It was released at ESEC/FSE 2023.

- - Base: `T5-base` → `CodeT5` → `CCT5`
- - Pre-training data: CodeChangeNet (40GB, 1.5M diff/commit pairs)
- - For MCMD: reused released checkpoint fine-tuned on MCMD by original authors

  ### COME
- COME (Commit Message Generation with Modification Embedding) is a hybrid DNN approach that combines:
- - A **fine-tuned CodeT5** component for natural language generation
- - **Modification embedding** to represent code changes as numerical vectors
- - An **SVM-based decision algorithm** to select between generated and retrieved candidate messages

- It does not perform additional large-scale pre-training on top of CodeT5. Released at ISSTA 2023.

- - For MCMD: reused language-specific checkpoints released by original COME authors (one per language)

  ***

@@ -42,12 +42,12 @@ It does not perform additional large-scale pre-training on top of CodeT5. Releas

  ## Repository Structure

- Each run folder corresponds to a **programming language** evaluated in this Latin Square:

  ```
- experiment1_ls1/
  ├── run_java/
- │   ├── checkpoint/    # CCT5 and COME checkpoints fine-tuned on MCMD (Java)
  │   ├── predictions/   # Generated commit messages on MCMD Java test set
  │   └── metrics/       # BLEU, METEOR, ROUGE-L, CIDEr scores
  ├── run_cpp/
@@ -69,13 +69,13 @@ experiment1_ls1/
  ```

  ### `checkpoint/`
- Contains the model checkpoint files for CCT5 and COME reused from the original authors' repositories, fine-tuned on the MCMD training set for the corresponding language.

  ### `predictions/`
- Contains the generated commit messages produced by each model on the MCMD test set for the corresponding language, stored as `.txt` files with one prediction per line aligned to the reference messages.

  ### `metrics/`
- Contains the computed evaluation metric scores for each model-language combination. Metrics are calculated by comparing predictions against the reference messages in the MCMD test set.

  ***

@@ -84,11 +84,11 @@ Contains the computed evaluation metric scores for each model-language combinati
  | Metric | Description |
  |--------|-------------|
  | **BLEU** | Bilingual Evaluation Understudy: measures n-gram precision between generated and reference messages |
- | **METEOR** | Metric for Evaluation of Translation with Explicit Ordering: extends BLEU with recall, stemming, and synonym matching |
  | **ROUGE-L** | Recall-Oriented Understudy for Gisting Evaluation (LCS variant): measures longest common subsequence overlap |
  | **CIDEr** | Consensus-based Image Description Evaluation: TF-IDF-weighted n-gram similarity against reference messages |

- ### Reported Results (Original Paper – Wu et al., 2025)

  | Language | Model | BLEU | METEOR | ROUGE-L | CIDEr |
  |----------|-------|------|--------|---------|-------|
@@ -105,14 +105,18 @@ Contains the computed evaluation metric scores for each model-language combinati
  | **Average** | **CCT5** | **15.96** | **14.26** | **24.33** | **0.95** |
  | **Average** | **COME** | **25.07** | **21.48** | **31.97** | **1.70** |

  ***

- ## Notes

- - Checkpoints were **reused** from the original authors' repositories; no retraining was performed for this dataset.
- - No random seeds or repeated runs were documented in the original experiment.
- - Results in this repository correspond to the **reproduction attempt** of the original reported values.
- - Discrepancies between reproduced and reported results are documented in the thesis.

  ***

@@ -121,4 +125,5 @@ Contains the computed evaluation metric scores for each model-language combinati
  - Wu et al. (2025). *An Empirical Study on Commit Message Generation with Large Language Models via In-Context Learning.* arXiv:2502.18904.
  - Lin et al. (2023). *CCT5: A Code-Change-Oriented Pre-Trained Model.* ESEC/FSE 2023.
  - He et al. (2023). *COME: Commit Message Generation with Modification Embedding.* ISSTA 2023.
- - Liu et al. (2020). *MCMD dataset.*
 
 
+ # Experiment 2 – Latin Square 1: CCT5 & COME on MCMD (Redesigned)

+ This repository contains the artifacts for **Latin Square 1 of Experiment 2**, which corresponds to the **redesigned and reimplemented experiment** evaluated on the **MCMD dataset** using the DNN-based commit message generation baselines **CCT5** and **COME**. This experiment was conducted under a more explicit and controlled evaluation protocol than the original study by Wu et al. (2025).

  ***

  ## Models

  ### CCT5
+ CCT5 is a code-change-oriented pre-trained model built on top of the **T5 architecture**, initialized from **CodeT5** weights and further pre-trained on **CodeChangeNet** (~40GB, 1.5M diff/commit pairs). Released at ESEC/FSE 2023.

+ - Architecture: Encoder-decoder Transformer (`T5-base` → `CodeT5` → `CCT5`)
+ - Pre-training corpus: CodeChangeNet (code diffs paired with commit messages)
+ - For MCMD: fine-tuned checkpoint from original CCT5 authors, trained on MCMD training set

  ### COME
+ COME (Commit Message Generation with Modification Embedding) is a hybrid DNN system built on top of CodeT5 with three core components:
+ - **Modification embedding**: converts code changes into numerical vectors capturing code evolution
+ - **Fine-tuned CodeT5**: generates candidate commit messages from the embedded representation
+ - **SVM-based decision algorithm**: selects between the generated and retrieved candidate messages

+ Released at ISSTA 2023. Does not include additional large-scale pre-training beyond CodeT5.

+ - For MCMD: language-specific checkpoints released by original COME authors (one per language)

  ***

 

  ## Repository Structure

+ Each run folder corresponds to a **programming language** evaluated in this Latin Square. Unlike Experiment 1, this experiment follows a more controlled protocol with explicit random seed documentation and multiple evaluation runs where applicable.

  ```
+ experiment2_ls1/
  ├── run_java/
+ │   ├── checkpoint/    # CCT5 and COME checkpoints for MCMD (Java)
  │   ├── predictions/   # Generated commit messages on MCMD Java test set
  │   └── metrics/       # BLEU, METEOR, ROUGE-L, CIDEr scores
  ├── run_cpp/
 
  ```

  ### `checkpoint/`
+ Contains the model checkpoint files used for evaluation. For MCMD, these are the fine-tuned checkpoints released by the original authors of CCT5 and COME, trained on the MCMD training set for the corresponding language.

  ### `predictions/`
+ Contains the generated commit messages produced by each model on the MCMD test set for the corresponding language. Files are stored as `.txt` with one prediction per line, aligned to the reference messages in the test set.

  ### `metrics/`
+ Contains the evaluation metric scores computed by comparing the predictions against the MCMD test set reference messages. Each file records BLEU, METEOR, ROUGE-L, and CIDEr scores per model and language under the redesigned evaluation protocol.

  ***

 
  | Metric | Description |
  |--------|-------------|
  | **BLEU** | Bilingual Evaluation Understudy: measures n-gram precision between generated and reference messages |
+ | **METEOR** | Metric for Evaluation of Translation with Explicit Ordering: extends BLEU with recall, stemming, and synonym matching via WordNet |
  | **ROUGE-L** | Recall-Oriented Understudy for Gisting Evaluation (LCS variant): measures longest common subsequence overlap |
  | **CIDEr** | Consensus-based Image Description Evaluation: TF-IDF-weighted n-gram similarity against reference messages |

+ ### Reference Results (Original Paper – Wu et al., 2025)

  | Language | Model | BLEU | METEOR | ROUGE-L | CIDEr |
  |----------|-------|------|--------|---------|-------|
 
  | **Average** | **CCT5** | **15.96** | **14.26** | **24.33** | **0.95** |
  | **Average** | **COME** | **25.07** | **21.48** | **31.97** | **1.70** |

+ These values serve as the baseline reference for comparison with the results produced under the redesigned protocol.
+
  ***

+ ## Methodological Differences from Experiment 1
+
+ This experiment was redesigned to address the validity and reproducibility concerns identified during the Experiment 1 reproduction phase:

+ - **Explicit random seed documentation** for all runs
+ - **Controlled evaluation procedure** with clearly specified script versions
+ - **Full documentation of execution conditions** (hardware, software versions, environment)
+ - **Explicit treatment of validity threats** at each stage of the evaluation

  ***

 
  - Wu et al. (2025). *An Empirical Study on Commit Message Generation with Large Language Models via In-Context Learning.* arXiv:2502.18904.
  - Lin et al. (2023). *CCT5: A Code-Change-Oriented Pre-Trained Model.* ESEC/FSE 2023.
  - He et al. (2023). *COME: Commit Message Generation with Modification Embedding.* ISSTA 2023.
+ - Liu et al. (2020). *MCMD dataset.*
+ - Vegas & Elbaum (2023). *Pitfalls in Experiments with DNN4SE.* ESEC/FSE 2023.
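
Note on the `predictions/` and `metrics/` layout described in the updated README: the sketch below is one possible way to check that a predictions file is line-aligned with its reference file and to compute a sentence-level ROUGE-L F1 averaged over all pairs. It is not the evaluation script used in this repository. The function names, the whitespace tokenization, and the choice of beta = 1 are assumptions made for illustration, and the scores it produces may differ from those of the original scripts.

```python
def read_lines(path):
    """Return the lines of a UTF-8 text file without their trailing newline."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(prediction, reference):
    """Sentence-level ROUGE-L F1 on whitespace tokens (beta = 1 is an assumption)."""
    pred, ref = prediction.split(), reference.split()
    lcs = lcs_length(pred, ref)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(pred), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


def score_file_pair(pred_path, ref_path):
    """Verify one-prediction-per-line alignment, then average ROUGE-L F1."""
    preds, refs = read_lines(pred_path), read_lines(ref_path)
    if len(preds) != len(refs):
        raise ValueError(
            f"misaligned files: {len(preds)} predictions vs {len(refs)} references"
        )
    if not preds:
        return 0.0
    return sum(rouge_l_f1(p, r) for p, r in zip(preds, refs)) / len(preds)
```

A call such as `score_file_pair("run_java/predictions/cct5.txt", "mcmd_java_test_refs.txt")` would return a value between 0 and 1; both file names here are hypothetical, since the README does not state the actual file names inside each run folder.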