	# Experiment 1 – Latin Square 1: CCT5 & COME on MCMD

	This repository contains the artifacts for Latin Square 1 of Experiment 1, which reproduces the original experiment of Wu et al. (2025) on the MCMD dataset using two DNN-based commit message generation baselines, CCT5 and COME.

	***

	## Models

	### CCT5
	CCT5 is a code-change-oriented pre-trained model built on the T5 architecture and initialized from CodeT5 weights. It is further pre-trained on CodeChangeNet, a code-change dataset of roughly 40 GB containing about 1.5M diff and commit message pairs. It was released at ESEC/FSE 2023.

	- Base: `T5-base` → `CodeT5` → `CCT5`
	- Pre-training data: CodeChangeNet (roughly 40 GB, about 1.5M diff/commit message pairs)
	- For MCMD: reused the released checkpoint fine-tuned on MCMD by the original authors

	### COME
	COME (Commit Message Generation with Modification Embedding) is a hybrid DNN approach that combines:
	- A fine-tuned CodeT5 component for natural language generation
	- Modification embedding to represent code changes as numerical vectors
	- An SVM-based decision algorithm to select between generated and retrieved candidate messages

	COME does not perform additional large-scale pre-training on top of CodeT5. It was released at ISSTA 2023.

	- For MCMD: reused the language-specific checkpoints released by the original COME authors (one per language)

	***

	## Dataset

	MCMD – Multilingual Commit Message Dataset

	| Property | Details |
	|----------|---------|
	| Languages | Java, C++, C#, Python, JavaScript |
	| Repositories | Top 100 most-starred GitHub repos per language (500 total) |
	| Total commits | ~1,094,115 |
	| Date range | Up to January 1st, 2022 |
	| Split | 80% train / 10% validation / 10% test |
	| Authors | Liu et al. (2020) |

	***

	## Repository Structure

	Each run folder corresponds to a programming language evaluated in this Latin Square:

	```
	experiment1_ls1/
	├── run_java/
	│   ├── checkpoint/    # CCT5 and COME checkpoints fine-tuned on MCMD (Java)
	│   ├── predictions/   # Generated commit messages on the MCMD Java test set
	│   └── metrics/       # BLEU, METEOR, ROUGE-L, CIDEr scores
	├── run_cpp/
	│   ├── checkpoint/
	│   ├── predictions/
	│   └── metrics/
	├── run_csharp/
	│   ├── checkpoint/
	│   ├── predictions/
	│   └── metrics/
	├── run_python/
	│   ├── checkpoint/
	│   ├── predictions/
	│   └── metrics/
	└── run_javascript/
	    ├── checkpoint/
	    ├── predictions/
	    └── metrics/
	```
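
As a quick sanity check of this layout, the following minimal sketch lists any expected folder that is missing from a local copy. The folder names come from the tree above; the root path passed in and the helper names `expected_dirs` and `missing_dirs` are assumptions made for illustration, not part of the released artifacts.

```python
from pathlib import Path

# Folder names are taken from the tree above.
LANGUAGES = ["java", "cpp", "csharp", "python", "javascript"]
SUBFOLDERS = ["checkpoint", "predictions", "metrics"]


def expected_dirs(root):
    """Return every run_<language>/<subfolder> path implied by the layout."""
    root = Path(root)
    return [root / f"run_{lang}" / sub for lang in LANGUAGES for sub in SUBFOLDERS]


def missing_dirs(root):
    """Return the expected folders that do not exist under root."""
    return [path for path in expected_dirs(root) if not path.is_dir()]


if __name__ == "__main__":
    # "experiment1_ls1" is assumed to be the local checkout location.
    for path in missing_dirs("experiment1_ls1"):
        print(f"missing: {path}")
```

Running the script from the directory that contains `experiment1_ls1/` prints nothing when all fifteen folders are present.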

	### `checkpoint/`
	Contains the model checkpoint files for CCT5 and COME reused from the original authors' repositories, fine-tuned on the MCMD training set for the corresponding language.

	### `predictions/`
	Contains the generated commit messages produced by each model on the MCMD test set for the corresponding language, stored as `.txt` files with one prediction per line aligned to the reference messages.
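
Because predictions and references are aligned by line, line *i* of a prediction file pairs with line *i* of the reference file. A minimal sketch of loading such a pair is shown below; the file names in the usage comment and the helper names `read_lines` and `pair_lines` are hypothetical, and the actual file names in each `predictions/` folder may differ.

```python
def read_lines(path):
    """Read a UTF-8 text file into a list with one entry per line."""
    with open(path, encoding="utf-8") as handle:
        # Strip only the trailing newline so empty predictions are preserved.
        return [line.rstrip("\n") for line in handle]


def pair_lines(predictions, references):
    """Pair predictions with references by position, refusing misaligned input."""
    if len(predictions) != len(references):
        raise ValueError(
            f"{len(predictions)} predictions but {len(references)} references"
        )
    return list(zip(predictions, references))


# Hypothetical usage (file names are assumptions):
# pairs = pair_lines(read_lines("run_java/predictions/cct5.txt"),
#                    read_lines("mcmd_java_test_references.txt"))
```

The explicit length check matters because a single dropped or extra line silently shifts every later pair and distorts all metric scores.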

	### `metrics/`
	Contains the computed evaluation metric scores for each model-language combination. Metrics are calculated by comparing predictions against the reference messages in the MCMD test set.

	***

	## Evaluation Metrics

	| Metric | Description |
	|--------|-------------|
	| BLEU | Bilingual Evaluation Understudy: modified n-gram precision of the generated message against the reference, with a brevity penalty |
	| METEOR | Metric for Evaluation of Translation with Explicit Ordering: harmonic mean of unigram precision and recall, with stemming, synonym matching, and a word-order penalty |
	| ROUGE-L | Recall-Oriented Understudy for Gisting Evaluation (LCS variant): measures longest common subsequence overlap |
	| CIDEr | Consensus-based Image Description Evaluation: TF-IDF-weighted n-gram similarity against reference messages |
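
To make the BLEU and ROUGE-L descriptions concrete, here is a minimal sketch of their core quantities: clipped n-gram precision and an LCS-based F1 score. It is not the scoring script used for the results in this repository. Whitespace tokenization, the equal weighting of precision and recall in `rouge_l_f1`, and all function names are assumptions made for illustration, and full BLEU additionally combines several n-gram orders with a brevity penalty.

```python
from collections import Counter


def ngram_precision(candidate, reference, n):
    """Clipped n-gram precision, the building block of BLEU."""
    cand = candidate.split()
    ref = reference.split()
    cand_counts = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_counts = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram counts at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    return clipped / total


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L as the F1 of LCS-based precision and recall (equal weighting assumed)."""
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_length(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

For the candidate `fix typo in readme` and the reference `fix a typo in the readme`, every candidate unigram appears in the reference, so unigram precision is 1.0, while only one of the three candidate bigrams matches. The LCS has length 4, giving precision 4/4, recall 4/6, and an F1 of 0.8.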

	### Reported Results (Original Paper – Wu et al., 2025)

	| Language | Model | BLEU | METEOR | ROUGE-L | CIDEr |
	|----------|-------|------|--------|---------|-------|
	| Java | CCT5 | 17.19 | 14.95 | 26.08 | 1.06 |
	| Java | COME | 27.17 | 23.36 | 34.59 | 1.90 |
	| C++ | CCT5 | 15.65 | 14.11 | 24.15 | 0.90 |
	| C++ | COME | 27.29 | 23.29 | 33.33 | 1.91 |
	| C# | CCT5 | 12.06 | 11.05 | 18.92 | 0.61 |
	| C# | COME | 20.80 | 17.72 | 27.01 | 1.25 |
	| Python | CCT5 | 15.12 | 13.70 | 23.79 | 0.85 |
	| Python | COME | 23.17 | 19.99 | 30.48 | 1.50 |
	| JavaScript | CCT5 | 19.76 | 17.51 | 28.73 | 1.33 |
	| JavaScript | COME | 26.91 | 23.02 | 34.44 | 1.92 |
	| Average | CCT5 | 15.96 | 14.26 | 24.33 | 0.95 |
	| Average | COME | 25.07 | 21.48 | 31.97 | 1.70 |
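
The Average rows are consistent with an unweighted mean over the five languages, rounded to two decimals. The sketch below recomputes them from the per-language values in the table. Whether the original paper computes its averages this way is an assumption, and the names `SCORES` and `macro_average` are illustrative only.

```python
# Per-language scores copied from the table above, in the order
# Java, C++, C#, Python, JavaScript.
SCORES = {
    "CCT5": {
        "BLEU": [17.19, 15.65, 12.06, 15.12, 19.76],
        "METEOR": [14.95, 14.11, 11.05, 13.70, 17.51],
        "ROUGE-L": [26.08, 24.15, 18.92, 23.79, 28.73],
        "CIDEr": [1.06, 0.90, 0.61, 0.85, 1.33],
    },
    "COME": {
        "BLEU": [27.17, 27.29, 20.80, 23.17, 26.91],
        "METEOR": [23.36, 23.29, 17.72, 19.99, 23.02],
        "ROUGE-L": [34.59, 33.33, 27.01, 30.48, 34.44],
        "CIDEr": [1.90, 1.91, 1.25, 1.50, 1.92],
    },
}


def macro_average(values):
    """Unweighted mean over languages, rounded to two decimals."""
    return round(sum(values) / len(values), 2)


if __name__ == "__main__":
    for model, metrics in SCORES.items():
        row = {name: macro_average(values) for name, values in metrics.items()}
        print(model, row)
```

The same helper can be applied to the reproduced scores in each `metrics/` folder to compare averages against the original paper on equal terms.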

	***

	## References

	- Wu et al. (2025). An Empirical Study on Commit Message Generation with Large Language Models via In-Context Learning. arXiv:2502.18904.
	- Lin et al. (2023). CCT5: A Code-Change-Oriented Pre-Trained Model. ESEC/FSE 2023.
	- He et al. (2023). COME: Commit Message Generation with Modification Embedding. ISSTA 2023.
	- Liu et al. (2020). MCMD dataset.