Mir-2002/codet5p-google-style-docstrings


Mir-2002 committed on Jun 25, 2025

Commit 3ce03a9 (verified) · 1 Parent(s): 8b43562

Create README.md

Files changed (1)

README.md +73 -0

README.md ADDED

	@@ -0,0 +1,73 @@

+---
+datasets:
+- Mir-2002/python-google-style-docstrings
+language:
+- en
+metrics:
+- bleu
+- rouge
+base_model:
+- Salesforce/codet5p-220m-bimodal
+pipeline_tag: summarization
+tags:
+- code
+---
+# Overview
+This is a CodeT5+ (220M) bimodal model fine-tuned on a dataset of 59,000 Python code-docstring pairs. The docstrings follow the Google style format.
+A Google-style docstring is formatted as follows:
+```
+<Description of the code>
+Args:
+<var1> (<data-type>) : <description of var1>
+<var2> (<data-type>) : <description of var2>
+Returns:
+<var3> (<data-type>) : <description of var3>
+Raises:
+<var4> (<data-type>) : <description of var4>
+```
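As a concrete illustration of the template above, here is a hand-written Python function carrying a Google-style docstring. The function `safe_divide` and its parameters are hypothetical; it is not taken from the dataset and was not generated by the model. It follows the common Google convention, in which `Returns` and `Raises` entries lead with a type or exception name rather than a variable name.

```python
def safe_divide(numerator, denominator):
    """Divide one number by another.

    Args:
        numerator (float): The value to be divided.
        denominator (float): The value to divide by.

    Returns:
        float: The quotient of numerator and denominator.

    Raises:
        ValueError: If denominator is zero.
    """
    # Reject a zero divisor explicitly, as documented under "Raises".
    if denominator == 0:
        raise ValueError("denominator must not be zero")
    return numerator / denominator
```

A docstring of this shape is the kind of target text the model is trained to produce from the function body.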
+For more information on the dataset, see the dataset referenced in this model card (Mir-2002/python-google-style-docstrings).
+# Hyperparameters
+MAX_SOURCE_LENGTH = 256
+MAX_TARGET_LENGTH = 128
+BATCH_SIZE = 16
+NUM_EPOCHS = 35
+LEARNING_RATE = 3e-5
+GRADIENT_ACCUMULATION_STEPS = 4
+EARLY_STOPPING_PATIENCE = 2
+WEIGHT_DECAY = 0.01
+OPTIMIZER = ADAFACTOR
+LR_SCHEDULER = LINEAR
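One consequence of the settings above is the effective batch size per optimizer step, which is the batch size multiplied by the number of gradient accumulation steps. The sketch below works through that arithmetic. It assumes `BATCH_SIZE` is a per-device value and that training used a single device; the card states neither, so `NUM_DEVICES` and the helper `effective_batch_size` are illustrative only.

```python
# Values copied from the hyperparameter list above.
BATCH_SIZE = 16
GRADIENT_ACCUMULATION_STEPS = 4

# Assumption: the model card does not state how many devices were used.
NUM_DEVICES = 1


def effective_batch_size(per_device_batch, accumulation_steps, num_devices=1):
    """Return the number of examples contributing to one optimizer step."""
    return per_device_batch * accumulation_steps * num_devices


print(effective_batch_size(BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS, NUM_DEVICES))  # prints 64
```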
+# Loss
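+The table below lists the training and validation loss for the last ten epochs (26 to 35). At the 35th and final epoch (emphasized), the model reached a training loss of 0.894800 and a validation loss of 1.268536.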
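+| Epoch | Training Loss | Validation Loss |
+|-------|---------------|-----------------|
+| 26 | 1.001400 | 1.288712 |
+| 27 | 0.983600 | 1.284895 |
+| 28 | 0.961300 | 1.277680 |
+| 29 | 0.940600 | 1.275018 |
+| 30 | 0.933600 | 1.275621 |
+| 31 | 0.918200 | 1.270074 |
+| 32 | 0.904700 | 1.268874 |
+| 33 | 0.908800 | 1.268534 |
+| 34 | 0.900600 | 1.268240 |
+| *35* | *0.894800* | *1.268536* |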
+# BLEU and ROUGE Scores
+==================================================
+EVALUATION RESULTS
+==================================================
+BLEU Score: 0.3540
+ROUGE-1: 0.5855
+ROUGE-2: 0.3946
+ROUGE-L: 0.5243
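To give a feel for what the ROUGE-1 figure measures, the sketch below computes a simplified unigram-overlap F1 score between a reference docstring and a generated one. This is not the evaluation script behind the numbers above: it assumes lowercased whitespace tokenization, whereas standard ROUGE implementations apply their own tokenization and optional stemming. The function name `rouge1_f1` and the two example sentences are hypothetical.

```python
from collections import Counter


def rouge1_f1(reference, candidate):
    """Return a simplified ROUGE-1 F1 score based on unigram overlap."""
    # Assumption: lowercase + whitespace split stands in for real tokenization.
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Multiset intersection counts each shared token at most as often
    # as it appears in both texts.
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


reference = "Returns the sum of two numbers."
candidate = "Returns the sum of two integers."
print(round(rouge1_f1(reference, candidate), 4))  # prints 0.8333
```

Here five of the six tokens match, so precision and recall are both 5/6 and the F1 score is the same value.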