---
datasets:
- Mir-2002/python-google-style-docstrings
language:
- en
metrics:
- bleu
- rouge
base_model:
- Salesforce/codet5p-220m-bimodal
pipeline_tag: summarization
tags:
- code
---

# Overview

This is a fine-tuned CodeT5+ (220M) bimodal model, trained on a dataset of 59,000 Python code-docstring pairs. The docstrings are in the Google style format.
A Google-style docstring is formatted as follows:
```
<Description of the code>

Args:
    <var1> (<data-type>): <description of var1>
    <var2> (<data-type>): <description of var2>

Returns:
    <data-type>: <description of the return value>

Raises:
    <ExceptionType>: <description of when it is raised>
```
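As a concrete illustration (a made-up example, not taken from the training data), here is a function documented in that style:

```python
def divide(a, b):
    """Divide one number by another.

    Args:
        a (float): The dividend.
        b (float): The divisor.

    Returns:
        float: The quotient a / b.

    Raises:
        ZeroDivisionError: If b is zero.
    """
    return a / b
```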

For more information on the dataset, see the referenced dataset above.

You can test the model with the following snippet:

```python
from transformers import T5ForConditionalGeneration, AutoTokenizer

checkpoint = "Mir-2002/codet5p-google-style-docstrings"
device = "cuda"  # or "cpu"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = T5ForConditionalGeneration.from_pretrained(checkpoint).to(device)

code = """
def calculate_sum(a, b):
    return a + b
"""

inputs = tokenizer.encode(code, return_tensors="pt").to(device)
outputs = model.generate(inputs, max_length=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Calculate the sum of two numbers.
#
# Args:
#     a (int): The first number.
#     b (int): The second number.
```

# Hyperparameters

| Hyperparameter | Value |
| --- | --- |
| MAX_SOURCE_LENGTH | 256 |
| MAX_TARGET_LENGTH | 128 |
| BATCH_SIZE | 16 |
| NUM_EPOCHS | 35 |
| LEARNING_RATE | 3e-5 |
| GRADIENT_ACCUMULATION_STEPS | 4 |
| EARLY_STOPPING_PATIENCE | 2 |
| WEIGHT_DECAY | 0.01 |
| OPTIMIZER | Adafactor |
| LR_SCHEDULER | linear |

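Because gradient accumulation was used, the effective batch size per optimizer step is the per-device batch size multiplied by the accumulation steps. A quick sketch of that arithmetic:

```python
BATCH_SIZE = 16
GRADIENT_ACCUMULATION_STEPS = 4

# Number of examples contributing to each optimizer update
effective_batch_size = BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS
print(effective_batch_size)  # 64
```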
# Loss

At the 35th (final) epoch, the model achieved the following loss:

| Epoch | Training Loss | Validation Loss |
| --- | --- | --- |
| 35 | 0.894800 | 1.268536 |


# BLEU and ROUGE Scores

| BLEU | ROUGE-1 | ROUGE-2 | ROUGE-L |
| --- | --- | --- | --- |
| 35.40 | 58.55 | 39.46 | 52.43 |