---
library_name: transformers
tags:
- code
- code-generation
- codet5
- comment-generation
- seq2seq
language:
- en
base_model: Salesforce/codet5-small
---

# CodeT5-Small — Code Comment Generator

A fine-tune of [`Salesforce/codet5-small`](https://huggingface.co/Salesforce/codet5-small) on a filtered subset of CodeSearchNet. Given source code as input, it generates a natural-language comment or docstring.

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Load the tokenizer and the fine-tuned seq2seq model from the Hub
tokenizer = AutoTokenizer.from_pretrained("melfatihomran/codet5-small-code-comment-gen")
model = AutoModelForSeq2SeqLM.from_pretrained("melfatihomran/codet5-small-code-comment-gen")

# Source code to describe
code = "def add(a, b):\n    return a + b"
inputs = tokenizer(code, return_tensors="pt")

# Beam search with 4 beams, capped at 64 tokens
output = model.generate(**inputs, max_length=64, num_beams=4)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## Training

| Parameter | Value |
|-----------|-------|
| Base model | Salesforce/codet5-small |
| Dataset | sentence-transformers/codesearchnet (pair) |
| Train / Val / Test | 8,000 / 1,000 / 1,000 |
| Epochs | 5 |
| Learning rate | 5e-5 |
| Batch size | 8 |
| Precision | fp16 (GPU) |
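
For a rough sense of the training budget, the table's numbers can be turned into an optimizer step count. This is only a back-of-the-envelope sketch: it assumes a single device, no gradient accumulation, and that the last partial batch is kept, none of which the card states.

```python
import math

# Values taken from the training table above.
train_examples = 8_000
batch_size = 8
epochs = 5

# Assumptions (not stated in the card): single device, no gradient
# accumulation, and the last partial batch of each epoch is kept.
steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs

print(f"steps per epoch: {steps_per_epoch}")  # 1000
print(f"total steps:     {total_steps}")      # 5000
```

With gradient accumulation or multiple GPUs the effective batch size would be larger and the step count correspondingly smaller.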

## Results

| Metric | Score |
|--------|-------|
| BLEU | 19.65 |
| ROUGE-1 | 41.11 |
| ROUGE-2 | 23.41 |
| ROUGE-L | 38.83 |
| Exact Match | 5.60% |
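
The card does not say how Exact Match was computed. As an illustration only, the sketch below assumes it is the percentage of predictions that equal their reference after collapsing whitespace; `normalize` and `exact_match` are hypothetical helper names, not part of the model's evaluation code, and the sample strings are made up.

```python
def normalize(text: str) -> str:
    """Collapse runs of whitespace and strip the ends (assumed normalization)."""
    return " ".join(text.split())


def exact_match(predictions: list[str], references: list[str]) -> float:
    """Return the percentage of predictions identical to their reference."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        return 0.0
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return 100.0 * hits / len(predictions)


# Made-up examples: two of the three predictions match after normalization.
preds = ["Add two numbers.", "Return the  sum", "Parse a file."]
refs = ["Add two numbers.", "Return the sum", "Read a file."]
print(f"{exact_match(preds, refs):.2f}%")  # 66.67%
```

Under this definition, and if the score was computed over all 1,000 test examples, 5.60% would correspond to 56 predictions matching their reference exactly.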