---
license: mit
base_model: microsoft/phi-2
tags:
- generated_from_trainer
metrics:
- rouge
model-index:
- name: phi-2-coedit
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# phi-2-coedit

This model is a fine-tuned version of [microsoft/phi-2](https://huggingface.co/microsoft/phi-2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.7388
- Rouge1: 0.5206
- Rouge2: 0.4123
- RougeL: 0.4979
- RougeLsum: 0.5032
- SacreBLEU: 28.1346
- Memory Used: 81917.5
- Cuda Allocated: 10795.7861
- Cuda Reserved: 74746.0
- Ram Usage: 24042.6719
- Em: 0.0
- Gen Len: 120.6545
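
The list above pairs a ROUGE-1 of about 0.52 with an Em of 0.0. Assuming Em denotes exact match (the card does not define it), the two are compatible: overlap metrics give partial credit, while exact match requires identical strings. The sketch below illustrates the difference with a simplified ROUGE-1 F1 on whitespace tokens. It is not the scorer used to produce the numbers above, and the example sentences are made up for illustration.

```python
from collections import Counter


def rouge1_f1(prediction: str, reference: str) -> float:
    """Simplified ROUGE-1 F1: unigram overlap on lowercased whitespace tokens."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Clipped overlap: a token counts at most as often as it appears in both texts.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def exact_match(prediction: str, reference: str) -> float:
    """1.0 only when the stripped strings are identical, otherwise 0.0."""
    return float(prediction.strip() == reference.strip())


# Hypothetical model output and reference, not taken from the evaluation set.
prediction = "She goes to the school every day"
reference = "She goes to school every day"

print(round(rouge1_f1(prediction, reference), 3))  # 0.923 (F1 = 12/13)
print(exact_match(prediction, reference))          # 0.0
```

One extra word is enough to drop exact match to zero while ROUGE-1 stays high, which is one plausible reading of the scores reported here.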

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 35
- eval_batch_size: 35
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 140
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1
- num_epochs: 2
- mixed_precision_training: Native AMP
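
The batch-size entries above are related by simple arithmetic: 35 examples per step times 4 accumulation steps gives the listed total of 140. The sketch below reproduces that figure and derives a rough number of optimizer steps per epoch from the last row of the results table (step 900 at epoch 1.98). It assumes a single device, which the card does not state but which matches 35 × 4 = 140; the function name is hypothetical, and the per-epoch figures are only estimates because the logged epoch values are rounded.

```python
def effective_batch_size(per_device_batch_size: int,
                         grad_accum_steps: int,
                         num_devices: int = 1) -> int:
    """Examples consumed per optimizer step (num_devices=1 is an assumption)."""
    return per_device_batch_size * grad_accum_steps * num_devices


total = effective_batch_size(per_device_batch_size=35, grad_accum_steps=4)
print(total)  # 140

# Rough estimate from the logged values: step 900 corresponds to epoch 1.98.
steps_per_epoch = 900 / 1.98
approx_examples_per_epoch = steps_per_epoch * total
print(round(steps_per_epoch), round(approx_examples_per_epoch))
```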

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum | SacreBLEU | Memory Used | Cuda Allocated | Cuda Reserved | Ram Usage  | Em  | Gen Len  |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|:---------:|:-----------:|:--------------:|:-------------:|:----------:|:---:|:--------:|
| 0.5716        | 0.22  | 100  | 0.7558          | 0.5041 | 0.3927 | 0.4809 | 0.4853    | 26.9798   | 81917.5     | 10795.811      | 74738.0       | 22888.4102 | 0.0 | 120.3347 |
| 0.5407        | 0.44  | 200  | 0.7404          | 0.5241 | 0.4171 | 0.5013 | 0.5068    | 27.6806   | 81917.5     | 10795.814      | 74738.0       | 23733.9805 | 0.0 | 120.8277 |
| 0.5324        | 0.66  | 300  | 0.7230          | 0.5176 | 0.4093 | 0.4947 | 0.5002    | 27.5145   | 81917.5     | 10795.8184     | 74738.0       | 23831.1484 | 0.0 | 120.576  |
| 0.5107        | 0.88  | 400  | 0.7161          | 0.5256 | 0.4167 | 0.5042 | 0.5092    | 28.1274   | 81917.5     | 10795.7935     | 74738.0       | 23891.7891 | 0.0 | 120.5225 |
| 0.4374        | 1.1   | 500  | 0.7495          | 0.5237 | 0.414  | 0.501  | 0.5059    | 28.0405   | 81917.5     | 10795.7861     | 74746.0       | 23922.043  | 0.0 | 120.3181 |
| 0.3515        | 1.32  | 600  | 0.7418          | 0.5216 | 0.4133 | 0.499  | 0.5049    | 28.0528   | 81917.5     | 10795.7832     | 74746.0       | 23973.8164 | 0.0 | 120.6453 |
| 0.3449        | 1.54  | 700  | 0.7386          | 0.5242 | 0.4163 | 0.5016 | 0.5075    | 28.3145   | 81917.5     | 10795.8066     | 74746.0       | 23950.1016 | 0.0 | 120.5367 |
| 0.3375        | 1.76  | 800  | 0.7354          | 0.5194 | 0.4124 | 0.4973 | 0.5025    | 28.0252   | 81917.5     | 10795.814      | 74746.0       | 23931.0    | 0.0 | 120.6476 |
| 0.3373        | 1.98  | 900  | 0.7388          | 0.5206 | 0.4123 | 0.4979 | 0.5032    | 28.1346   | 81917.5     | 10795.7861     | 74746.0       | 24042.6719 | 0.0 | 120.6545 |
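
In the table, training loss keeps falling through the second epoch, while validation loss reaches its minimum at step 400 (0.7161) and ends at 0.7388. The sketch below shows one way to pick a checkpoint from logged rows like these. The values are copied from the table; the variable names are hypothetical, and the card does not say which checkpoint was actually kept.

```python
# (step, training_loss, validation_loss, sacrebleu), copied from the table above.
rows = [
    (100, 0.5716, 0.7558, 26.9798),
    (200, 0.5407, 0.7404, 27.6806),
    (300, 0.5324, 0.7230, 27.5145),
    (400, 0.5107, 0.7161, 28.1274),
    (500, 0.4374, 0.7495, 28.0405),
    (600, 0.3515, 0.7418, 28.0528),
    (700, 0.3449, 0.7386, 28.3145),
    (800, 0.3375, 0.7354, 28.0252),
    (900, 0.3373, 0.7388, 28.1346),
]

# Lowest validation loss and highest SacreBLEU can point at different steps.
best_by_val_loss = min(rows, key=lambda row: row[2])
best_by_sacrebleu = max(rows, key=lambda row: row[3])

print(best_by_val_loss[0], best_by_val_loss[2])    # 400 0.7161
print(best_by_sacrebleu[0], best_by_sacrebleu[3])  # 700 28.3145
```

The two criteria disagree here, so the choice of checkpoint depends on which metric matters more for the intended use.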


### Framework versions

- Transformers 4.39.3
- Pytorch 2.2.2
- Datasets 2.18.0
- Tokenizers 0.15.2
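
To compare a local environment against the versions listed above, a minimal sketch using only the standard library could look like this. The PyPI package names (in particular `torch` for Pytorch) and the helper name are assumptions, and a mismatch does not necessarily mean the model will fail to load.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions listed in this card; the package names on PyPI are assumed.
EXPECTED_VERSIONS = {
    "transformers": "4.39.3",
    "torch": "2.2.2",
    "datasets": "2.18.0",
    "tokenizers": "0.15.2",
}


def compare_versions(expected: dict) -> dict:
    """Map each package to (wanted, installed or None, matches)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None
        # Ignore local build suffixes such as "+cu121" when comparing.
        matches = installed is not None and installed.split("+")[0] == wanted
        report[package] = (wanted, installed, matches)
    return report


for package, (wanted, installed, matches) in compare_versions(EXPECTED_VERSIONS).items():
    status = "ok" if matches else "differs"
    print(f"{package}: wanted {wanted}, installed {installed} ({status})")
```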