# Jam-CGPT

Jam-CGPT is a GPT-2-like model that follows [jam](https://huggingface.co/apcl/jam)'s pretraining procedure to pretrain models ranging from 38 million to 350 million parameters, finetuned on comments generated by GPT-3.5 with dataset sizes ranging from 170k to 2.15m.

## Jam-CGPT Training Details
- We follow [jam](https://huggingface.co/apcl/jam)'s pretraining procedure and use the same data to pretrain our 38m, 110m, and 350m parameter models.
- We finetune Jam-CGPT on summaries generated by GPT-3.5, using 4 different dataset sizes from the [Jam-CGPT dataset](https://huggingface.co/datasets/apcl/Jam-CGPT).
- We finetune our models for 3 epochs.
- Our [GitHub repo](https://github.com/apcl-research/Jam-CGPT) contains the code for reproduction using the same [data](https://huggingface.co/datasets/apcl/Jam-CGPT).

## Jam-CGPT 38 million parameters model
| Hyperparameter | Description | Value |
| -------------- | ----------- | ----- |
| e | embedding dimensions | 512 |
| L | number of layers | 4 |
| h | attention heads | 4 |
| c | block size / context length | 256 |
| b | batch size | 64 |
| a | accumulation steps | 2 |
| d | dropout | 0.20 |
| r | learning rate | 3e-5 |
| y | weight decay | 1e-5 |

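As an illustration only, the table above can be written as a nanoGPT-style configuration dictionary; the key names here (`n_embd`, `n_layer`, etc.) follow common nanoGPT conventions and are an assumption, not necessarily the exact names used in the Jam-CGPT repo:

```python
# Hypothetical nanoGPT-style config for the 38M model.
# Key names are assumed conventions; check the Jam-CGPT repo for the real ones.
config_38m = dict(
    n_embd=512,                      # e: embedding dimensions
    n_layer=4,                       # L: number of layers
    n_head=4,                        # h: attention heads
    block_size=256,                  # c: block size / context length
    batch_size=64,                   # b: batch size
    gradient_accumulation_steps=2,   # a: accumulation steps
    dropout=0.20,                    # d: dropout
    learning_rate=3e-5,              # r: learning rate
    weight_decay=1e-5,               # y: weight decay
)

# Effective batch size is batch_size * accumulation steps = 128.
print(config_38m["batch_size"] * config_38m["gradient_accumulation_steps"])  # 128
```

The 110m and 350m configurations below differ only in the values.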
## Jam-CGPT 110 million parameters model
| Hyperparameter | Description | Value |
| -------------- | ----------- | ----- |
| e | embedding dimensions | 768 |
| L | number of layers | 10 |
| h | attention heads | 8 |
| c | block size / context length | 256 |
| b | batch size | 32 |
| a | accumulation steps | 4 |
| d | dropout | 0.20 |
| r | learning rate | 3e-5 |
| y | weight decay | 1e-5 |

## Jam-CGPT 350 million parameters model
| Hyperparameter | Description | Value |
| -------------- | ----------- | ----- |
| e | embedding dimensions | 1024 |
| L | number of layers | 24 |
| h | attention heads | 16 |
| c | block size / context length | 256 |
| b | batch size | 4 |
| a | accumulation steps | 32 |
| d | dropout | 0.20 |
| r | learning rate | 3e-5 |
| y | weight decay | 1e-5 |

51
+ - Note that you can adjust the batch size and accumulation steps based on your GPU memory. But, the batch size * accumulation steps should be 128.
52
+ - If you finetune your models with multiple GPUs, you can turn down accumulation steps. For example, if you finetune with 2 GPUs, you will need to half the accumulation steps.
53
+
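The two notes above amount to keeping batch size × accumulation steps × number of GPUs equal to 128. A minimal sketch of that bookkeeping (the helper function here is hypothetical, not part of the Jam-CGPT codebase):

```python
def accumulation_steps(batch_size, n_gpus=1, effective_batch=128):
    """Return the accumulation steps that keep
    batch_size * accumulation_steps * n_gpus == effective_batch."""
    per_step = batch_size * n_gpus
    assert effective_batch % per_step == 0, "batch size * GPUs must divide 128"
    return effective_batch // per_step

# 350M model on a single GPU: batch size 4 -> 32 accumulation steps
print(accumulation_steps(batch_size=4, n_gpus=1))  # 32
# Same batch size on 2 GPUs: accumulation steps halve to 16
print(accumulation_steps(batch_size=4, n_gpus=2))  # 16
```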