Update README.md

README.md CHANGED

```diff
@@ -20,6 +20,7 @@ Jam-CGPT is a GPT2-like model that follows [jam](https://huggingface.co/apcl/jam
 |d | dropout | 0.20 |
 |r | learning rate | 3e-5 |
 |y | weight decay | 1e-5 |
+|iter | number of iterations after pretraining | 757,000 |
 
 ## Jam-CGPT 110 million parameters model
 | Hyperparameter | Description | Value |
@@ -33,6 +34,7 @@ Jam-CGPT is a GPT2-like model that follows [jam](https://huggingface.co/apcl/jam
 |d | dropout | 0.20 |
 |r | learning rate | 3e-5 |
 |y | weight decay | 1e-5 |
+|iter | number of iterations after pretraining | 762,000 |
 
 
 ## Jam-CGPT 350 million parameters model
@@ -46,8 +48,9 @@ Jam-CGPT is a GPT2-like model that follows [jam](https://huggingface.co/apcl/jam
 |a | accumulation steps | 32 |
 |d | dropout | 0.20 |
 |r | learning rate | 3e-5 |
 |y | weight decay | 1e-5 |
+|iter | number of iterations after pretraining | 272,000 |
 
 - Note that you can adjust the batch size and accumulation steps based on your GPU memory, but batch size * accumulation steps should be 128.
 - If you finetune your models with multiple GPUs, you can turn down the accumulation steps. For example, if you finetune with 2 GPUs, you will need to halve the accumulation steps.
-
+- We pretrained the 38m and 110m models for 3 epochs.
```
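The finetuning hyperparameters shared by the tables in this diff can be gathered into one config object. A minimal sketch, assuming plain-Python key names (these are illustrative, not Jam-CGPT's actual config fields):

```python
# Illustrative sketch: the README's shared finetuning hyperparameters as a dict.
# Key names are assumptions for this example, not Jam-CGPT's real config keys.
finetune_config = {
    "dropout": 0.20,           # d
    "learning_rate": 3e-5,     # r
    "weight_decay": 1e-5,      # y
    "accumulation_steps": 32,  # a (350m model)
    "batch_size": 4,           # implied: 4 * 32 = 128 effective batch
}

# The constraint stated in the note: batch size * accumulation steps == 128.
assert finetune_config["batch_size"] * finetune_config["accumulation_steps"] == 128
```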
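The note in the diff fixes the effective batch size at 128 (batch size * accumulation steps, divided across GPUs). A minimal sketch of that bookkeeping, with a hypothetical helper name:

```python
# batch_size * accumulation_steps * num_gpus must equal this, per the README note.
TARGET_EFFECTIVE_BATCH = 128

def accumulation_steps(batch_size: int, num_gpus: int = 1) -> int:
    """Gradient-accumulation steps that keep the effective batch size at 128."""
    per_step = batch_size * num_gpus  # samples processed per optimizer micro-step
    if TARGET_EFFECTIVE_BATCH % per_step != 0:
        raise ValueError(
            f"batch_size * num_gpus ({per_step}) must divide {TARGET_EFFECTIVE_BATCH}"
        )
    return TARGET_EFFECTIVE_BATCH // per_step

# With accumulation steps 32 (as in the 350m table), the implied per-GPU batch size is 4:
print(accumulation_steps(batch_size=4, num_gpus=1))  # 32
# Finetuning on 2 GPUs halves the accumulation steps, matching the note:
print(accumulation_steps(batch_size=4, num_gpus=2))  # 16
```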