---
library_name: transformers
base_model: /fs-computility/plm/linzhouhan/daibeiya/models/gpt2
tags:
- generated_from_trainer
datasets:
- openwebtext
model-index:
- name: gpt2_base_contextlm_l0212_add_lnnorm_wodetach_v2_lr_bf16_lr1e-3
  results: []
---
# gpt2_base_contextlm_l0212_add_lnnorm_wodetach_v2_lr_bf16_lr1e-3
This model is a fine-tuned version of /fs-computility/plm/linzhouhan/daibeiya/models/gpt2 on the openwebtext dataset. It achieves the following results on the evaluation set:
- Loss: 3.0300
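The run name indicates bf16 training, so loading the weights in bfloat16 is a reasonable default. A minimal usage sketch, assuming the checkpoint is published under the run name above (a hypothetical Hub id; substitute the actual repo id or a local checkpoint path):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical id: replace with the actual Hub repo id or local checkpoint path.
model_id = "gpt2_base_contextlm_l0212_add_lnnorm_wodetach_v2_lr_bf16_lr1e-3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```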
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
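The exact preprocessing used for this run is not documented. For illustration only, a common way to load and tokenize OpenWebText with `datasets` (assuming the public Hub mirror `Skylion007/openwebtext`; this run may have used a different copy or pipeline):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

# Assumption: the public Hub mirror of OpenWebText; its "text" column holds raw documents.
dataset = load_dataset("Skylion007/openwebtext", split="train")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

def tokenize(batch):
    # Truncate to GPT-2's 1024-token context window.
    return tokenizer(batch["text"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])
```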
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 16
- gradient_accumulation_steps: 2
- total_train_batch_size: 512
- total_eval_batch_size: 128
- optimizer: adamw_torch with betas=(0.9, 0.95) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 1.0
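Below is a minimal sketch of how these values map onto `transformers.TrainingArguments` (argument names per Transformers 4.51; the output directory is a placeholder, and the 16-GPU distributed launch is handled by the launcher, not shown):

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="gpt2_base_contextlm_l0212_add_lnnorm_wodetach_v2_lr_bf16_lr1e-3",
    learning_rate=1e-3,
    per_device_train_batch_size=16,  # 16 per device x 16 GPUs x 2 accumulation = 512 total
    per_device_eval_batch_size=8,    # 8 per device x 16 GPUs = 128 total
    gradient_accumulation_steps=2,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.05,
    num_train_epochs=1.0,
    bf16=True,  # bf16 per the run name
)
```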
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| 3.9616 | 0.0580 | 1000 | 3.8911 |
| 3.5512 | 0.1160 | 2000 | 3.4861 |
| 3.4279 | 0.1741 | 3000 | 3.3560 |
| 3.3471 | 0.2321 | 4000 | 3.2811 |
| 3.2957 | 0.2901 | 5000 | 3.2321 |
| 3.2677 | 0.3481 | 6000 | 3.1945 |
| 3.225 | 0.4062 | 7000 | 3.1653 |
| 3.2051 | 0.4642 | 8000 | 3.1390 |
| 3.1816 | 0.5222 | 9000 | 3.1161 |
| 3.1583 | 0.5802 | 10000 | 3.0971 |
| 3.1464 | 0.6383 | 11000 | 3.0794 |
| 3.1365 | 0.6963 | 12000 | 3.0645 |
| 3.1256 | 0.7543 | 13000 | 3.0509 |
| 3.1073 | 0.8123 | 14000 | 3.0417 |
| 3.108 | 0.8703 | 15000 | 3.0349 |
| 3.098 | 0.9284 | 16000 | 3.0312 |
| 3.092 | 0.9864 | 17000 | 3.0301 |
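For reference, cross-entropy loss converts to perplexity via exp(loss), so the final evaluation loss of 3.0300 corresponds to a perplexity of roughly 20.7:

```python
import math

# Perplexity = exp(mean cross-entropy loss) for a causal LM.
print(math.exp(3.0300))  # ~20.70
```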
### Framework versions
- Transformers 4.51.3
- Pytorch 2.3.0+cu121
- Datasets 4.0.0
- Tokenizers 0.21.4