---
license: mit
base_model: Toflamus/GPT-2_para3M
tags:
- generated_from_trainer
model-index:
- name: Output
  results: []
---
|
|
|
|
|
|
|
|
|
|
# Output |
|
|
|
|
|
This model is a fine-tuned version of [Toflamus/GPT-2_para3M](https://huggingface.co/Toflamus/GPT-2_para3M) on an unknown dataset. |
|
|
Final training summary (from the returned `TrainOutput`):

- global steps: 4,060 (5 epochs)
- final training loss: 6.1231
- train runtime: 1,435.05 s (181.19 samples/s, 2.83 steps/s)
- total FLOs: 9.667e13
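The card does not state where the fine-tuned checkpoint is published, so the sketch below loads the base model `Toflamus/GPT-2_para3M` as a stand-in; substitute the fine-tuned checkpoint's path or hub id once it is available.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# The fine-tuned checkpoint id is not given in this card; the base model
# Toflamus/GPT-2_para3M stands in here. Swap in the fine-tuned weights
# (e.g. a local output directory) to sample from the trained model.
model_id = "Toflamus/GPT-2_para3M"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```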
|
|
## Model description |
|
|
|
|
|
More information needed |
|
|
|
|
|
## Intended uses & limitations |
|
|
|
|
|
More information needed |
|
|
|
|
|
## Training and evaluation data |
|
|
|
|
|
More information needed |
|
|
|
|
|
## Training procedure |
|
|
|
|
|
### Training hyperparameters |
|
|
|
|
|
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 5
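As a sanity check on the schedule shape, here is a small standalone sketch of linear warmup followed by cosine decay with the numbers above (100 warmup steps, 4,060 total optimizer steps, peak 2e-05). It mirrors the behavior of transformers' `get_cosine_schedule_with_warmup`, not its exact code:

```python
import math

PEAK_LR = 2e-05      # learning_rate
WARMUP = 100         # lr_scheduler_warmup_steps
TOTAL_STEPS = 4060   # global steps after 5 epochs

def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then cosine decay toward 0."""
    if step < WARMUP:
        return PEAK_LR * step / WARMUP
    progress = (step - WARMUP) / (TOTAL_STEPS - WARMUP)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))

# Effective batch size: train_batch_size 8 x gradient_accumulation_steps 8
# = 64, matching total_train_batch_size above.
```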
|
|
|
|
|
### Training results |
|
|
| Step | Training Loss |
|:----:|:-------------:|
| 100  | 7.7379 |
| 200  | 7.0667 |
| 300  | 6.8402 |
| 400  | 6.6866 |
| 500  | 6.6077 |
| 600  | 6.5165 |
| 700  | 6.4498 |
| 800  | 6.3604 |
| 900  | 6.3217 |
| 1000 | 6.2527 |
| 1100 | 6.2235 |
| 1200 | 6.1947 |
| 1300 | 6.1315 |
| 1400 | 6.1134 |
| 1500 | 6.1065 |
| 1600 | 6.0441 |
| 1700 | 6.0244 |
| 1800 | 6.0085 |
| 1900 | 6.0066 |
| 2000 | 5.9599 |
| 2100 | 5.9311 |
| 2200 | 5.9253 |
| 2300 | 5.9335 |
| 2400 | 5.9219 |
| 2500 | 5.9134 |
| 2600 | 5.8981 |
| 2700 | 5.8747 |
| 2800 | 5.8691 |
| 2900 | 5.8512 |
| 3000 | 5.8539 |
| 3100 | 5.8701 |
| 3200 | 5.8681 |
| 3300 | 5.8370 |
| 3400 | 5.8453 |
| 3500 | 5.8288 |
| 3600 | 5.8474 |
| 3700 | 5.8586 |
| 3800 | 5.8532 |
| 3900 | 5.8366 |
| 4000 | 5.8491 |
|
|
|
|
|
|
|
|
### Framework versions |
|
|
|
|
|
- Transformers 4.32.0
- Pytorch 2.0.1+cu117
- Datasets 2.14.4
- Tokenizers 0.13.2
|
|
|