---
license: mit
base_model: Toflamus/GPT-2_para3M
tags:
- generated_from_trainer
model-index:
- name: Output
  results: []
---

# Output

This model is a fine-tuned version of [Toflamus/GPT-2_para3M](https://huggingface.co/Toflamus/GPT-2_para3M) on an unknown dataset. It achieves a final training loss of 6.1231 after 5 epochs (4,060 optimizer steps); training ran for about 1,435 seconds at roughly 181 samples per second (2.83 optimizer steps per second), for a total of about 9.67e13 FLOPs.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 5

### Training results

| Step | Training Loss |
|-----:|--------------:|
| 100  | 7.737900 |
| 200  | 7.066700 |
| 300  | 6.840200 |
| 400  | 6.686600 |
| 500  | 6.607700 |
| 600  | 6.516500 |
| 700  | 6.449800 |
| 800  | 6.360400 |
| 900  | 6.321700 |
| 1000 | 6.252700 |
| 1100 | 6.223500 |
| 1200 | 6.194700 |
| 1300 | 6.131500 |
| 1400 | 6.113400 |
| 1500 | 6.106500 |
| 1600 | 6.044100 |
| 1700 | 6.024400 |
| 1800 | 6.008500 |
| 1900 | 6.006600 |
| 2000 | 5.959900 |
| 2100 | 5.931100 |
| 2200 | 5.925300 |
| 2300 | 5.933500 |
| 2400 | 5.921900 |
| 2500 | 5.913400 |
| 2600 | 5.898100 |
| 2700 | 5.874700 |
| 2800 | 5.869100 |
| 2900 | 5.851200 |
| 3000 | 5.853900 |
| 3100 | 5.870100 |
| 3200 | 5.868100 |
| 3300 | 5.837000 |
| 3400 | 5.845300 |
| 3500 | 5.828800 |
| 3600 | 5.847400 |
| 3700 | 5.858600 |
| 3800 | 5.853200 |
| 3900 | 5.836600 |
| 4000 | 5.849100 |

### Framework versions

- Transformers 4.32.0
- Pytorch 2.0.1+cu117
- Datasets 2.14.4
- Tokenizers 0.13.2
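
As a sketch of how the hyperparameters above map onto the `transformers` `TrainingArguments` API (an assumption; the actual training script is not included in this card, and dataset loading and `Trainer` wiring are omitted):

```python
from transformers import TrainingArguments

# A minimal sketch matching the hyperparameters listed above
# (Transformers 4.32.0). Adam betas (0.9, 0.999) and epsilon 1e-08
# are the TrainingArguments defaults, so they are not set explicitly.
training_args = TrainingArguments(
    output_dir="Output",
    learning_rate=2e-5,
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    gradient_accumulation_steps=8,   # total_train_batch_size: 64
    lr_scheduler_type="cosine",
    warmup_steps=100,
    num_train_epochs=5,
    seed=42,
    logging_steps=100,               # matches the 100-step loss logging above
)
```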
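
For inference, a minimal sketch is shown below. The repository id is a placeholder, since this card does not state where the fine-tuned weights are hosted; the tokenizer is assumed to be the one shipped with the checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id (hypothetical): replace with this model's actual
# Hub repository id, or with a local path to the saved checkpoint.
model_id = "your-username/Output"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation with nucleus sampling.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```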