---
license: mit
base_model: Toflamus/GPT-2_para3M
tags:
- generated_from_trainer
model-index:
- name: Output
  results: []
---

# Output

This model is a fine-tuned version of [Toflamus/GPT-2_para3M](https://huggingface.co/Toflamus/GPT-2_para3M) on an unknown dataset. It achieves a final training loss of 6.1231 after 5 epochs (4,060 optimizer steps); training ran for about 1,435 seconds at roughly 181 samples per second (2.83 optimizer steps per second), for a total of about 9.67e13 FLOPs.

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 100
- num_epochs: 5

### Training results

| Step | Training Loss |
|-----:|--------------:|
| 100  | 7.737900 |
| 200  | 7.066700 |
| 300  | 6.840200 |
| 400  | 6.686600 |
| 500  | 6.607700 |
| 600  | 6.516500 |
| 700  | 6.449800 |
| 800  | 6.360400 |
| 900  | 6.321700 |
| 1000 | 6.252700 |
| 1100 | 6.223500 |
| 1200 | 6.194700 |
| 1300 | 6.131500 |
| 1400 | 6.113400 |
| 1500 | 6.106500 |
| 1600 | 6.044100 |
| 1700 | 6.024400 |
| 1800 | 6.008500 |
| 1900 | 6.006600 |
| 2000 | 5.959900 |
| 2100 | 5.931100 |
| 2200 | 5.925300 |
| 2300 | 5.933500 |
| 2400 | 5.921900 |
| 2500 | 5.913400 |
| 2600 | 5.898100 |
| 2700 | 5.874700 |
| 2800 | 5.869100 |
| 2900 | 5.851200 |
| 3000 | 5.853900 |
| 3100 | 5.870100 |
| 3200 | 5.868100 |
| 3300 | 5.837000 |
| 3400 | 5.845300 |
| 3500 | 5.828800 |
| 3600 | 5.847400 |
| 3700 | 5.858600 |
| 3800 | 5.853200 |
| 3900 | 5.836600 |
| 4000 | 5.849100 |

### Framework versions

- Transformers 4.32.0
- Pytorch 2.0.1+cu117
- Datasets 2.14.4
- Tokenizers 0.13.2
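
As a sketch of how the hyperparameters above map onto the `transformers` `TrainingArguments` API (an assumption; the actual training script is not included in this card, and dataset loading and `Trainer` wiring are omitted):

```python
from transformers import TrainingArguments

# A minimal sketch matching the hyperparameters listed above
# (Transformers 4.32.0). Adam betas (0.9, 0.999) and epsilon 1e-08
# are the TrainingArguments defaults, so they are not set explicitly.
training_args = TrainingArguments(
    output_dir="Output",
    learning_rate=2e-5,
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    gradient_accumulation_steps=8,   # total_train_batch_size: 64
    lr_scheduler_type="cosine",
    warmup_steps=100,
    num_train_epochs=5,
    seed=42,
    logging_steps=100,               # matches the 100-step loss logging above
)
```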
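
For inference, a minimal sketch is shown below. The repository id is a placeholder, since this card does not state where the fine-tuned weights are hosted; the tokenizer is assumed to be the one shipped with the checkpoint.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id (hypothetical): replace with this model's actual
# Hub repository id, or with a local path to the saved checkpoint.
model_id = "your-username/Output"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Generate a short continuation with nucleus sampling.
inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```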