---
library_name: peft
license: mit
base_model: gpt2
tags:
- generated_from_trainer
model-index:
- name: Se124M10KInfPrompt_endtoken_ls
  results: []
---


# Se124M10KInfPrompt_endtoken_ls

This model is a fine-tuned version of [gpt2](https://huggingface.co/gpt2) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 2.0494 (validation loss at the last logged epoch, 45; the lowest logged value is 2.0492, at epoch 40)
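This loss is probably not plain next-token cross-entropy. The training hyperparameters below include `label_smoothing_factor: 0.1`, and the standard `Trainer` label-smoothing path also applies it when computing evaluation loss; this card does not confirm that was the case here. The sketch below shows the per-token formula used by the `transformers` `LabelSmoother`: a (1 - ε) weight on the usual negative log-likelihood of the target plus an ε weight on the mean negative log-probability over the whole vocabulary. It uses a hypothetical 4-token vocabulary instead of GPT-2's 50,257 tokens. The function names are illustrative, not a library API.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def label_smoothed_nll(logits: Sequence[float], target: int, epsilon: float) -> float:
    """(1 - eps) * NLL of the target token + eps * mean NLL over all tokens."""
    neg_logp = [-lp for lp in log_softmax(logits)]
    nll = neg_logp[target]
    smooth = sum(neg_logp) / len(neg_logp)
    return (1.0 - epsilon) * nll + epsilon * smooth


if __name__ == "__main__":
    toy_logits = [4.0, 1.0, 0.5, -2.0]  # hypothetical scores for a 4-token vocabulary
    print(label_smoothed_nll(toy_logits, target=0, epsilon=0.0))  # plain cross-entropy
    print(label_smoothed_nll(toy_logits, target=0, epsilon=0.1))  # smoothed, as configured here
```

With a large vocabulary the ε term is dominated by very unlikely tokens. A label-smoothed loss is therefore typically higher than the unsmoothed cross-entropy, and exp(loss) should not be read directly as perplexity.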

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0005
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 32
- optimizer: AdamW (`OptimizerNames.ADAMW_TORCH`) with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 200
- num_epochs: 50
- mixed_precision_training: Native AMP
- label_smoothing_factor: 0.1
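The batch-size entries above are related by simple arithmetic, and the cosine schedule can be written out explicitly. The sketch below assumes a single training device (so 4 × 8 × 1 = 32) and 267 optimizer steps per epoch, which is how far the Step column advances per epoch in the results table. It also assumes the warmup-then-cosine shape of `transformers.get_cosine_schedule_with_warmup` with its default half cycle. `Trainer` computes the real step count itself, so treat the totals as approximate. The function and constant names are illustrative.

```python
import math

# Hyperparameters copied from the list above.
PEAK_LR = 5e-4
PER_DEVICE_BATCH = 4
GRAD_ACCUM_STEPS = 8
WARMUP_STEPS = 200
NUM_EPOCHS = 50

# Assumptions: one device, and 267 optimizer steps per epoch
# (the Step column advances by 267 per epoch in the results table).
NUM_DEVICES = 1
STEPS_PER_EPOCH = 267


def effective_batch_size(per_device: int, accum: int, devices: int) -> int:
    """Number of examples contributing to one optimizer update."""
    return per_device * accum * devices


def cosine_lr(step: int, peak: float, warmup: int, total: int) -> float:
    """Linear warmup from 0 to `peak`, then half-cosine decay to 0 at `total`."""
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH  # planned schedule length: 13350

if __name__ == "__main__":
    print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES))  # 32
    for step in (0, 100, 200, 6775, 12015, TOTAL_STEPS):
        print(step, f"{cosine_lr(step, PEAK_LR, WARMUP_STEPS, TOTAL_STEPS):.2e}")
```

Under these assumptions the learning rate peaks at 5e-4 at step 200 and falls to half the peak at the schedule midpoint (step 6775). It has not yet reached zero at step 12015, where the results table ends.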

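### Training results

Validation loss was computed at the end of each epoch (every 267 optimizer steps). The log stops at epoch 45 (step 12015), short of the configured `num_epochs: 50`; the card does not record why.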

| Training Loss | Epoch | Step  | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 19.0863       | 1.0   | 267   | 2.1942          |
| 17.6413       | 2.0   | 534   | 2.1318          |
| 17.3454       | 3.0   | 801   | 2.1143          |
| 17.2455       | 4.0   | 1068  | 2.0979          |
| 17.112        | 5.0   | 1335  | 2.0918          |
| 17.0311       | 6.0   | 1602  | 2.0852          |
| 16.9714       | 7.0   | 1869  | 2.0805          |
| 16.8883       | 8.0   | 2136  | 2.0760          |
| 16.8675       | 9.0   | 2403  | 2.0727          |
| 16.8491       | 10.0  | 2670  | 2.0699          |
| 16.8653       | 11.0  | 2937  | 2.0698          |
| 16.7795       | 12.0  | 3204  | 2.0718          |
| 16.8033       | 13.0  | 3471  | 2.0635          |
| 16.7715       | 14.0  | 3738  | 2.0644          |
| 16.7677       | 15.0  | 4005  | 2.0632          |
| 16.7682       | 16.0  | 4272  | 2.0615          |
| 16.7473       | 17.0  | 4539  | 2.0598          |
| 16.7306       | 18.0  | 4806  | 2.0615          |
| 16.6896       | 19.0  | 5073  | 2.0586          |
| 16.7027       | 20.0  | 5340  | 2.0589          |
| 16.6991       | 21.0  | 5607  | 2.0581          |
| 16.6864       | 22.0  | 5874  | 2.0573          |
| 16.6749       | 23.0  | 6141  | 2.0562          |
| 16.6714       | 24.0  | 6408  | 2.0551          |
| 16.6603       | 25.0  | 6675  | 2.0546          |
| 16.6801       | 26.0  | 6942  | 2.0542          |
| 16.6263       | 27.0  | 7209  | 2.0541          |
| 16.6436       | 28.0  | 7476  | 2.0531          |
| 16.6471       | 29.0  | 7743  | 2.0523          |
| 16.6412       | 30.0  | 8010  | 2.0549          |
| 16.6017       | 31.0  | 8277  | 2.0529          |
| 16.6352       | 32.0  | 8544  | 2.0510          |
| 16.5937       | 33.0  | 8811  | 2.0522          |
| 16.6165       | 34.0  | 9078  | 2.0511          |
| 16.5961       | 35.0  | 9345  | 2.0518          |
| 16.5675       | 36.0  | 9612  | 2.0514          |
| 16.5565       | 37.0  | 9879  | 2.0499          |
| 16.6215       | 38.0  | 10146 | 2.0504          |
| 16.6133       | 39.0  | 10413 | 2.0505          |
| 16.5901       | 40.0  | 10680 | 2.0492          |
| 16.5841       | 41.0  | 10947 | 2.0500          |
| 16.5856       | 42.0  | 11214 | 2.0493          |
| 16.5775       | 43.0  | 11481 | 2.0494          |
| 16.5873       | 44.0  | 11748 | 2.0497          |
| 16.5285       | 45.0  | 12015 | 2.0494          |
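In every row, the Training Loss column is between roughly 8.1 and 8.7 times the Validation Loss, close to `gradient_accumulation_steps: 8`. One plausible explanation, not confirmed by this card, is that the logged training loss was summed over the eight micro-batches of each optimizer step rather than averaged. The sketch below checks the ratio on a few rows copied from the table. The row selection is arbitrary and the helper names are illustrative.

```python
# (epoch, training loss, validation loss) copied from a few rows of the table above.
ROWS = [
    (1, 19.0863, 2.1942),
    (10, 16.8491, 2.0699),
    (20, 16.7027, 2.0589),
    (30, 16.6412, 2.0549),
    (45, 16.5285, 2.0494),
]
GRAD_ACCUM_STEPS = 8  # from the hyperparameter list


def loss_ratios(rows):
    """Ratio of logged training loss to validation loss for each row."""
    return [(epoch, train / val) for epoch, train, val in rows]


def rescaled_train_loss(rows, accum=GRAD_ACCUM_STEPS):
    """Training loss divided by the accumulation steps (the hypothesised per-micro-batch mean)."""
    return [(epoch, train / accum, val) for epoch, train, val in rows]


if __name__ == "__main__":
    for epoch, ratio in loss_ratios(ROWS):
        print(f"epoch {epoch:>2}: train/val = {ratio:.2f}")
    for epoch, train, val in rescaled_train_loss(ROWS):
        print(f"epoch {epoch:>2}: train/8 = {train:.4f}  val = {val:.4f}")
```

If the hypothesis holds, dividing the Training Loss column by 8 gives values on the same scale as the Validation Loss column, which is easier to compare.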


### Framework versions

- PEFT 0.15.1
- Transformers 4.51.3
- Pytorch 2.6.0+cu118
- Datasets 3.5.0
- Tokenizers 0.21.1
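The metadata lists `library_name: peft` and `base_model: gpt2`, so the weights are presumably a PEFT adapter that loads on top of GPT-2. The sketch below assumes that and requires the libraries above, or compatible versions, to be installed. `ADAPTER_PATH` is a hypothetical placeholder because the card does not give the full Hub repository ID. The prompt is arbitrary. Imports sit inside the functions so the file can be imported without PEFT installed.

```python
# Hypothetical placeholder: replace with the real local path or "<namespace>/<repo>" Hub ID.
ADAPTER_PATH = "path/to/Se124M10KInfPrompt_endtoken_ls"


def load_model(adapter_path: str = ADAPTER_PATH):
    """Load GPT-2 and attach the PEFT adapter stored at `adapter_path`."""
    # Imported lazily so this module can be imported without the libraries installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    base = AutoModelForCausalLM.from_pretrained("gpt2")
    model = PeftModel.from_pretrained(base, adapter_path)
    model.eval()
    return tokenizer, model


def generate(prompt: str, max_new_tokens: int = 50) -> str:
    """Greedily decode a continuation of `prompt` with the adapted model."""
    import torch

    tokenizer, model = load_model()
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=max_new_tokens,
            do_sample=False,
            pad_token_id=tokenizer.eos_token_id,  # GPT-2 defines no pad token
        )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)


if __name__ == "__main__":
    print(generate("Hello, my name is"))
```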