---
license: apache-2.0
library_name: peft
tags:
- trl
- sft
- generated_from_trainer
base_model: mistralai/Mistral-7B-v0.1
model-index:
- name: lc_full
  results: []
---
|
|
|
|
|
|
|
|
|
|
# lc_full |
|
|
|
|
|
This model is a fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on an unknown dataset. |
|
|
It achieves the following results on the evaluation set: |
|
|
- Loss: 1.8715 |
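
Since this is a PEFT adapter rather than a full checkpoint, the fine-tuned weights are loaded on top of the base model. A minimal usage sketch, assuming the adapter is published under a hypothetical repo id `your-username/lc_full` (replace with the actual adapter path):

```python
# Sketch: load the Mistral-7B base model and attach the PEFT adapter.
# NOTE: "your-username/lc_full" is a placeholder repo id, not the real one.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)

# Apply the fine-tuned adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base, "your-username/lc_full")
model.eval()

inputs = tokenizer("Hello, world!", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```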
|
|
|
|
|
## Model description |
|
|
|
|
|
More information needed |
|
|
|
|
|
## Intended uses & limitations |
|
|
|
|
|
More information needed |
|
|
|
|
|
## Training and evaluation data |
|
|
|
|
|
More information needed |
|
|
|
|
|
## Training procedure |
|
|
|
|
|
### Training hyperparameters |
|
|
|
|
|
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- num_epochs: 50
|
|
|
|
|
### Training results |
|
|
|
|
|
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:-----:|:---------------:|
| 1.7424 | 1.0 | 486 | 1.6914 |
| 1.301 | 2.0 | 972 | 1.6780 |
| 1.5718 | 3.0 | 1458 | 1.6743 |
| 1.6632 | 4.0 | 1944 | 1.6793 |
| 1.8588 | 5.0 | 2430 | 1.6794 |
| 1.5308 | 6.0 | 2916 | 1.6894 |
| 1.5776 | 7.0 | 3402 | 1.6985 |
| 1.6394 | 8.0 | 3888 | 1.7073 |
| 1.4696 | 9.0 | 4374 | 1.7187 |
| 1.4191 | 10.0 | 4860 | 1.7298 |
| 1.4776 | 11.0 | 5346 | 1.7414 |
| 1.4767 | 12.0 | 5832 | 1.7512 |
| 1.3546 | 13.0 | 6318 | 1.7731 |
| 1.542 | 14.0 | 6804 | 1.7610 |
| 1.3709 | 15.0 | 7290 | 1.7679 |
| 1.3167 | 16.0 | 7776 | 1.7936 |
| 1.3563 | 17.0 | 8262 | 1.8007 |
| 1.4615 | 18.0 | 8748 | 1.8008 |
| 1.511 | 19.0 | 9234 | 1.8068 |
| 1.3145 | 20.0 | 9720 | 1.8232 |
| 1.1285 | 21.0 | 10206 | 1.8204 |
| 1.5045 | 22.0 | 10692 | 1.8204 |
| 1.2697 | 23.0 | 11178 | 1.8453 |
| 1.302 | 24.0 | 11664 | 1.8386 |
| 1.4892 | 25.0 | 12150 | 1.8434 |
| 1.5042 | 26.0 | 12636 | 1.8471 |
| 1.1989 | 27.0 | 13122 | 1.8472 |
| 1.2353 | 28.0 | 13608 | 1.8545 |
| 1.145 | 29.0 | 14094 | 1.8560 |
| 1.4146 | 30.0 | 14580 | 1.8612 |
| 1.3598 | 31.0 | 15066 | 1.8611 |
| 1.2659 | 32.0 | 15552 | 1.8695 |
| 1.2085 | 33.0 | 16038 | 1.8631 |
| 1.0623 | 34.0 | 16524 | 1.8679 |
| 1.4594 | 35.0 | 17010 | 1.8694 |
| 1.3038 | 36.0 | 17496 | 1.8685 |
| 1.5902 | 37.0 | 17982 | 1.8695 |
| 1.2771 | 38.0 | 18468 | 1.8709 |
| 1.2738 | 39.0 | 18954 | 1.8698 |
| 1.3209 | 40.0 | 19440 | 1.8707 |
| 1.2578 | 41.0 | 19926 | 1.8709 |
| 1.1108 | 42.0 | 20412 | 1.8717 |
| 1.3264 | 43.0 | 20898 | 1.8711 |
| 1.3152 | 44.0 | 21384 | 1.8709 |
| 1.4287 | 45.0 | 21870 | 1.8709 |
| 1.299 | 46.0 | 22356 | 1.8709 |
| 1.2863 | 47.0 | 22842 | 1.8710 |
| 1.1795 | 48.0 | 23328 | 1.8716 |
| 1.27 | 49.0 | 23814 | 1.8719 |
| 1.3156 | 50.0 | 24300 | 1.8715 |
|
|
|
|
|
|
|
|
### Framework versions

- PEFT 0.11.1
- Transformers 4.41.2
- Pytorch 2.1.0+cu118
- Datasets 2.19.2
- Tokenizers 0.19.1