Model save

Files changed:
- README.md +72 -0
- generation_config.json +10 -0
- model-00001-of-00003.safetensors +1 -1
- model-00002-of-00003.safetensors +1 -1
- model-00003-of-00003.safetensors +1 -1
- trainer_log.jsonl +55 -17
- training_args.bin +2 -2
README.md (ADDED)
@@ -0,0 +1,72 @@
---
license: llama2
base_model: meta-llama/Llama-2-7b-chat-hf
tags:
- llama-factory
- generated_from_trainer
model-index:
- name: prec_240807_llama2
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# prec_240807_llama2

This model is a fine-tuned version of [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) on an unknown dataset.
It achieves the following results on the evaluation set:
- Loss: 0.8686

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 1
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 2
- total_train_batch_size: 4
- total_eval_batch_size: 2
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1.0

### Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.9956        | 0.1111 | 50   | 0.9926          |
| 0.9989        | 0.2222 | 100  | 0.9736          |
| 0.9373        | 0.3333 | 150  | 0.9539          |
| 1.0155        | 0.4444 | 200  | 0.9442          |
| 0.9273        | 0.5556 | 250  | 0.9123          |
| 0.8789        | 0.6667 | 300  | 0.8964          |
| 0.871         | 0.7778 | 350  | 0.8800          |
| 0.8756        | 0.8889 | 400  | 0.8700          |
| 0.9423        | 1.0    | 450  | 0.8686          |

### Framework versions

- Transformers 4.43.3
- Pytorch 2.4.0+cu121
- Datasets 2.20.0
- Tokenizers 0.19.1
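The cosine schedule with 10% warmup listed in the hyperparameters can be sketched in a few lines (`lr_at_step` is a hypothetical helper; it follows the standard linear-warmup-then-cosine-decay recipe used by transformers' cosine scheduler, assuming the 450 total optimizer steps recorded in trainer_log.jsonl):

```python
import math

def lr_at_step(step, base_lr=1e-5, total_steps=450, warmup_ratio=0.1):
    """Linear warmup to base_lr over the first warmup_ratio * total_steps
    optimizer steps, then cosine decay down to 0 at the final step."""
    warmup_steps = int(warmup_ratio * total_steps)  # 45 steps here
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at_step(10))   # still in the warmup region
print(lr_at_step(50))   # shortly after warmup, near the 1e-05 peak
print(lr_at_step(450))  # -> 0.0 at the final step
```

The values this produces line up with the `learning_rate` fields logged in trainer_log.jsonl (e.g. step 50 gives roughly 9.9962e-06).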
generation_config.json (ADDED)
@@ -0,0 +1,10 @@
{
  "bos_token_id": 1,
  "do_sample": true,
  "eos_token_id": 2,
  "max_length": 4096,
  "pad_token_id": 0,
  "temperature": 0.6,
  "top_p": 0.9,
  "transformers_version": "4.43.3"
}
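For reference, `"top_p": 0.9` enables nucleus sampling: only the smallest set of highest-probability tokens whose cumulative mass reaches 0.9 is kept at each step. A toy sketch of the filtering step (illustrative only, not the transformers implementation; `top_p_filter` is a hypothetical name):

```python
def top_p_filter(probs, top_p=0.9):
    """Keep the smallest prefix of tokens (sorted by probability, descending)
    whose cumulative mass reaches top_p; zero out the rest and renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = set(), 0.0
    for i in order:
        kept.add(i)
        cum += probs[i]
        if cum >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

filtered = top_p_filter([0.5, 0.3, 0.15, 0.05], top_p=0.9)
# The 0.05 tail token is dropped; the surviving three are renormalized.
```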
model-00001-of-00003.safetensors (CHANGED)
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:4fe442f5bc399aa9600515f6591411c4d1c8365511f6a018c5dcbb79b59425cd
 size 4938985352
model-00002-of-00003.safetensors (CHANGED)
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:2a8557259341c942d1246dd9cb570bec148ef3ebc7eb6eb78b224bd5345db12d
 size 4947390880
model-00003-of-00003.safetensors (CHANGED)
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
+oid sha256:7370a04989d3ac8c77671e0b4e9a016395c1923d5c1eec8ff9bc3e569e7a91fb
 size 3590488816
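The .safetensors entries above are not the weights themselves but Git LFS pointer files: three `key value` lines (`version`, `oid`, `size`) that reference the real blob in LFS storage. A minimal parser sketch (`parse_lfs_pointer` is a hypothetical helper, shown on the pointer for the first shard):

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a dict of its key/value fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:4fe442f5bc399aa9600515f6591411c4d1c8365511f6a018c5dcbb79b59425cd
size 4938985352"""
info = parse_lfs_pointer(pointer)
# info["size"] is the blob size in bytes; the weights live in LFS storage,
# content-addressed by the sha256 in info["oid"].
```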
trainer_log.jsonl (CHANGED)
@@ -1,17 +1,55 @@
-{"current_steps": 10, "total_steps":
-{"current_steps": 20, "total_steps":
-{"current_steps": 30, "total_steps":
-{"current_steps": 40, "total_steps":
-{"current_steps": 50, "total_steps":
-{"current_steps":
-{"current_steps":
-{"current_steps":
-{"current_steps":
-{"current_steps":
-{"current_steps": 100, "total_steps":
-{"current_steps":
-{"current_steps":
-{"current_steps":
-{"current_steps":
-{"current_steps":
-{"current_steps":
+{"current_steps": 10, "total_steps": 450, "loss": 1.4236, "learning_rate": 2.222222222222222e-06, "epoch": 0.022222222222222223, "percentage": 2.22, "elapsed_time": "0:00:20", "remaining_time": "0:15:08", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 20, "total_steps": 450, "loss": 1.1482, "learning_rate": 4.444444444444444e-06, "epoch": 0.044444444444444446, "percentage": 4.44, "elapsed_time": "0:00:40", "remaining_time": "0:14:28", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 30, "total_steps": 450, "loss": 0.9179, "learning_rate": 6.666666666666667e-06, "epoch": 0.06666666666666667, "percentage": 6.67, "elapsed_time": "0:01:00", "remaining_time": "0:14:02", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 40, "total_steps": 450, "loss": 0.9949, "learning_rate": 8.888888888888888e-06, "epoch": 0.08888888888888889, "percentage": 8.89, "elapsed_time": "0:01:19", "remaining_time": "0:13:38", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 50, "total_steps": 450, "loss": 0.9956, "learning_rate": 9.996239762521152e-06, "epoch": 0.1111111111111111, "percentage": 11.11, "elapsed_time": "0:01:39", "remaining_time": "0:13:16", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 50, "total_steps": 450, "eval_loss": 0.9926177859306335, "epoch": 0.1111111111111111, "percentage": 11.11, "elapsed_time": "0:02:09", "remaining_time": "0:17:19", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 60, "total_steps": 450, "loss": 1.052, "learning_rate": 9.966191788709716e-06, "epoch": 0.13333333333333333, "percentage": 13.33, "elapsed_time": "0:02:29", "remaining_time": "0:16:12", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 70, "total_steps": 450, "loss": 0.9701, "learning_rate": 9.906276553136924e-06, "epoch": 0.15555555555555556, "percentage": 15.56, "elapsed_time": "0:02:49", "remaining_time": "0:15:19", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 80, "total_steps": 450, "loss": 1.0382, "learning_rate": 9.816854393079402e-06, "epoch": 0.17777777777777778, "percentage": 17.78, "elapsed_time": "0:03:09", "remaining_time": "0:14:34", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 90, "total_steps": 450, "loss": 0.9353, "learning_rate": 9.698463103929542e-06, "epoch": 0.2, "percentage": 20.0, "elapsed_time": "0:03:28", "remaining_time": "0:13:55", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 100, "total_steps": 450, "loss": 0.9989, "learning_rate": 9.551814704830734e-06, "epoch": 0.2222222222222222, "percentage": 22.22, "elapsed_time": "0:03:48", "remaining_time": "0:13:20", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 100, "total_steps": 450, "eval_loss": 0.9735559225082397, "epoch": 0.2222222222222222, "percentage": 22.22, "elapsed_time": "0:04:18", "remaining_time": "0:15:06", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 110, "total_steps": 450, "loss": 0.9845, "learning_rate": 9.377791156510456e-06, "epoch": 0.24444444444444444, "percentage": 24.44, "elapsed_time": "0:04:38", "remaining_time": "0:14:21", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 120, "total_steps": 450, "loss": 0.9919, "learning_rate": 9.177439057064684e-06, "epoch": 0.26666666666666666, "percentage": 26.67, "elapsed_time": "0:04:58", "remaining_time": "0:13:40", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 130, "total_steps": 450, "loss": 0.9697, "learning_rate": 8.951963347593797e-06, "epoch": 0.28888888888888886, "percentage": 28.89, "elapsed_time": "0:05:18", "remaining_time": "0:13:02", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 140, "total_steps": 450, "loss": 0.9355, "learning_rate": 8.702720065545024e-06, "epoch": 0.3111111111111111, "percentage": 31.11, "elapsed_time": "0:05:37", "remaining_time": "0:12:27", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 150, "total_steps": 450, "loss": 0.9373, "learning_rate": 8.43120818934367e-06, "epoch": 0.3333333333333333, "percentage": 33.33, "elapsed_time": "0:05:57", "remaining_time": "0:11:55", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 150, "total_steps": 450, "eval_loss": 0.9538707137107849, "epoch": 0.3333333333333333, "percentage": 33.33, "elapsed_time": "0:06:27", "remaining_time": "0:12:55", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 160, "total_steps": 450, "loss": 0.9219, "learning_rate": 8.139060623360494e-06, "epoch": 0.35555555555555557, "percentage": 35.56, "elapsed_time": "0:06:47", "remaining_time": "0:12:18", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 170, "total_steps": 450, "loss": 0.9378, "learning_rate": 7.828034377432694e-06, "epoch": 0.37777777777777777, "percentage": 37.78, "elapsed_time": "0:07:07", "remaining_time": "0:11:43", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 180, "total_steps": 450, "loss": 1.0185, "learning_rate": 7.500000000000001e-06, "epoch": 0.4, "percentage": 40.0, "elapsed_time": "0:07:27", "remaining_time": "0:11:10", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 190, "total_steps": 450, "loss": 0.9562, "learning_rate": 7.156930328406268e-06, "epoch": 0.4222222222222222, "percentage": 42.22, "elapsed_time": "0:07:46", "remaining_time": "0:10:38", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 200, "total_steps": 450, "loss": 1.0155, "learning_rate": 6.800888624023552e-06, "epoch": 0.4444444444444444, "percentage": 44.44, "elapsed_time": "0:08:06", "remaining_time": "0:10:08", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 200, "total_steps": 450, "eval_loss": 0.9441996812820435, "epoch": 0.4444444444444444, "percentage": 44.44, "elapsed_time": "0:08:36", "remaining_time": "0:10:46", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 210, "total_steps": 450, "loss": 0.95, "learning_rate": 6.434016163555452e-06, "epoch": 0.4666666666666667, "percentage": 46.67, "elapsed_time": "0:08:56", "remaining_time": "0:10:13", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 220, "total_steps": 450, "loss": 0.8331, "learning_rate": 6.058519361147055e-06, "epoch": 0.4888888888888889, "percentage": 48.89, "elapsed_time": "0:09:16", "remaining_time": "0:09:41", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 230, "total_steps": 450, "loss": 0.9847, "learning_rate": 5.6766564987506564e-06, "epoch": 0.5111111111111111, "percentage": 51.11, "elapsed_time": "0:09:36", "remaining_time": "0:09:11", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 240, "total_steps": 450, "loss": 0.9506, "learning_rate": 5.290724144552379e-06, "epoch": 0.5333333333333333, "percentage": 53.33, "elapsed_time": "0:09:55", "remaining_time": "0:08:41", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 250, "total_steps": 450, "loss": 0.9273, "learning_rate": 4.903043341140879e-06, "epoch": 0.5555555555555556, "percentage": 55.56, "elapsed_time": "0:10:15", "remaining_time": "0:08:12", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 250, "total_steps": 450, "eval_loss": 0.9123120903968811, "epoch": 0.5555555555555556, "percentage": 55.56, "elapsed_time": "0:10:45", "remaining_time": "0:08:36", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 260, "total_steps": 450, "loss": 0.932, "learning_rate": 4.515945646484105e-06, "epoch": 0.5777777777777777, "percentage": 57.78, "elapsed_time": "0:11:05", "remaining_time": "0:08:06", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 270, "total_steps": 450, "loss": 0.9178, "learning_rate": 4.131759111665349e-06, "epoch": 0.6, "percentage": 60.0, "elapsed_time": "0:11:25", "remaining_time": "0:07:36", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 280, "total_steps": 450, "loss": 0.8897, "learning_rate": 3.752794279710094e-06, "epoch": 0.6222222222222222, "percentage": 62.22, "elapsed_time": "0:11:45", "remaining_time": "0:07:08", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 290, "total_steps": 450, "loss": 0.9125, "learning_rate": 3.3813302897083955e-06, "epoch": 0.6444444444444445, "percentage": 64.44, "elapsed_time": "0:12:04", "remaining_time": "0:06:39", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 300, "total_steps": 450, "loss": 0.8789, "learning_rate": 3.019601169804216e-06, "epoch": 0.6666666666666666, "percentage": 66.67, "elapsed_time": "0:12:24", "remaining_time": "0:06:12", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 300, "total_steps": 450, "eval_loss": 0.896363377571106, "epoch": 0.6666666666666666, "percentage": 66.67, "elapsed_time": "0:12:54", "remaining_time": "0:06:27", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 310, "total_steps": 450, "loss": 0.8929, "learning_rate": 2.6697824014873076e-06, "epoch": 0.6888888888888889, "percentage": 68.89, "elapsed_time": "0:13:14", "remaining_time": "0:05:58", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 320, "total_steps": 450, "loss": 0.8715, "learning_rate": 2.333977835991545e-06, "epoch": 0.7111111111111111, "percentage": 71.11, "elapsed_time": "0:13:34", "remaining_time": "0:05:30", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 330, "total_steps": 450, "loss": 0.8109, "learning_rate": 2.0142070414860704e-06, "epoch": 0.7333333333333333, "percentage": 73.33, "elapsed_time": "0:13:54", "remaining_time": "0:05:03", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 340, "total_steps": 450, "loss": 0.8895, "learning_rate": 1.7123931571546826e-06, "epoch": 0.7555555555555555, "percentage": 75.56, "elapsed_time": "0:14:13", "remaining_time": "0:04:36", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 350, "total_steps": 450, "loss": 0.871, "learning_rate": 1.4303513272105057e-06, "epoch": 0.7777777777777778, "percentage": 77.78, "elapsed_time": "0:14:33", "remaining_time": "0:04:09", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 350, "total_steps": 450, "eval_loss": 0.8800399303436279, "epoch": 0.7777777777777778, "percentage": 77.78, "elapsed_time": "0:15:03", "remaining_time": "0:04:18", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 360, "total_steps": 450, "loss": 0.8507, "learning_rate": 1.1697777844051105e-06, "epoch": 0.8, "percentage": 80.0, "elapsed_time": "0:15:23", "remaining_time": "0:03:50", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 370, "total_steps": 450, "loss": 0.8813, "learning_rate": 9.322396486851626e-07, "epoch": 0.8222222222222222, "percentage": 82.22, "elapsed_time": "0:15:43", "remaining_time": "0:03:23", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 380, "total_steps": 450, "loss": 0.8444, "learning_rate": 7.191655023486682e-07, "epoch": 0.8444444444444444, "percentage": 84.44, "elapsed_time": "0:16:03", "remaining_time": "0:02:57", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 390, "total_steps": 450, "loss": 0.897, "learning_rate": 5.318367983829393e-07, "epoch": 0.8666666666666667, "percentage": 86.67, "elapsed_time": "0:16:22", "remaining_time": "0:02:31", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 400, "total_steps": 450, "loss": 0.8756, "learning_rate": 3.7138015365554834e-07, "epoch": 0.8888888888888888, "percentage": 88.89, "elapsed_time": "0:16:42", "remaining_time": "0:02:05", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 400, "total_steps": 450, "eval_loss": 0.8700103759765625, "epoch": 0.8888888888888888, "percentage": 88.89, "elapsed_time": "0:17:12", "remaining_time": "0:02:09", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 410, "total_steps": 450, "loss": 0.8607, "learning_rate": 2.3876057330792344e-07, "epoch": 0.9111111111111111, "percentage": 91.11, "elapsed_time": "0:17:32", "remaining_time": "0:01:42", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 420, "total_steps": 450, "loss": 0.9037, "learning_rate": 1.3477564710088097e-07, "epoch": 0.9333333333333333, "percentage": 93.33, "elapsed_time": "0:17:52", "remaining_time": "0:01:16", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 430, "total_steps": 450, "loss": 0.8203, "learning_rate": 6.005075261595495e-08, "epoch": 0.9555555555555556, "percentage": 95.56, "elapsed_time": "0:18:12", "remaining_time": "0:00:50", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 440, "total_steps": 450, "loss": 0.9092, "learning_rate": 1.5035294161039882e-08, "epoch": 0.9777777777777777, "percentage": 97.78, "elapsed_time": "0:18:31", "remaining_time": "0:00:25", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 450, "total_steps": 450, "loss": 0.9423, "learning_rate": 0.0, "epoch": 1.0, "percentage": 100.0, "elapsed_time": "0:18:51", "remaining_time": "0:00:00", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 450, "total_steps": 450, "eval_loss": 0.8686431646347046, "epoch": 1.0, "percentage": 100.0, "elapsed_time": "0:19:21", "remaining_time": "0:00:00", "throughput": "0.00", "total_tokens": 0}
+{"current_steps": 450, "total_steps": 450, "epoch": 1.0, "percentage": 100.0, "elapsed_time": "0:19:21", "remaining_time": "0:00:00", "throughput": "0.00", "total_tokens": 0}
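Each line of trainer_log.jsonl is one JSON record; evaluation records carry an `eval_loss` field in place of `loss`. A small sketch for splitting the two kinds of entry (`split_log` is a hypothetical helper, shown on two lines abbreviated from the log above):

```python
import json

def split_log(lines):
    """Separate trainer_log.jsonl records into train-step and eval entries."""
    train, evals = [], []
    for line in lines:
        rec = json.loads(line)
        (evals if "eval_loss" in rec else train).append(rec)
    return train, evals

sample = [
    '{"current_steps": 50, "total_steps": 450, "loss": 0.9956, "epoch": 0.1111}',
    '{"current_steps": 50, "total_steps": 450, "eval_loss": 0.9926, "epoch": 0.1111}',
]
train, evals = split_log(sample)
# train[0]["loss"] -> 0.9956; evals[0]["eval_loss"] -> 0.9926
```

Applied to the full log, the eval series (0.9926 → 0.8686 over the epoch) reproduces the Validation Loss column of the README table.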
training_args.bin (CHANGED)
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
-size
+oid sha256:328725cb1f025524ff816b99cd48e9abc88fc237b102e7e321789c4f928342d6
+size 7224