sixf0ur committed
Commit 317e4f0 · verified · 1 Parent(s): 15607d7

Update README.md

Files changed (1)
  1. README.md +26 -1
README.md CHANGED
@@ -36,7 +36,32 @@ Training was performed for approximately **160,000 steps**.
 
 The evaluation loss remains consistently close to the training loss throughout training (within ~0.01),
 indicating that the model generalizes well and shows no signs of overfitting.
-
+Training arguments can be seen below:
+```python
+TRAINING_ARGS = TrainingArguments(
+    output_dir=OUTPUT_DIR,
+    overwrite_output_dir=True,
+    num_train_epochs=20,
+    per_device_train_batch_size=16,
+    per_device_eval_batch_size=16,
+    learning_rate=1e-4,
+    warmup_steps=500,
+    lr_scheduler_type="cosine",
+    weight_decay=0.01,
+    max_grad_norm=1.0,
+    logging_dir=os.path.join(OUTPUT_DIR, "logs"),
+    logging_steps=100,
+    save_steps=500,
+    eval_steps=500,
+    eval_strategy="steps",
+    load_best_model_at_end=True,
+    metric_for_best_model="eval_loss",
+    greater_is_better=False,
+    save_total_limit=2,
+    fp16=True,
+    report_to="tensorboard",
+)
+````
 ![Training loss](./loss.png)
 
 
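To make the schedule settings in the added arguments more concrete, here is a minimal sketch of how `learning_rate=1e-4`, `warmup_steps=500`, and `lr_scheduler_type="cosine"` could translate into a per-step learning rate. It is a plain-Python approximation of linear warmup followed by a half-cosine decay to zero, not the library's own code. The names `cosine_lr` and `TOTAL_STEPS` are illustrative, and the sketch assumes that the README's "approximately 160,000 steps" is the total step count the scheduler was built for.

```python
import math

# Values copied from the TrainingArguments shown in the diff.
PEAK_LR = 1e-4
WARMUP_STEPS = 500
EVAL_STEPS = 500

# Assumption: the README's "approximately 160,000 steps" is the scheduler's
# total number of optimizer steps. The real value may differ slightly.
TOTAL_STEPS = 160_000


def cosine_lr(
    step: int,
    peak_lr: float = PEAK_LR,
    warmup_steps: int = WARMUP_STEPS,
    total_steps: int = TOTAL_STEPS,
) -> float:
    """Linear warmup to peak_lr, then half-cosine decay towards zero."""
    if step < warmup_steps:
        # Warmup phase: the rate grows linearly from 0 to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: progress runs from 0.0 (end of warmup) to 1.0 (last step).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 250, 500, 80_250, 160_000):
        print(f"step {step:>7}: lr = {cosine_lr(step):.2e}")
    # With eval_steps=500, an evaluation runs every 500 optimizer steps.
    print("evaluations over the run:", TOTAL_STEPS // EVAL_STEPS)
```

Under these assumptions the rate peaks at step 500, falls to half the peak midway through the decay phase, and reaches zero at the final step. Because `save_steps` and `eval_steps` are both 500, every checkpoint coincides with an evaluation, which is what lets `load_best_model_at_end=True` pick the checkpoint with the lowest `eval_loss`.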