sixf0ur/ScentLLaMA

Text Generation

text-generation-inference


sixf0ur committed on Aug 6, 2025

Commit 317e4f0 (verified) · 1 parent: 15607d7

Update README.md

Files changed (1): README.md (+26 -1)

@@ -36,7 +36,32 @@ Training was performed for approximately **160,000 steps**.
 The evaluation loss remains consistently close to the training loss throughout training (within ~0.01),
 indicating that the model generalizes well and shows no signs of overfitting.
 ![Training loss](./loss.png)

+The training arguments used are listed below:
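+```python
+import os
+
+from transformers import TrainingArguments
+
+# Hypothetical placeholder: the actual output directory is not given in the README.
+OUTPUT_DIR = "./scentllama-output"
+
+TRAINING_ARGS = TrainingArguments(
+    output_dir=OUTPUT_DIR,
+    overwrite_output_dir=True,
+    num_train_epochs=20,
+    per_device_train_batch_size=16,
+    per_device_eval_batch_size=16,
+    learning_rate=1e-4,
+    warmup_steps=500,               # linear warmup before the cosine decay
+    lr_scheduler_type="cosine",
+    weight_decay=0.01,
+    max_grad_norm=1.0,              # gradient clipping
+    logging_dir=os.path.join(OUTPUT_DIR, "logs"),
+    logging_steps=100,
+    save_steps=500,                 # checkpoint schedule matches eval_steps, as load_best_model_at_end requires
+    eval_steps=500,
+    eval_strategy="steps",          # older transformers releases name this evaluation_strategy
+    load_best_model_at_end=True,
+    metric_for_best_model="eval_loss",
+    greater_is_better=False,        # lower eval loss is better
+    save_total_limit=2,
+    fp16=True,                      # mixed-precision training; requires a GPU
+    report_to="tensorboard",
+)
+```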
 ![Training loss](./loss.png)
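As a rough sketch (not code from the repository), the learning-rate schedule implied by `lr_scheduler_type="cosine"` with `warmup_steps=500` can be reproduced in plain Python. It assumes the roughly 160,000 total steps reported above, a single device, and no gradient accumulation; the helper `cosine_with_warmup` is hypothetical and follows the linear-warmup plus half-cosine-decay shape of the cosine schedule in `transformers` with its default single half cycle.

```python
import math

# Values copied from TRAINING_ARGS in the README.
PEAK_LR = 1e-4
WARMUP_STEPS = 500
NUM_EPOCHS = 20
BATCH_SIZE = 16
# Assumption: total optimizer steps taken from the "approximately 160,000 steps"
# reported in the README (single device, gradient_accumulation_steps=1).
TOTAL_STEPS = 160_000


def cosine_with_warmup(step: int, warmup: int, total: int, peak_lr: float) -> float:
    """Hypothetical helper: linear warmup to peak_lr, then half-cosine decay to 0."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    midpoint = (WARMUP_STEPS + TOTAL_STEPS) // 2  # halfway through the decay phase
    for step in (0, WARMUP_STEPS // 2, WARMUP_STEPS, midpoint, TOTAL_STEPS):
        lr = cosine_with_warmup(step, WARMUP_STEPS, TOTAL_STEPS, PEAK_LR)
        print(f"step {step:>7,}: lr = {lr:.2e}")

    # Rough bookkeeping implied by the same assumptions.
    steps_per_epoch = TOTAL_STEPS // NUM_EPOCHS
    print(f"steps per epoch ~ {steps_per_epoch:,}")
    print(f"training examples ~ {steps_per_epoch * BATCH_SIZE:,}")
```

Under these assumptions the learning rate peaks at 1e-4 after 500 steps, falls to half its peak around step 80,250, and reaches zero at the end of training. 160,000 steps over 20 epochs also works out to about 8,000 steps per epoch, or roughly 128,000 training examples at a batch size of 16; these derived figures are estimates, not numbers reported by the author.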