End of training

Browse files:
- README.md +1 -1
- all_results.json +1 -1
- eval_results.json +1 -1
README.md CHANGED

```diff
@@ -26,7 +26,7 @@ print(output["generated_text"])
 
 ## Training procedure
 
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/shahrad_m/AIFGen-ppo-continual-test/runs/
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/shahrad_m/AIFGen-ppo-continual-test/runs/3pa7x10a)
 
 
 This model was trained with PPO, a method introduced in [Fine-Tuning Language Models from Human Preferences](https://huggingface.co/papers/1909.08593).
```
all_results.json CHANGED

```diff
@@ -1,4 +1,4 @@
 {
 "dataset": 0,
-"eval_score":
+"eval_score": 0.616264283657074
 }
```
eval_results.json CHANGED

```diff
@@ -1,4 +1,4 @@
 {
 "dataset": 0,
-"eval_score":
+"eval_score": 0.616264283657074
 }
```