Training in progress, step 10

Files changed (4)

README.md CHANGED

@@ -1,15 +1,15 @@
 ---
 base_model: Qwen/Qwen3-0.6B
 library_name: transformers
-model_name: sft
 tags:
 - generated_from_trainer
 - trl
-- sft
 licence: license
 ---
-# Model Card for sft
 This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B).
 It has been trained using [TRL](https://github.com/huggingface/trl).
@@ -20,17 +20,17 @@ It has been trained using [TRL](https://github.com/huggingface/trl).
 from transformers import pipeline
 question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
-generator = pipeline("text-generation", model="SaminSkyfall/sft", device="cuda")
 output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
 print(output["generated_text"])
 ```
 ## Training procedure
-[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/samin-skyfall-ai/huggingface/runs/my5dpy2e)
-This model was trained with SFT.
 ### Framework versions

 ---
 base_model: Qwen/Qwen3-0.6B
 library_name: transformers
+model_name: reward
 tags:
 - generated_from_trainer
 - trl
+- reward-trainer
 licence: license
 ---
+# Model Card for reward
 This model is a fine-tuned version of [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B).
 It has been trained using [TRL](https://github.com/huggingface/trl).
 from transformers import pipeline
 question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
+generator = pipeline("text-generation", model="SaminSkyfall/reward", device="cuda")
 output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
 print(output["generated_text"])
 ```
 ## Training procedure
+[<img src="https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg" alt="Visualize in Weights & Biases" width="150" height="24"/>](https://wandb.ai/samin-skyfall-ai/huggingface/runs/dpxdvnic)
+This model was trained with Reward.
 ### Framework versions

adapter_config.json CHANGED

@@ -27,13 +27,13 @@
   "rank_pattern": {},
   "revision": null,
   "target_modules": [
-    "v_proj",
     "k_proj",
-    "gate_proj",
     "q_proj",
-    "o_proj",
     "up_proj",
-    "down_proj"
   ],
   "task_type": "SEQ_CLS",
   "trainable_token_indices": null,

   "rank_pattern": {},
   "revision": null,
   "target_modules": [
     "k_proj",
+    "down_proj",
     "q_proj",
     "up_proj",
+    "v_proj",
+    "gate_proj",
+    "o_proj"
   ],
   "task_type": "SEQ_CLS",
   "trainable_token_indices": null,

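The `target_modules` hunk above only reorders entries. A minimal sketch to confirm that, with both lists copied by hand from the old and new sides of the hunk; it assumes the order of `target_modules` carries no meaning for the adapter, and the helper name `same_module_set` is hypothetical:

```python
# Old and new "target_modules" lists, transcribed from the hunk above.
old_modules = ["v_proj", "k_proj", "gate_proj", "q_proj",
               "o_proj", "up_proj", "down_proj"]
new_modules = ["k_proj", "down_proj", "q_proj", "up_proj",
               "v_proj", "gate_proj", "o_proj"]


def same_module_set(old, new):
    """Return True when both lists name the same modules, ignoring order."""
    return set(old) == set(new)


# Modules present on only one side of the diff (empty when it is a pure reorder).
added = sorted(set(new_modules) - set(old_modules))
removed = sorted(set(old_modules) - set(new_modules))

print(same_module_set(old_modules, new_modules), added, removed)
```

If the check passes and both difference lists are empty, the hunk is a serialization-order change rather than a change to which layers are adapted.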
adapter_model.safetensors CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:1d686a6349bb68a836cc9f9eca786754d585fac653413d3156d3ea1fd2056d5e
 size 80792456

 version https://git-lfs.github.com/spec/v1
+oid sha256:35cc1a110878f81fe9db665fd795a4c25982ea51d831bd681517bc5b803b3237
 size 80792456
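
The two pointer files above differ only in their `oid`. As a rough sketch, the pointers can be parsed and compared as below; the function name `parse_lfs_pointer` is hypothetical, and the parser assumes the simple `key value` line layout shown in this diff:

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into a small dict of its fields."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line is "<key> <value>".
        key, _, value = line.strip().partition(" ")
        fields[key] = value
    # The oid is written as "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the old and new sides of the diff above.
old_ptr = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:1d686a6349bb68a836cc9f9eca786754d585fac653413d3156d3ea1fd2056d5e
size 80792456
""")
new_ptr = parse_lfs_pointer("""
version https://git-lfs.github.com/spec/v1
oid sha256:35cc1a110878f81fe9db665fd795a4c25982ea51d831bd681517bc5b803b3237
size 80792456
""")

# Same byte length, different digest: the file contents changed in place.
print(old_ptr["size"] == new_ptr["size"], old_ptr["digest"] != new_ptr["digest"])
```

An unchanged size with a new digest is consistent with adapter tensors of the same shapes holding different values, though the pointer alone does not show which tensors changed.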

training_args.bin CHANGED

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:9cca9113dadb9b049b1d17c839631f75518b8727b5fdd818871e351583f14d2b
 size 6673

 version https://git-lfs.github.com/spec/v1
+oid sha256:b7f7a17803e21bc566350526f1b13e95231865723df358fae120beae44d969ef
 size 6673