Commit 68f4726 by fsahinuc · verified · 1 Parent(s): 9c4be9e

Create README.md

  1. README.md +17 -0
README.md ADDED
@@ -0,0 +1,17 @@
+ ---
+ base_model:
+ - Qwen/Qwen2.5-7B-Instruct
+ ---
+
+ unsloth GRPO LoRA training with ```final_reward_data.json```
+
+ Parameters:
+
+ * ```context```: 4096
+ * ```maxgen```: 512
+ * ```batch```: 4
+ * ```rollout```: 4
+ * ```LoRA 16 bit rank```: 64
+ * ```gradient```: 4
+
+ Further details at https://wandb.ai/furkansahinuc-personal/expert-reward-models/runs/m08y4h1i
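
As a rough sketch only, the parameters listed in the README can be collected into a small Python structure to make their relationships explicit. The class and field names below are hypothetical and are not taken from Unsloth or from the actual training script. The readings of `context` as total sequence length (prompt plus completion), `rollout` as the number of completions sampled per prompt, and `gradient` as gradient accumulation steps are assumptions that the README does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GRPORunSettings:
    """Values copied from the README; all field names are hypothetical."""

    base_model: str = "Qwen/Qwen2.5-7B-Instruct"
    context: int = 4096   # assumed: total tokens per sequence (prompt + completion)
    maxgen: int = 512     # maximum generated tokens per completion
    batch: int = 4        # batch size as stated in the README
    rollout: int = 4      # assumed: completions sampled per prompt
    lora_rank: int = 64   # LoRA rank (README: "LoRA 16 bit rank")
    gradient: int = 4     # assumed: gradient accumulation steps

    def max_prompt_tokens(self) -> int:
        # Tokens left for the prompt once the generation budget is reserved.
        return self.context - self.maxgen

    def sequences_per_update(self) -> int:
        # Sequences contributing to one optimizer step, under the
        # gradient-accumulation assumption above.
        return self.batch * self.gradient


settings = GRPORunSettings()
print(settings.max_prompt_tokens())     # 3584
print(settings.sequences_per_update())  # 16
```

Under these assumptions, a prompt may use up to 3584 tokens before the 512-token generation budget is exhausted, and each optimizer step accumulates 16 sequences. If `gradient` means something else in the original run, the second figure does not apply.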