UKPLab
/

SciRM-7B

Text Generation

scientific-writing

reinforcement-learning

fsahinuc committed on Jan 14

Commit

68f4726

·

verified ·

1 Parent(s): 9c4be9e

Create README.md

Files changed (1)

README.md +17 -0

README.md ADDED

	@@ -0,0 +1,17 @@

+---
+base_model:
+- Qwen/Qwen2.5-7B-Instruct
+---
+GRPO LoRA training with Unsloth, using ```final_reward_data.json```.
+Parameters:
+  * ```context```: 4096
+  * ```maxgen```: 512
+  * ```batch```: 4
+  * ```rollout```: 4
+  * ```LoRA 16 bit rank```: 64
+  * ```gradient```: 4
+Further details at https://wandb.ai/furkansahinuc-personal/expert-reward-models/runs/m08y4h1i
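
The parameter list in the README above is terse, so here is a minimal sketch of one way to read it, not the author's training script. The class name `GRPORunConfig`, the field `lora_rank`, and both derived properties are hypothetical. The interpretations in the comments are assumptions that the README does not confirm, in particular that `gradient` means gradient accumulation steps and that `context` bounds prompt plus completion length.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GRPORunConfig:
    """Hypothetical container for the values listed in the README."""

    context: int = 4096   # assumed: max total sequence length in tokens
    maxgen: int = 512     # assumed: max tokens generated per completion
    batch: int = 4        # assumed: per-device batch size
    rollout: int = 4      # assumed: completions sampled per prompt
    lora_rank: int = 64   # README label: "LoRA 16 bit rank"
    gradient: int = 4     # assumed: gradient accumulation steps

    @property
    def max_prompt_tokens(self) -> int:
        # Tokens left for the prompt if context covers prompt + completion.
        return self.context - self.maxgen

    @property
    def sequences_per_optimizer_step(self) -> int:
        # Sequences seen per optimizer update on a single device,
        # if "gradient" really is the accumulation step count.
        return self.batch * self.gradient


cfg = GRPORunConfig()
print(cfg.max_prompt_tokens, cfg.sequences_per_optimizer_step)
```

Under these assumptions the prompt budget is the context length minus the generation budget, and the per-update sequence count is the batch size times the accumulation steps. The linked W&B run is the authoritative source for what each value actually controlled.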