tobil/qmd-training-scripts

tobil committed on Jan 24

Commit 4ffd0d6 (verified)

1 parent: a91fb36

Upload train_grpo.py with huggingface_hub

Files changed (1)

train_grpo.py CHANGED

@@ -247,8 +247,7 @@ def main():
         # GRPO specific
         num_generations=4,  # Generate 4 completions per prompt
-        max_new_tokens=256,
-        temperature=0.8,
+        max_completion_length=256,
         # Training
         num_train_epochs=args.epochs,
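The diff does not show which class receives these "GRPO specific" arguments. Assuming it is a trainer config such as TRL's `GRPOConfig`, the commit is a keyword migration. The 256-token cap on each generated completion moves from `max_new_tokens` to `max_completion_length`, and the explicit `temperature=0.8` is dropped, so sampling presumably falls back to the config's default temperature. The sketch below replays that rewrite on a plain dict. `migrate_grpo_kwargs` is a hypothetical helper that is not part of train_grpo.py or any library, and `num_train_epochs=3` is a placeholder for `args.epochs`.

```python
from typing import Any


def migrate_grpo_kwargs(old: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: rewrite pre-commit GRPO kwargs to the post-commit form."""
    new = dict(old)  # copy so the caller's dict is left untouched
    # Rename: the completion-length cap keeps its value under the new key.
    if "max_new_tokens" in new:
        new["max_completion_length"] = new.pop("max_new_tokens")
    # The commit removes temperature entirely, so drop it if present;
    # the trainer config's own default (assumed) would then apply.
    new.pop("temperature", None)
    return new


# Pre-commit arguments from the diff; num_train_epochs=3 stands in for args.epochs.
before = {
    "num_generations": 4,
    "max_new_tokens": 256,
    "temperature": 0.8,
    "num_train_epochs": 3,
}
after = migrate_grpo_kwargs(before)
print(after)  # {'num_generations': 4, 'num_train_epochs': 3, 'max_completion_length': 256}
```

`num_generations=4` passes through unchanged, which matches the diff: only the length and temperature arguments are touched.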