security-auditor-grpo / train_grpo_v2.py

Commit History

v2: 5K subset for A10G, fix escaping
3c818d7
verified

oxdev committed on

fix: escape syntax in quality_reward
75be256
verified

oxdev committed on

add: GRPO v2 training script with 4 reward functions + dataset builder
9ab390c
verified

oxdev committed on