Increase training scale: more steps, episodes, and SFT epochs · b1685a6 · unverified · Claude committed 2 days ago
Cap prompt generation at 512 tokens and add version print · ee71a24 · unverified · Claude committed 2 days ago
Add SFT warm start before GRPO and DB connectivity init check · c2dc160 · unverified · Claude committed 2 days ago
Add Supabase upload for training results (Storage + DB) · 28bcb40 · unverified · Claude committed 3 days ago
Add raw training summary output and adjust training scale · 71b0977 · unverified · Claude committed 3 days ago
Improve reward function to break refuse-everything local minimum and scale training · bd8220a · unverified · Claude committed 3 days ago
Update output paths to use persistent volume at /workspace/output · 46bfd81 · unverified · Claude committed 3 days ago
Clean up dead code, unused imports, and move hardcoded values to config.yaml · 3dc48b7 · unverified · Claude committed 3 days ago
Reduce GRPO training params to minimum: 2 candidates, 5 steps, 5 episodes · 31b8286 · unverified · Claude committed 3 days ago
Centralize all training params in config.yaml (single source of truth) · 4e2b74e · unverified · Claude committed 3 days ago