Cap prompt generation at 512 tokens and add version print (ee71a24; Claude committed 3 days ago)
Add SFT warm start before GRPO and DB connectivity init check (c2dc160; Claude committed 3 days ago)
Make Supabase uploads incremental: upload after every step (76f180f; Claude committed 3 days ago)
Add Supabase upload for training results (Storage + DB) (28bcb40; Claude committed 3 days ago)
Add raw training summary output and adjust training scale (71b0977; Claude committed 3 days ago)
Add volume verification, fsync, and stdout fallback for training outputs (f703ff1; Claude committed 3 days ago)
Clean up dead code, unused imports, and move hardcoded values to config.yaml (3dc48b7; Claude committed 3 days ago)
Add --llm-agent and other legacy CLI flags for backwards compatibility (03d9529; Claude committed 3 days ago)
Centralize all training params in config.yaml (single source of truth) (4e2b74e; Claude committed 3 days ago)
Remove mock mode: only real GRPO RL training remains (288d9a2; Claude committed 3 days ago)
Add clear training progress logging with technical + domain names (4b89b89; Claude committed 3 days ago)
Update docstrings to reflect LLM-only training pipeline (01518e0; Claude committed 3 days ago)
Align GRPOConfig defaults with CLI: 10 steps, 7 episodes (ca36c02; Claude committed 3 days ago)
Remove all rule-based fallback systems, require LLM inference (21da591; Claude committed 3 days ago)
Reduce training defaults for fast iteration: steps=10, episodes=7 (b1d7ca2; Claude committed 3 days ago)
Add training report & logging system with reward charts and conversation comparisons (506d641; Claude committed 3 days ago)
Fix critical gaps: prompt-sensitive agent, adversarial customers, executable GRPO, OpenEnv wrapper (b259333; Claude committed 3 days ago)