Pre-format SFT dataset as text column, drop formatting_func 384df8f unverified Claude committed on Mar 8
Fix SFT: set completion_only_loss=False for formatting_func compat 44f7f8c unverified Claude committed on Mar 8
Fix SFT warm start: add formatting_func for Unsloth SFTTrainer b8e7dcd unverified Claude committed on Mar 8
Cap prompt generation at 512 tokens and add version print ee71a24 unverified Claude committed on Mar 8
Add SFT warm start before GRPO and DB connectivity init check c2dc160 unverified Claude committed on Mar 8
Make Supabase uploads incremental — upload after every step 76f180f unverified Claude committed on Mar 8
Add raw training summary output and adjust training scale 71b0977 unverified Claude committed on Mar 8
Improve reward function to break refuse-everything local minimum and scale training bd8220a unverified Claude committed on Mar 8
Add volume verification, fsync, and stdout fallback for training outputs f703ff1 unverified Claude committed on Mar 8
Clean up dead code, unused imports, and move hardcoded values to config.yaml 3dc48b7 unverified Claude committed on Mar 8
Add --llm-agent and other legacy CLI flags for backwards compatibility 03d9529 unverified Claude committed on Mar 8
Centralize all training params in config.yaml (single source of truth) 4e2b74e unverified Claude committed on Mar 8
Add clear training progress logging with technical + domain names 4b89b89 unverified Claude committed on Mar 8
Remove all rule-based fallback systems, require LLM inference 21da591 unverified Claude committed on Mar 8
Fix hardcoded values in report: dynamic customer count and eval episode labels 7ed3d6b unverified Claude committed on Mar 8
Reduce training defaults for fast iteration: steps=10, episodes=7 b1d7ca2 unverified Claude committed on Mar 8
Add training report & logging system with reward charts and conversation comparisons 506d641 unverified Claude committed on Mar 8
Fix critical gaps: prompt-sensitive agent, adversarial customers, executable GRPO, OpenEnv wrapper b259333 unverified Claude committed on Mar 8
Implement self-improving AI oversight system with nested RL environments e6b0e2f unverified Claude committed on Mar 8