Commit History

Improve reward function to break refuse-everything local minimum and scale training
bd8220a

Claude committed on

Clean up dead code, unused imports, and move hardcoded values to config.yaml
3dc48b7

Claude committed on

Remove all rule-based fallback systems, require LLM inference
21da591

Claude committed on

Fix critical gaps: prompt-sensitive agent, adversarial customers, executable GRPO, OpenEnv wrapper
b259333

Claude committed on

Implement self-improving AI oversight system with nested RL environments
e6b0e2f

Claude committed on