Claude
Implement self-improving AI oversight system with nested RL environments
e6b0e2f unverified
"""Layer 1 — RL Prompt Optimizer (GRPO via TRL + Unsloth)."""