Fix critical RL reward function exploits and training hyperparameters 803c93e nihalaninihal and Claude Opus 4.6 committed 3 days ago
Align with Advanced Llama 3.2 GRPO LoRA reference notebook pattern c7d253a nihalaninihal and Claude Opus 4.6 committed 3 days ago
Fix VALID_TARGETS_FOR_ATTACK and attacker heuristic/prompt inconsistencies 3ffb78a nihalaninihal and Claude Opus 4.6 committed 3 days ago
Align train.py and Colab notebook with official Unsloth+OpenEnv GRPO patterns e09a415 nihalaninihal and Claude Opus 4.6 committed 3 days ago
Add multi-agent GRPO training for all 3 agents (worker, attacker, oversight) 389e3bf nihalaninihal and Claude Opus 4.6 committed 3 days ago
Remove hackathon_env template, rewrite train.py for SentinelOpsArena 0e5a0a6 nihalaninihal and Claude Opus 4.6 committed 3 days ago
Initial project setup for OpenEnv Hackathon ccb5f4e nihalaninihal and Claude Opus 4.6 committed 4 days ago