Sentinel / train.py

Commit History

Fix critical RL reward function exploits and training hyperparameters
803c93e

nihalaninihal Claude Opus 4.6 committed on

Align with Advanced Llama 3.2 GRPO LoRA reference notebook pattern
c7d253a

nihalaninihal Claude Opus 4.6 committed on

Fix VALID_TARGETS_FOR_ATTACK and attacker heuristic/prompt inconsistencies
3ffb78a

nihalaninihal Claude Opus 4.6 committed on

Align train.py and Colab notebook with official Unsloth+OpenEnv GRPO patterns
e09a415

nihalaninihal Claude Opus 4.6 committed on

Add multi-agent GRPO training for all 3 agents (worker, attacker, oversight)
389e3bf

nihalaninihal Claude Opus 4.6 committed on

Remove hackathon_env template, rewrite train.py for SentinelOpsArena
0e5a0a6

nihalaninihal Claude Opus 4.6 committed on

Initial project setup for OpenEnv Hackathon
ccb5f4e

nihalaninihal Claude Opus 4.6 committed on