RL environment to catch reward-hacking agents. F1 = 0.970.
Simulate outbreak investigations to find the pathogen, source, and spread.