Kavya988's picture
Upload 29 files
d416acc verified
"""
here we gonna define the reward function for our agent,
so that it can learn or adapt the environment and
able to get/achieve the rewards for the actions it takes in the environment.
OR
Per step reward
"""
# The rewarding system we writing here will be within the scale of -20 to +20.
"""
The factors we are using (5 factors):
1. Correct action = positive reward (2 to 10)
2. Wrong action = negative reward (-1 to -3)
3. Resolve with FIX (Episode success) = large positive reward (+10 to +15)
4. Resolve WITHOUT FIX (Prevents lying) = negative reward (-5 to -10)
5. Max steps reached (Episode failure) = negative reward (-5)
"""
def calculate_reward(action, incident, fix_applied, step, max_steps):
# agents says resolved but didn't fix - penalty
if action == "resolve" and not fix_applied:
return -10.0
# agent ran out of steps - penalty
if step >= max_steps:
return -5.0
# agent fixed and resolved the incident (succes)
if action == "resolve" and fix_applied:
return 15.0
# for correct fix action
if action == incident["fix_action"] and not fix_applied:
return 5.0
# Diagnostic actions - helpful but doesn't fix
if action in ["inspect_logs", "inspect_request"]:
return 0.5
# for wrong action
return -2.0