File size: 1,282 Bytes
d416acc
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
"""
here we gonna define the reward function for our agent,
so that it can learn or adapt the environment and 
able to get/achieve the rewards for the actions it takes in the environment.
OR 
Per step reward
"""
# The rewarding system we writing here will be within the scale of -20 to +20.
"""
The factors we are using (5 factors):
1. Correct action = positive reward (2 to 10)
2. Wrong action = negative reward (-1 to -3)
3. Resolve with FIX (Episode success) = large positive reward (+10 to +15)
4. Resolve WITHOUT FIX (Prevents lying) = negative reward (-5 to -10)
5. Max steps reached (Episode failure) = negative reward (-5)
"""

def calculate_reward(action, incident, fix_applied, step, max_steps):
  
  # agents says resolved but didn't fix - penalty 
  if action == "resolve" and not fix_applied:
    return -10.0 
  
  # agent ran out of steps - penalty
  if step >= max_steps:
    return -5.0
  
  # agent fixed and resolved the incident (succes)
  if action == "resolve" and fix_applied:
    return 15.0
  
  # for correct fix action
  if action == incident["fix_action"] and not fix_applied:
    return 5.0
  
  # Diagnostic actions - helpful but doesn't fix
  if action in ["inspect_logs", "inspect_request"]:
    return 0.5
  
  # for wrong action
  return -2.0