Spaces:

Kavya988
/

API_DEBUG_SOLVER

Sleeping

Upload 29 files

d416acc verified about 2 months ago

1.28 kB

	"""
	here we gonna define the reward function for our agent,
	so that it can learn or adapt the environment and
	able to get/achieve the rewards for the actions it takes in the environment.
	OR
	Per step reward
	"""
	# The rewarding system we writing here will be within the scale of -20 to +20.
	"""
	The factors we are using (5 factors):
	1. Correct action = positive reward (2 to 10)
	2. Wrong action = negative reward (-1 to -3)
	3. Resolve with FIX (Episode success) = large positive reward (+10 to +15)
	4. Resolve WITHOUT FIX (Prevents lying) = negative reward (-5 to -10)
	5. Max steps reached (Episode failure) = negative reward (-5)
	"""

	def calculate_reward(action, incident, fix_applied, step, max_steps):

	# agents says resolved but didn't fix - penalty
	if action == "resolve" and not fix_applied:
	return -10.0

	# agent ran out of steps - penalty
	if step >= max_steps:
	return -5.0

	# agent fixed and resolved the incident (succes)
	if action == "resolve" and fix_applied:
	return 15.0

	# for correct fix action
	if action == incident["fix_action"] and not fix_applied:
	return 5.0

	# Diagnostic actions - helpful but doesn't fix
	if action in ["inspect_logs", "inspect_request"]:
	return 0.5

	# for wrong action
	return -2.0