Nharen committed on
Commit f34fc78 · verified · 1 Parent(s): ec5666a

Update README.md

Files changed (1): README.md +52 -0

README.md CHANGED
@@ -18,3 +18,55 @@ model-index:
  value: 1.0
---

# Reward Rush: Deterministic Frozen Lake Q-Learning

This repository contains a tabular Q-learning agent for the FrozenLake-v1 environment with slippery physics disabled.

## Model Architecture

The model uses a discrete Q-table with the following specifications:
* State Space: 16 discrete states (4x4 grid)
* Action Space: 4 discrete actions (Left, Down, Right, Up)
* Format: Pickle-serialized NumPy array
* Dimensions: (16, 4)
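The (16, 4) layout maps one row per grid cell and one column per action, and the greedy policy is the row-wise argmax. A minimal sketch of that layout, using a zero-initialized placeholder table rather than the released weights:

```python
import numpy as np

# Placeholder Q-table with the same shape as the released model:
# 16 states x 4 actions. Gymnasium's FrozenLake action indices:
# 0 = Left, 1 = Down, 2 = Right, 3 = Up.
q_table = np.zeros((16, 4))

# Illustrative value: pretend moving Right from the start state (state 0)
# has the highest learned value in that row.
q_table[0, 2] = 0.95

# Greedy policy extraction: one action index per state.
policy = np.argmax(q_table, axis=1)
print(policy.shape)  # -> (16,)
print(policy[0])     # -> 2 (Right)
```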
## Common Implementation Mistakes to Avoid

1. Environment Physics: This agent is trained for the deterministic version. Ensure `is_slippery=False` is set in the environment constructor.
2. Path Optimization: In a deterministic environment, the agent should find the shortest path. Lower success rates usually indicate a mismatch in the `is_slippery` setting.
3. State Mapping: Ensure the grid size matches (4x4). Using this Q-table on an 8x8 grid will result in index-out-of-bounds errors.
## Download and Test Code

```python
import gymnasium as gym
import numpy as np
import pickle
from huggingface_hub import hf_hub_download

def run_frozen_lake_deterministic_test():
    # Fetch the serialized Q-table from the Hugging Face Hub
    path = hf_hub_download(
        repo_id="Nharen/Reward_Rush_Q-learning_Deterministic_Frozen_Lake",
        filename="q-learning.pkl",
    )

    with open(path, "rb") as f:
        q_table = pickle.load(f)

    # Deterministic 4x4 environment, matching the training setup
    env = gym.make("FrozenLake-v1", is_slippery=False)
    successes = 0
    num_episodes = 100

    for _ in range(num_episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # Greedy action selection from the learned Q-table
            action = np.argmax(q_table[state])
            state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            if terminated and reward == 1.0:
                successes += 1

    print(f"Goal Reach Rate: {successes}/{num_episodes}")
    env.close()

if __name__ == "__main__":
    run_frozen_lake_deterministic_test()
```