Update README.md

README.md CHANGED
value: 1.0
---

# Reward Rush: Deterministic Frozen Lake Q-Learning

This repository contains a tabular Q-learning agent for the FrozenLake-v1 environment with slippery physics disabled.

## Model Architecture

The model uses a discrete Q-table with the following specifications (a short inspection sketch follows the list):

* State Space: 16 discrete states (4x4 grid)
* Action Space: 4 discrete actions (Left, Down, Right, Up)
* Format: Pickle-serialized NumPy array
* Dimensions: (16, 4)
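
To make the stated format concrete, the following is a minimal inspection sketch. It assumes the file has already been downloaded locally as `q-learning.pkl` (the test script further down shows the actual `hf_hub_download` call); everything else is plain NumPy.

```python
import pickle

import numpy as np

# Load the pickled Q-table; q-learning.pkl is assumed to be in the working directory.
with open("q-learning.pkl", "rb") as f:
    q_table = pickle.load(f)

print(type(q_table))   # <class 'numpy.ndarray'>
print(q_table.shape)   # (16, 4): one row per state, one column per action

# Each row holds the action values for one state; the per-row argmax is the
# greedy action (0=Left, 1=Down, 2=Right, 3=Up), laid out here on the 4x4 grid.
print(np.argmax(q_table, axis=1).reshape(4, 4))
```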

## Common Implementation Mistakes to Avoid

1. Environment Physics: This agent was trained on the deterministic version of the environment. Ensure `is_slippery=False` is set in the environment constructor.
2. Path Optimization: In the deterministic environment the agent should follow the shortest path to the goal. Lower success rates usually indicate a mismatch in the `is_slippery` setting.
3. State Mapping: Ensure the grid size matches (4x4). Using this Q-table on the 8x8 map will raise index-out-of-bounds errors; see the guard sketch after this list.
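
Points 1 and 3 can be checked up front rather than discovered mid-episode. The sketch below is a suggested guard, not part of the published test script; it builds the environment with the deterministic 4x4 settings and compares the loaded table against the environment's spaces. The local filename `q-learning.pkl` is again an assumption.

```python
import pickle

import gymnasium as gym

# Construct the environment exactly as the Q-table expects: 4x4 map, no slipping.
env = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=False)

# Load the pickled Q-table from the working directory.
with open("q-learning.pkl", "rb") as f:
    q_table = pickle.load(f)

# Fail loudly on a size mismatch instead of hitting an index error during a rollout.
assert q_table.shape == (env.observation_space.n, env.action_space.n), (
    f"Q-table shape {q_table.shape} does not match the environment "
    f"({env.observation_space.n} states, {env.action_space.n} actions)"
)
```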

## Download and Test Code

```python
import gymnasium as gym
import numpy as np
import pickle
from huggingface_hub import hf_hub_download

def run_frozen_lake_deterministic_test():
    # Fetch the pickled (16, 4) Q-table from the Hugging Face Hub.
    path = hf_hub_download(
        repo_id="Nharen/Reward_Rush_Q-learning_Deterministic_Frozen_Lake",
        filename="q-learning.pkl",
    )

    with open(path, "rb") as f:
        q_table = pickle.load(f)

    # The table was trained on the deterministic 4x4 map, so is_slippery must be False.
    env = gym.make("FrozenLake-v1", is_slippery=False)
    successes = 0
    num_episodes = 100

    for _ in range(num_episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # Greedy action selection from the Q-table row for the current state.
            action = np.argmax(q_table[state])
            state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # Reaching the goal terminates the episode with a reward of 1.0.
            if terminated and reward == 1.0:
                successes += 1

    print(f"Goal Reach Rate: {successes}/{num_episodes}")
    env.close()

if __name__ == "__main__":
    run_frozen_lake_deterministic_test()
```
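
With the deterministic 4x4 settings used above, the greedy policy stored in this Q-table is expected to reach the goal in every episode (100/100). A noticeably lower Goal Reach Rate usually points to the `is_slippery` mismatch described in the previous section.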