Update README.md

README.md CHANGED
value: 1.0
---

# Reward Rush: Deterministic Frozen Lake Q-Learning

This repository contains a tabular Q-learning agent for the FrozenLake-v1 environment with slippery physics disabled.

## Model Architecture

The model uses a discrete Q-table with the following specifications (a short inspection sketch follows the list):

* State Space: 16 discrete states (4x4 grid)
* Action Space: 4 discrete actions (Left, Down, Right, Up)
* Format: Pickle-serialized NumPy array
* Dimensions: (16, 4)
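
To make the stated format concrete, the following is a minimal inspection sketch. It assumes the file has already been downloaded locally as `q-learning.pkl` (the test script further down shows the actual `hf_hub_download` call); everything else is plain NumPy.

```python
import pickle

import numpy as np

# Load the pickled Q-table; q-learning.pkl is assumed to be in the working directory.
with open("q-learning.pkl", "rb") as f:
    q_table = pickle.load(f)

print(type(q_table))   # <class 'numpy.ndarray'>
print(q_table.shape)   # (16, 4): one row per state, one column per action

# Each row holds the action values for one state; the per-row argmax is the
# greedy action (0=Left, 1=Down, 2=Right, 3=Up), laid out here on the 4x4 grid.
print(np.argmax(q_table, axis=1).reshape(4, 4))
```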

## Common Implementation Mistakes to Avoid

1. Environment Physics: This agent was trained on the deterministic version of the environment. Ensure `is_slippery=False` is set in the environment constructor.
2. Path Optimization: In the deterministic environment the agent should follow the shortest path to the goal. Lower success rates usually indicate a mismatch in the `is_slippery` setting.
3. State Mapping: Ensure the grid size matches (4x4). Using this Q-table on the 8x8 map will raise index-out-of-bounds errors; see the guard sketch after this list.
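
Points 1 and 3 can be checked up front rather than discovered mid-episode. The sketch below is a suggested guard, not part of the published test script; it builds the environment with the deterministic 4x4 settings and compares the loaded table against the environment's spaces. The local filename `q-learning.pkl` is again an assumption.

```python
import pickle

import gymnasium as gym

# Construct the environment exactly as the Q-table expects: 4x4 map, no slipping.
env = gym.make("FrozenLake-v1", map_name="4x4", is_slippery=False)

# Load the pickled Q-table from the working directory.
with open("q-learning.pkl", "rb") as f:
    q_table = pickle.load(f)

# Fail loudly on a size mismatch instead of hitting an index error during a rollout.
assert q_table.shape == (env.observation_space.n, env.action_space.n), (
    f"Q-table shape {q_table.shape} does not match the environment "
    f"({env.observation_space.n} states, {env.action_space.n} actions)"
)
```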

## Download and Test Code

```python
import gymnasium as gym
import numpy as np
import pickle
from huggingface_hub import hf_hub_download

def run_frozen_lake_deterministic_test():
    # Fetch the pickled (16, 4) Q-table from the Hugging Face Hub.
    path = hf_hub_download(
        repo_id="Nharen/Reward_Rush_Q-learning_Deterministic_Frozen_Lake",
        filename="q-learning.pkl",
    )

    with open(path, "rb") as f:
        q_table = pickle.load(f)

    # The table was trained on the deterministic 4x4 map, so is_slippery must be False.
    env = gym.make("FrozenLake-v1", is_slippery=False)
    successes = 0
    num_episodes = 100

    for _ in range(num_episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # Greedy action selection from the Q-table row for the current state.
            action = np.argmax(q_table[state])
            state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # Reaching the goal terminates the episode with a reward of 1.0.
            if terminated and reward == 1.0:
                successes += 1

    print(f"Goal Reach Rate: {successes}/{num_episodes}")
    env.close()

if __name__ == "__main__":
    run_frozen_lake_deterministic_test()
```
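
With the deterministic 4x4 settings used above, the greedy policy stored in this Q-table is expected to reach the goal in every episode (100/100). A noticeably lower Goal Reach Rate usually points to the `is_slippery` mismatch described in the previous section.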