# Reward Rush: Deterministic Frozen Lake Q-Learning
This repository contains a tabular Q-learning agent for the FrozenLake-v1 environment with slippery physics disabled.
## Model Architecture
The model uses a discrete Q-table with the following specifications:
- State Space: 16 discrete states (4x4 grid)
- Action Space: 4 discrete actions (Left, Down, Right, Up)
- Format: Pickle-serialized NumPy array
- Dimensions: (16, 4)
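The stored artifact is simply a pickled `(16, 4)` NumPy array: one row per grid state, one column per action. As a minimal sketch (the zero-initialized table and in-memory buffer here are illustrative, not the trained weights):

```python
import io
import pickle

import numpy as np

# Illustrative Q-table with the repository's stated dimensions:
# rows index the 16 states of the 4x4 grid, columns the 4 actions
# (0=Left, 1=Down, 2=Right, 3=Up, following Gymnasium's FrozenLake).
q_table = np.zeros((16, 4))

# The greedy action for a state is the argmax over that state's row.
best_action = int(np.argmax(q_table[0]))

# Round-trip through pickle, the same serialization format this repo uses.
buf = io.BytesIO()
pickle.dump(q_table, buf)
buf.seek(0)
loaded = pickle.load(buf)

print(loaded.shape)  # (16, 4)
```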
## Common Implementation Mistakes to Avoid
- Environment Physics: This agent was trained on the deterministic version of the environment. Ensure `is_slippery=False` is passed to the environment constructor.
- Path Optimization: In a deterministic environment, the greedy policy should follow the shortest path to the goal. A success rate below 100% usually indicates a mismatch in the `is_slippery` setting.
- State Mapping: Ensure the grid size matches (4x4). Using this Q-table on an 8x8 grid will cause index-out-of-bounds errors, since the table only covers states 0-15.
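The state-mapping pitfall above can be caught up front rather than mid-episode. A small guard like the following (the helper name `check_q_table_compat` is hypothetical; in practice you would compare against `env.observation_space.n` and `env.action_space.n`) fails fast with a clear message:

```python
import numpy as np


def check_q_table_compat(q_table, n_states, n_actions):
    """Raise early instead of failing with an index error mid-episode."""
    if q_table.shape != (n_states, n_actions):
        raise ValueError(
            f"Q-table shape {q_table.shape} does not match environment "
            f"({n_states} states, {n_actions} actions)"
        )


q = np.zeros((16, 4))
check_q_table_compat(q, 16, 4)       # 4x4 map: shapes agree, no error

try:
    check_q_table_compat(q, 64, 4)   # 8x8 map has 64 states: mismatch
except ValueError as e:
    print("caught:", e)
```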
## Download and Test Code
```python
import gymnasium as gym
import numpy as np
import pickle
from huggingface_hub import hf_hub_download


def run_frozen_lake_deterministic_test():
    # Download the pickled Q-table from the Hugging Face Hub.
    path = hf_hub_download(
        repo_id="Nharen/Reward_Rush_Q-learning_Deterministic_Frozen_Lake",
        filename="q-learning.pkl",
    )
    with open(path, "rb") as f:
        q_table = pickle.load(f)

    # Deterministic physics must match the training setup.
    env = gym.make("FrozenLake-v1", is_slippery=False)
    successes = 0
    num_episodes = 100
    for _ in range(num_episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # Greedy policy: pick the highest-valued action for this state.
            action = np.argmax(q_table[state])
            state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # FrozenLake yields reward 1.0 only when the goal tile is reached.
            if terminated and reward == 1.0:
                successes += 1
    print(f"Goal Reach Rate: {successes}/{num_episodes}")
    env.close()


if __name__ == "__main__":
    run_frozen_lake_deterministic_test()
```
## Evaluation Results
- mean_reward on FrozenLake-v1 (self-reported): 1.000