# Reward Rush: Stochastic Frozen Lake Q-Learning

This repository contains a tabular Q-learning agent for the `FrozenLake-v1` environment with slippery physics enabled (`is_slippery=True`).
## Model Architecture

The model uses a discrete Q-table with the following specifications:

- **State Space:** 16 discrete states (4x4 grid)
- **Action Space:** 4 discrete actions (Left, Down, Right, Up)
- **Format:** Pickle-serialized NumPy array
- **Dimensions:** (16, 4)
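As a quick illustration of this layout, a table with the same dimensions can be created and round-tripped through pickle (a minimal sketch; the filename simply mirrors the one in this repo):

```python
import pickle
import numpy as np

# 16 states (4x4 grid) x 4 actions (Left, Down, Right, Up)
q_table = np.zeros((16, 4))

# Serialize with pickle, matching the repository's storage format
with open("q-learning.pkl", "wb") as f:
    pickle.dump(q_table, f)

# Loading it back yields the original (16, 4) NumPy array
with open("q-learning.pkl", "rb") as f:
    loaded = pickle.load(f)
print(loaded.shape)
```

Each row of the table corresponds to one grid cell, and `np.argmax` over a row selects the greedy action for that state.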
## Common Implementation Mistakes to Avoid

- **Environment Physics:** This agent is trained for the stochastic version. Ensure `is_slippery=True` is set in the environment constructor.
- **Success Definition:** A reward of 1 is granted only upon reaching the goal (G). Falling into a hole (H) terminates the episode with a reward of 0.
- **Q-Table Format:** Ensure the loading environment uses a compatible version of NumPy to avoid unpickling errors.
## Download and Test Code

```python
import gymnasium as gym
import numpy as np
import pickle
from huggingface_hub import hf_hub_download


def run_frozen_lake_stochastic_test():
    # Download the serialized Q-table from the Hugging Face Hub
    path = hf_hub_download(
        repo_id="Nharen/Reward_Rush_Q-learning_Stochastic_Frozen_Lake",
        filename="q-learning.pkl",
    )
    with open(path, "rb") as f:
        q_table = pickle.load(f)

    env = gym.make("FrozenLake-v1", is_slippery=True)
    successes = 0
    num_episodes = 100
    for _ in range(num_episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # Greedy action selection from the Q-table
            action = np.argmax(q_table[state])
            state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
        # A terminal reward of 1.0 means the goal tile was reached
        if terminated and reward == 1.0:
            successes += 1
    print(f"Goal Reach Rate: {successes}/{num_episodes}")
    env.close()


if __name__ == "__main__":
    run_frozen_lake_stochastic_test()
```
## Evaluation Results

| Metric | Environment | Source | Value |
|---|---|---|---|
| mean_reward | FrozenLake-v1 | self-reported | 0.750 |
| n_evaluation_episodes | FrozenLake-v1 | self-reported | 100 |