# Reward Rush: Stochastic Frozen Lake Q-Learning

This repository contains a tabular Q-learning agent for the `FrozenLake-v1` environment with slippery physics enabled (`is_slippery=True`).
## Model Architecture

The model uses a discrete Q-table with the following specifications:

- **State Space:** 16 discrete states (4x4 grid)
- **Action Space:** 4 discrete actions (Left, Down, Right, Up)
- **Format:** Pickle-serialized NumPy array
- **Dimensions:** (16, 4)
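As a quick illustration of this layout, a table with the same dimensions can be created and round-tripped through pickle (a minimal sketch; the filename simply mirrors the one in this repo):

```python
import pickle
import numpy as np

# 16 states (4x4 grid) x 4 actions (Left, Down, Right, Up)
q_table = np.zeros((16, 4))

# Serialize with pickle, matching the repository's storage format
with open("q-learning.pkl", "wb") as f:
    pickle.dump(q_table, f)

# Loading it back yields the original (16, 4) NumPy array
with open("q-learning.pkl", "rb") as f:
    loaded = pickle.load(f)
print(loaded.shape)
```

Each row of the table corresponds to one grid cell, and `np.argmax` over a row selects the greedy action for that state.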
## Common Implementation Mistakes to Avoid

- **Environment Physics:** This agent is trained for the stochastic version. Ensure `is_slippery=True` is set in the environment constructor.
- **Success Definition:** A reward of 1 is granted only upon reaching the goal (G). Falling into a hole (H) terminates the episode with a reward of 0.
- **Q-Table Format:** Ensure the loading environment uses a compatible version of NumPy to avoid unpickling errors.
## Download and Test Code

```python
import gymnasium as gym
import numpy as np
import pickle
from huggingface_hub import hf_hub_download


def run_frozen_lake_stochastic_test():
    # Download the serialized Q-table from the Hugging Face Hub
    path = hf_hub_download(
        repo_id="Nharen/Reward_Rush_Q-learning_Stochastic_Frozen_Lake",
        filename="q-learning.pkl",
    )
    with open(path, "rb") as f:
        q_table = pickle.load(f)

    env = gym.make("FrozenLake-v1", is_slippery=True)
    successes = 0
    num_episodes = 100
    for _ in range(num_episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # Greedy action selection from the Q-table
            action = np.argmax(q_table[state])
            state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
        # A terminal reward of 1.0 means the goal tile was reached
        if terminated and reward == 1.0:
            successes += 1
    print(f"Goal Reach Rate: {successes}/{num_episodes}")
    env.close()


if __name__ == "__main__":
    run_frozen_lake_stochastic_test()
```
## Evaluation Results

| Metric | Environment | Source | Value |
|---|---|---|---|
| mean_reward | FrozenLake-v1 | self-reported | 0.750 |
| n_evaluation_episodes | FrozenLake-v1 | self-reported | 100 |