# Reward Rush: Deterministic Frozen Lake Q-Learning
This repository contains a tabular Q-learning agent for the FrozenLake-v1 environment with slippery physics disabled.
## Model Architecture
The model uses a discrete Q-table with the following specifications:
- State Space: 16 discrete states (4x4 grid)
- Action Space: 4 discrete actions (Left, Down, Right, Up)
- Format: Pickle-serialized NumPy array
- Dimensions: (16, 4)
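The stored artifact is simply a pickled `(16, 4)` NumPy array: one row per grid state, one column per action. As a minimal sketch (the zero-initialized table and in-memory buffer here are illustrative, not the trained weights):

```python
import io
import pickle

import numpy as np

# Illustrative Q-table with the repository's stated dimensions:
# rows index the 16 states of the 4x4 grid, columns the 4 actions
# (0=Left, 1=Down, 2=Right, 3=Up, following Gymnasium's FrozenLake).
q_table = np.zeros((16, 4))

# The greedy action for a state is the argmax over that state's row.
best_action = int(np.argmax(q_table[0]))

# Round-trip through pickle, the same serialization format this repo uses.
buf = io.BytesIO()
pickle.dump(q_table, buf)
buf.seek(0)
loaded = pickle.load(buf)

print(loaded.shape)  # (16, 4)
```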
## Common Implementation Mistakes to Avoid
- Environment Physics: This agent was trained on the deterministic version of the environment. Ensure `is_slippery=False` is passed to the environment constructor.
- Path Optimization: In a deterministic environment, the greedy policy should follow the shortest path to the goal. A success rate below 100% usually indicates a mismatch in the `is_slippery` setting.
- State Mapping: Ensure the grid size matches (4x4). Using this Q-table on an 8x8 grid will cause index-out-of-bounds errors, since the table only covers states 0-15.
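The state-mapping pitfall above can be caught up front rather than mid-episode. A small guard like the following (the helper name `check_q_table_compat` is hypothetical; in practice you would compare against `env.observation_space.n` and `env.action_space.n`) fails fast with a clear message:

```python
import numpy as np


def check_q_table_compat(q_table, n_states, n_actions):
    """Raise early instead of failing with an index error mid-episode."""
    if q_table.shape != (n_states, n_actions):
        raise ValueError(
            f"Q-table shape {q_table.shape} does not match environment "
            f"({n_states} states, {n_actions} actions)"
        )


q = np.zeros((16, 4))
check_q_table_compat(q, 16, 4)       # 4x4 map: shapes agree, no error

try:
    check_q_table_compat(q, 64, 4)   # 8x8 map has 64 states: mismatch
except ValueError as e:
    print("caught:", e)
```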
## Download and Test Code
```python
import gymnasium as gym
import numpy as np
import pickle
from huggingface_hub import hf_hub_download


def run_frozen_lake_deterministic_test():
    # Download the pickled Q-table from the Hugging Face Hub.
    path = hf_hub_download(
        repo_id="Nharen/Reward_Rush_Q-learning_Deterministic_Frozen_Lake",
        filename="q-learning.pkl",
    )
    with open(path, "rb") as f:
        q_table = pickle.load(f)

    # Deterministic physics must match the training setup.
    env = gym.make("FrozenLake-v1", is_slippery=False)
    successes = 0
    num_episodes = 100
    for _ in range(num_episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # Greedy policy: pick the highest-valued action for this state.
            action = np.argmax(q_table[state])
            state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            # FrozenLake yields reward 1.0 only when the goal tile is reached.
            if terminated and reward == 1.0:
                successes += 1
    print(f"Goal Reach Rate: {successes}/{num_episodes}")
    env.close()


if __name__ == "__main__":
    run_frozen_lake_deterministic_test()
```
## Evaluation Results
- mean_reward on FrozenLake-v1 (self-reported): 1.000