Mini RL Game — DQN (Vector + Pixels)
A simple Pygame environment with a DQN agent that learns two scenarios. It is an educational RL example, intended for quick experimentation with DQN on a minimal game.
- Eat: Catch falling objects.
- Avoid: Dodge falling objects as long as possible.
Observation Types
- Vector (MLP): Compact state per enemy with normalized deltas.
- Pixels (CNN): Raw frames (84×84 grayscale) stacked over 4 frames.
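The pixel pipeline described above (84×84 grayscale, 4-frame stack) can be sketched with scikit-image, which is listed in the dependencies. The function names here are illustrative, not taken from the project code:

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.transform import resize

def preprocess(frame):
    """Convert an RGB frame (H, W, 3) to an 84x84 grayscale float array."""
    gray = rgb2gray(frame)  # values scaled to [0, 1]
    return resize(gray, (84, 84), anti_aliasing=True).astype(np.float32)

def stack_frames(frames):
    """Stack the 4 most recent preprocessed frames into an (84, 84, 4) tensor."""
    return np.stack(frames[-4:], axis=-1)
```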
✅ Checkpoints
Vector (MLP)
| Scenario | Episodes | Enemies | File |
|---|---|---|---|
| Eat | 1000 | 4 | model_vector_eat.h5 |
| Avoid | 3000 | 8 | model_vector_avoid.h5 |
Pixels (CNN)
| Scenario | Episodes | Enemies | File |
|---|---|---|---|
| Eat | 1000 | 4 | model_pixels_eat.h5 |
| Avoid | 3000 | 8 | model_pixels_avoid.h5 |
🧠 Model Architecture
Vector (MLP) DQN
- Input: 2 * N_enemies features (per enemy: Δx/width, Δy/height).
- Network: Dense(128, relu) → Dense(128, relu) → Dense(3, linear)
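A minimal Keras sketch of the MLP Q-network above (the builder name and the enemy-count default are illustrative; 3 actions are assumed to be left / stay / right):

```python
import tensorflow as tf

N_ACTIONS = 3  # assumed: move left, stay, move right

def build_mlp_dqn(n_enemies=4):
    """MLP Q-network: 2 features per enemy -> Q-values for 3 actions."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(2 * n_enemies,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(N_ACTIONS, activation="linear"),
    ])
```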
Pixels (CNN) DQN
- Input: (84, 84, 4) stacked grayscale frames.
- Network: Conv(32, 8×8, s=4, relu) → Conv(64, 4×4, s=2, relu) → Conv(64, 3×3, s=1, relu) → Dense(512, relu) → Dense(3, linear)
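The CNN Q-network above, expressed as a Keras model (builder name illustrative; valid padding assumed, as in the classic DQN architecture):

```python
import tensorflow as tf

def build_cnn_dqn():
    """CNN Q-network over (84, 84, 4) stacked grayscale frames."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(84, 84, 4)),
        tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu"),
        tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu"),
        tf.keras.layers.Conv2D(64, 3, strides=1, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dense(3, activation="linear"),
    ])
```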
⚙️ Training Setup
Algorithm: DQN with target network
Loss: Huber
Optimizer: Adam (lr=1e-3 for MLP, lr=2.5e-4 for CNN)
Target Updates: Soft update with τ=0.005
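The loss, optimizer, and soft target update listed above can be sketched in TensorFlow as follows (a sketch of the standard Polyak update, not the project's exact code):

```python
import tensorflow as tf

TAU = 0.005

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)  # 2.5e-4 for the CNN
loss_fn = tf.keras.losses.Huber()

def soft_update(target_model, online_model, tau=TAU):
    """Polyak-average the online weights into the target network."""
    mixed = [
        tau * w_online + (1.0 - tau) * w_target
        for w_online, w_target in zip(online_model.get_weights(),
                                      target_model.get_weights())
    ]
    target_model.set_weights(mixed)
```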
Replay
- Buffer size: 50k (MLP) / 100k (CNN)
- Warm-up (train_start): 2000 (MLP) / 5000 (CNN)
- Updates per env step: 2
- Batch size: 64–128 (typical)
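A self-contained uniform replay buffer matching these settings (class name illustrative); training would begin only once the buffer holds at least train_start transitions, then run 2 gradient updates per environment step:

```python
import random
from collections import deque

class ReplayBuffer:
    """Uniform experience replay with a fixed capacity (50k MLP / 100k CNN)."""

    def __init__(self, capacity=50_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        """Sample a batch and transpose it into per-field lists."""
        batch = random.sample(self.buffer, batch_size)
        return map(list, zip(*batch))

    def __len__(self):
        return len(self.buffer)
```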
Exploration
- Linear epsilon decay per episode: 1.0 → 0.05 over ~750 episodes.
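The linear schedule above is a one-liner; this sketch assumes epsilon is clamped at the floor after episode ~750:

```python
EPS_START, EPS_END, DECAY_EPISODES = 1.0, 0.05, 750

def epsilon(episode):
    """Linearly anneal epsilon from 1.0 to 0.05 over ~750 episodes, then hold."""
    frac = min(episode / DECAY_EPISODES, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)
```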
Rewards (scaled small for stability)
- Eat: step −0.01, catch +1.0, miss −1.0
- Avoid: survival +0.001 per step, near-miss up to −0.25, collision −5.0
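The reward values above can be sketched as per-step functions. The exact near-miss shaping (how the up-to-0.25 penalty scales with distance) is an assumption here, not taken from the project:

```python
def eat_reward(caught, missed):
    """Eat scenario: small time penalty each step, +/-1 for catch/miss."""
    r = -0.01
    if caught:
        r += 1.0
    if missed:
        r -= 1.0
    return r

def avoid_reward(collided, near_miss_penalty=0.0):
    """Avoid scenario: survival bonus minus a near-miss penalty (up to 0.25)."""
    if collided:
        return -5.0
    return 0.001 - near_miss_penalty
```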
Environment
- Pygame; player moves along bottom; multiple falling enemies.
Dependencies
- Python 3.8
- TensorFlow 2.x (e.g., 2.9)
- NumPy
- scikit-image (for pixels preprocessing)
- Pygame