Mini RL Game — DQN (Vector + Pixels)
A simple Pygame environment with a DQN agent that learns two scenarios. It is an educational RL example, intended for quick experimentation with DQN on a minimal game.
- Eat: Catch falling objects.
- Avoid: Dodge falling objects as long as possible.
Observation Types
- Vector (MLP): Compact state per enemy with normalized deltas.
- Pixels (CNN): Raw frames (84×84 grayscale) stacked over 4 frames.
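The pixel pipeline described above (84×84 grayscale, 4-frame stack) can be sketched with scikit-image, which is listed in the dependencies. The function names here are illustrative, not taken from the project code:

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.transform import resize

def preprocess(frame):
    """Convert an RGB frame (H, W, 3) to an 84x84 grayscale float array."""
    gray = rgb2gray(frame)  # values scaled to [0, 1]
    return resize(gray, (84, 84), anti_aliasing=True).astype(np.float32)

def stack_frames(frames):
    """Stack the 4 most recent preprocessed frames into an (84, 84, 4) tensor."""
    return np.stack(frames[-4:], axis=-1)
```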
✅ Checkpoints
Vector (MLP)
| Scenario | Episodes | Enemies | File |
|---|---|---|---|
| Eat | 1000 | 4 | model_vector_eat.h5 |
| Avoid | 3000 | 8 | model_vector_avoid.h5 |
Pixels (CNN)
| Scenario | Episodes | Enemies | File |
|---|---|---|---|
| Eat | 1000 | 4 | model_pixels_eat.h5 |
| Avoid | 3000 | 8 | model_pixels_avoid.h5 |
🧠 Model Architecture
Vector (MLP) DQN
- Input: 2 * N_enemies features (per enemy: Δx/width, Δy/height).
- Network: Dense(128, relu) → Dense(128, relu) → Dense(3, linear)
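A minimal Keras sketch of the MLP Q-network above (the builder name and the enemy-count default are illustrative; 3 actions are assumed to be left / stay / right):

```python
import tensorflow as tf

N_ACTIONS = 3  # assumed: move left, stay, move right

def build_mlp_dqn(n_enemies=4):
    """MLP Q-network: 2 features per enemy -> Q-values for 3 actions."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(2 * n_enemies,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(N_ACTIONS, activation="linear"),
    ])
```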
Pixels (CNN) DQN
- Input: (84, 84, 4) stacked grayscale frames.
- Network: Conv(32, 8×8, s=4, relu) → Conv(64, 4×4, s=2, relu) → Conv(64, 3×3, s=1, relu) → Dense(512, relu) → Dense(3, linear)
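The CNN Q-network above, expressed as a Keras model (builder name illustrative; valid padding assumed, as in the classic DQN architecture):

```python
import tensorflow as tf

def build_cnn_dqn():
    """CNN Q-network over (84, 84, 4) stacked grayscale frames."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(84, 84, 4)),
        tf.keras.layers.Conv2D(32, 8, strides=4, activation="relu"),
        tf.keras.layers.Conv2D(64, 4, strides=2, activation="relu"),
        tf.keras.layers.Conv2D(64, 3, strides=1, activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation="relu"),
        tf.keras.layers.Dense(3, activation="linear"),
    ])
```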
⚙️ Training Setup
Algorithm: DQN with target network
Loss: Huber
Optimizer: Adam (lr=1e-3 for MLP, lr=2.5e-4 for CNN)
Target Updates: Soft update with τ=0.005
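The loss, optimizer, and soft target update listed above can be sketched in TensorFlow as follows (a sketch of the standard Polyak update, not the project's exact code):

```python
import tensorflow as tf

TAU = 0.005

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)  # 2.5e-4 for the CNN
loss_fn = tf.keras.losses.Huber()

def soft_update(target_model, online_model, tau=TAU):
    """Polyak-average the online weights into the target network."""
    mixed = [
        tau * w_online + (1.0 - tau) * w_target
        for w_online, w_target in zip(online_model.get_weights(),
                                      target_model.get_weights())
    ]
    target_model.set_weights(mixed)
```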
Replay
- Buffer size: 50k (MLP) / 100k (CNN)
- Warm-up (train_start): 2000 (MLP) / 5000 (CNN)
- Updates per env step: 2
- Batch size: 64–128 (typical)
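A self-contained uniform replay buffer matching these settings (class name illustrative); training would begin only once the buffer holds at least train_start transitions, then run 2 gradient updates per environment step:

```python
import random
from collections import deque

class ReplayBuffer:
    """Uniform experience replay with a fixed capacity (50k MLP / 100k CNN)."""

    def __init__(self, capacity=50_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        """Sample a batch and transpose it into per-field lists."""
        batch = random.sample(self.buffer, batch_size)
        return map(list, zip(*batch))

    def __len__(self):
        return len(self.buffer)
```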
Exploration
- Linear epsilon decay per episode: 1.0 → 0.05 over ~750 episodes.
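The linear schedule above is a one-liner; this sketch assumes epsilon is clamped at the floor after episode ~750:

```python
EPS_START, EPS_END, DECAY_EPISODES = 1.0, 0.05, 750

def epsilon(episode):
    """Linearly anneal epsilon from 1.0 to 0.05 over ~750 episodes, then hold."""
    frac = min(episode / DECAY_EPISODES, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)
```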
Rewards (scaled small for stability)
- Eat: step −0.01, catch +1.0, miss −1.0
- Avoid: survival +0.001 per step, near-miss up to −0.25, collision −5.0
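The reward values above can be sketched as per-step functions. The exact near-miss shaping (how the up-to-0.25 penalty scales with distance) is an assumption here, not taken from the project:

```python
def eat_reward(caught, missed):
    """Eat scenario: small time penalty each step, +/-1 for catch/miss."""
    r = -0.01
    if caught:
        r += 1.0
    if missed:
        r -= 1.0
    return r

def avoid_reward(collided, near_miss_penalty=0.0):
    """Avoid scenario: survival bonus minus a near-miss penalty (up to 0.25)."""
    if collided:
        return -5.0
    return 0.001 - near_miss_penalty
```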
Environment
- Pygame; player moves along bottom; multiple falling enemies.
Dependencies
- Python 3.8
- TensorFlow 2.x (e.g., 2.9)
- NumPy
- scikit-image (for pixels preprocessing)
- Pygame