---
license: mit
---

```markdown
---
library_name: pytorch
tags:
- reinforcement-learning
- dueling-dqn
- super-mario-bros
- pytorch
- pyqt5
- tutorial
- educational
- interactive-learning
---

# PyQt5 Dueling DQN Mario Tutorial - Interactive Learning Application

## Model Overview

This repository contains an interactive PyQt5 desktop application that walks through implementing a Dueling Deep Q-Network (Dueling DQN) agent that plays Super Mario Bros. It pairs theoretical explanations with hands-on coding exercises to teach reinforcement learning concepts.

## What is this?

This is not a traditional ML model but an **interactive educational application** built with PyQt5 that teaches you how to implement Dueling DQN from scratch. It's designed for learners who want to understand reinforcement learning through practical implementation.

## Features

- **Interactive Tutorial Interface**: Beautiful PyQt5 GUI with navigation and progress tracking
- **Comprehensive Theory**: Detailed explanations of Dueling DQN architecture and mathematics
- **Hands-on Exercises**: 8 coding exercises covering all implementation aspects
- **Progress Tracking**: Visual progress indicators and completion metrics
- **Code Validation**: Interactive code execution and solution checking
- **Visual Learning**: Architecture diagrams and training visualizations

## Architecture

### Dueling DQN Components Covered:

1. **Environment Setup** - Super Mario Bros environment with preprocessing
2. **Replay Memory** - Experience replay buffer implementation
3. **Neural Network** - Dueling architecture with separate value/advantage streams
4. **Training Algorithm** - DQN with target networks and epsilon-greedy exploration
5. **Reward Shaping** - Advanced reward transformation techniques
6. **Model Persistence** - Checkpoint saving and loading
7. **Hyperparameter Tuning** - Configuration management system
8. **Evaluation Metrics** - Comprehensive training analysis
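
As a rough illustration of the replay memory in item 2, the sketch below stores transitions in a fixed-size buffer and draws random minibatches from it. It uses only the standard library. The names `Transition` and `ReplayMemory` are assumptions made for this example and may not match the classes used in the tutorial itself.

```python
import random
from collections import deque, namedtuple

# Hypothetical container for one environment step.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayMemory:
    """Fixed-capacity buffer; the oldest transitions are discarded first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement reduces temporal correlation
        # between the transitions in a minibatch.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Integers stand in for real observations to keep the example small.
memory = ReplayMemory(capacity=3)
for step in range(5):
    memory.push(state=step, action=0, reward=1.0, next_state=step + 1, done=False)

batch = memory.sample(batch_size=2)
```

Because the buffer holds at most three transitions here, the two oldest ones (states 0 and 1) have already been dropped by the time the batch is sampled.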

### Network Architecture:
```python
DuelingDQN(
  (conv1): Conv2d(4, 32, kernel_size=8, stride=4)
  (conv2): Conv2d(32, 64, kernel_size=3, stride=1)
  (fc_adv): Linear(20736, 512)  # Advantage stream
  (fc_val): Linear(20736, 512)  # Value stream
  (advantage): Linear(512, n_actions)
  (value): Linear(512, 1)
)
```
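
The `20736` input size of the two fully connected streams follows from the convolution shapes. The short check below works through that arithmetic, assuming 84x84 input frames (as listed under Environment Details) and no padding or dilation. The helper `conv_out` is a name introduced only for this example.

```python
def conv_out(size, kernel, stride, padding=0):
    """Output width/height of a square convolution without dilation."""
    return (size + 2 * padding - kernel) // stride + 1


side = 84                                  # 84x84 input frames
side = conv_out(side, kernel=8, stride=4)  # after conv1
side = conv_out(side, kernel=3, stride=1)  # after conv2
flat_features = 64 * side * side           # conv2 has 64 output channels

print(side, flat_features)  # prints: 18 20736
```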

## Quick Start

### Installation

```bash
# Clone the repository
git clone https://github.com/TroglodyteDerivations/dueling-dqn-mario-tutorial.git
cd dueling-dqn-mario-tutorial

# Install dependencies
pip install -r requirements.txt

# Run the application
python duel_dqn_tutorial.py
```

### Requirements

```txt
torch>=1.9.0
gym-super-mario-bros>=7.3.0
nes-py>=8.1.0
PyQt5>=5.15.0
numpy>=1.21.0
opencv-python>=4.5.0
matplotlib>=3.5.0
```

## Tutorial Structure

### 8 Comprehensive Sections:

1. **Introduction** - Overview and setup
2. **Dueling DQN Theory** - Mathematical foundations
3. **Environment Setup** - Super Mario Bros configuration
4. **Replay Memory** - Experience buffer implementation
5. **Neural Network** - Dueling architecture build
6. **Training Algorithm** - DQN training loop
7. **Complete Implementation** - Full system integration
8. **Exercises** - Hands-on coding challenges

### 8 Interactive Exercises:

1. Replay Memory Implementation
2. Dueling DQN Model Architecture
3. Environment Wrapper
4. Training Loop with Epsilon-Greedy
5. Reward Shaping Functions
6. Model Saving/Loading System
7. Hyperparameter Configuration
8. Evaluation Metrics System

## Environment Details

- **Game**: Super Mario Bros (NES)
- **Action Space**: 12 discrete actions (`COMPLEX_MOVEMENT`)
- **Observation**: 4 stacked frames (84x84 grayscale)
- **Reward Structure**: Distance, coins, enemies, level completion
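
One way to picture the reward structure above is a small shaping function that adds bonuses on top of the environment's own reward. The sketch below is an assumption-heavy illustration, not the tutorial's actual implementation: the function name and all weights (`coin_bonus`, `score_scale`, `flag_bonus`) are hypothetical. The `coins`, `score`, and `flag_get` keys mirror fields of the `info` dictionary returned by `gym-super-mario-bros`, and plain dictionaries stand in for it here so the example runs without the emulator.

```python
def shaped_reward(env_reward, info, prev_info,
                  coin_bonus=5.0, score_scale=0.01, flag_bonus=50.0):
    """Add hypothetical bonuses to the reward returned by the environment."""
    reward = env_reward
    # Bonus for coins collected since the previous step.
    reward += coin_bonus * (info["coins"] - prev_info["coins"])
    # The in-game score rises when enemies are defeated, among other events.
    reward += score_scale * (info["score"] - prev_info["score"])
    # One-off bonus for reaching the flag at the end of a level.
    if info["flag_get"]:
        reward += flag_bonus
    return reward


# Stand-ins for two consecutive `info` dictionaries.
prev = {"coins": 0, "score": 0, "flag_get": False}
curr = {"coins": 1, "score": 200, "flag_get": False}

r = shaped_reward(1.0, curr, prev)  # 1.0 + 5.0 + 2.0
```

Keeping the weights as keyword arguments makes it easy to experiment with how strongly each signal influences learning.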

### Action Space (COMPLEX_MOVEMENT):
```python
['NOOP', 'RIGHT', 'RIGHT+A', 'RIGHT+B', 'RIGHT+A+B',
 'A', 'LEFT', 'LEFT+A', 'LEFT+B', 'LEFT+A+B',
 'DOWN', 'UP']
```
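
To show how an agent might pick one of these 12 actions, here is a minimal epsilon-greedy sketch. It assumes the network's output is available as a plain list of Q-values, one per action. `select_action` is a hypothetical helper, not a function from the tutorial code.

```python
import random

ACTIONS = ['NOOP', 'RIGHT', 'RIGHT+A', 'RIGHT+B', 'RIGHT+A+B',
           'A', 'LEFT', 'LEFT+A', 'LEFT+B', 'LEFT+A+B',
           'DOWN', 'UP']


def select_action(q_values, epsilon, rng=random):
    """Return a random action index with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


# Pretend the network currently prefers 'RIGHT+A'.
q_values = [0.0] * len(ACTIONS)
q_values[2] = 1.5

greedy = select_action(q_values, epsilon=0.0)
print(ACTIONS[greedy])  # prints: RIGHT+A
```

With `epsilon=1.0` the same call ignores the Q-values and returns a uniformly random index, which is the behaviour at the start of training under the default configuration.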

## Dueling DQN Theory

### Key Innovation:
```python
Q(s,a) = V(s) + A(s,a) - mean(A(s,·))
```
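
A small worked example can make the aggregation formula concrete. The function below applies it to plain Python numbers. The name `dueling_q_values` and the sample values are chosen only for illustration.

```python
def dueling_q_values(value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) by subtracting the mean advantage."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


q = dueling_q_values(value=2.0, advantages=[1.0, 0.0, -1.0])
print(q)  # prints: [3.0, 2.0, 1.0]

# Adding a constant to every advantage leaves the Q-values unchanged.
q_shifted = dueling_q_values(value=2.0, advantages=[11.0, 10.0, 9.0])
print(q_shifted)  # prints: [3.0, 2.0, 1.0]
```

Subtracting the mean is what makes the decomposition well defined: without it, any constant could be moved between the value and advantage streams without changing Q. With it, the mean of the Q-values equals V(s).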
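
**Benefits over Standard DQN**:
- Better generalization across actions, because the value stream V(s) is updated on every step regardless of which action was taken
- More stable learning in states where the choice of action has little effect on the outcome
- Often faster convergence, particularly with larger action spaces
- Separate estimates of state value and action advantage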

## Training Configuration

```python
# Default Hyperparameters
learning_rate = 0.0001
gamma = 0.99
batch_size = 32
buffer_size = 10000
epsilon_start = 1.0
epsilon_end = 0.01
epsilon_decay = 0.995
target_update = 1000
```
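
The defaults above can be grouped into a single configuration object, which also makes the exploration schedule easy to inspect. The sketch below is one possible arrangement: the `DQNConfig` class and its `epsilon_at` method are hypothetical, and it assumes epsilon is multiplied by `epsilon_decay` once per decay step. The source does not say whether that step is an episode or an environment step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DQNConfig:
    """Default hyperparameters from the table above."""
    learning_rate: float = 1e-4
    gamma: float = 0.99
    batch_size: int = 32
    buffer_size: int = 10_000
    epsilon_start: float = 1.0
    epsilon_end: float = 0.01
    epsilon_decay: float = 0.995
    target_update: int = 1000

    def epsilon_at(self, n_decays):
        """Epsilon after n multiplicative decay steps, floored at epsilon_end."""
        decayed = self.epsilon_start * self.epsilon_decay ** n_decays
        return max(self.epsilon_end, decayed)


config = DQNConfig()

# Count how many decay steps it takes to reach the exploration floor.
n = 0
while config.epsilon_at(n) > config.epsilon_end:
    n += 1
print(n)
```

Under these assumptions, epsilon reaches its floor of 0.01 after a little over 900 decay steps. If decay is applied per episode, most of the 10,000-episode run is therefore spent acting almost greedily.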

## Performance

### Expected Learning Progress:
- **Episodes 0-1000**: Basic movement learning
- **Episodes 1000-5000**: Enemy avoidance and coin collection
- **Episodes 5000+**: Level navigation and completion

### Sample Training Output:
```
cuda | Episode: 100 | Score: 256.8 | Loss: 1.23 | Stage: 1-1
cuda | Episode: 500 | Score: 512.1 | Loss: 0.87 | Stage: 1-2
cuda | Episode: 1000 | Score: 890.4 | Loss: 0.45 | Stage: 2-1
```
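
Log lines in this format are easy to turn into numbers for plotting or analysis. The sketch below parses the three sample lines with a regular expression. The pattern and the `parse_log` helper are assumptions based only on the sample output above, so they may need adjusting if the real log format differs.

```python
import re

# Matches the episode, score, and loss fields of one log line.
LOG_PATTERN = re.compile(
    r"Episode:\s*(?P<episode>\d+)\s*\|"
    r"\s*Score:\s*(?P<score>[\d.]+)\s*\|"
    r"\s*Loss:\s*(?P<loss>[\d.]+)"
)


def parse_log(lines):
    """Extract episode, score, and loss from each matching log line."""
    records = []
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match:
            records.append({
                "episode": int(match["episode"]),
                "score": float(match["score"]),
                "loss": float(match["loss"]),
            })
    return records


sample = [
    "cuda | Episode: 100 | Score: 256.8 | Loss: 1.23 | Stage: 1-1",
    "cuda | Episode: 500 | Score: 512.1 | Loss: 0.87 | Stage: 1-2",
    "cuda | Episode: 1000 | Score: 890.4 | Loss: 0.45 | Stage: 2-1",
]

records = parse_log(sample)
best = max(records, key=lambda r: r["score"])
print(best["episode"])  # prints: 1000
```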

## Usage Examples

### Running the Tutorial:
```python
import sys

from PyQt5.QtWidgets import QApplication

from duel_dqn_tutorial import DuelingDQNTutorialApp

# Create the Qt application, show the tutorial window, and start the event loop
app = QApplication(sys.argv)
window = DuelingDQNTutorialApp()
window.show()
sys.exit(app.exec_())
```

### Training a Model:
```python
from mario_dqn import MarioDQNAgent

# Train for 10,000 episodes, then save the final checkpoint
agent = MarioDQNAgent()
scores = agent.train(episodes=10000)
agent.save_model('mario_dqn_final.pth')
```

## Educational Value

This tutorial helps you understand:

- **Reinforcement Learning Fundamentals**: MDP, Q-learning, policy optimization
- **Deep Q-Networks**: Value approximation with neural networks
- **Dueling Architecture**: Value/advantage decomposition theory
- **Experience Replay**: Importance of uncorrelated training samples
- **Target Networks**: Stabilizing training with delayed updates
- **Reward Engineering**: Shaping rewards for better learning
- **Hyperparameter Tuning**: Systematic configuration optimization
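
The role of the target network can be sketched with plain numbers: the learning target for a transition is built from a delayed copy of the network, and that copy is refreshed only every `target_update` steps. The example below is a simplified illustration. The `td_target` helper, the sample Q-values, and the integer counter standing in for network weights are all assumptions made for the example.

```python
def td_target(reward, next_q_target, done, gamma=0.99):
    """One-step Q-learning target computed from the target network's outputs."""
    if done:
        return reward  # no bootstrapping past a terminal state
    return reward + gamma * max(next_q_target)


y = td_target(reward=1.0, next_q_target=[0.5, 2.0, 1.0], done=False)
y_terminal = td_target(reward=1.0, next_q_target=[0.5, 2.0, 1.0], done=True)

# Delayed updates: copy the online "weights" every target_update steps.
target_update = 1000
online = {"updates": 0}
target = dict(online)
for step in range(1, 2501):
    online["updates"] += 1  # stand-in for one gradient step
    if step % target_update == 0:
        target = dict(online)

print(online["updates"], target["updates"])  # prints: 2500 2000
```

Between synchronizations the target values stay fixed while the online network keeps changing, which is what keeps the regression target from shifting on every update.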

## Project Structure

```
dueling-dqn-mario-tutorial/
├── duel_dqn_tutorial.py    # Main PyQt5 application
├── mario_dqn.py            # DQN implementation
├── wrappers.py             # Environment wrappers
├── models/                 # Saved model checkpoints
├── exercises/              # Exercise solutions
├── requirements.txt        # Dependencies
└── README.md               # This file
```

## Contributing

We welcome contributions! Areas for improvement:

- Additional exercise variations
- More visualization tools
- Performance optimizations
- Additional game environments
- Multi-agent implementations

## Citation

If you use this tutorial in your research or teaching, please cite:

```bibtex
@software{dueling_dqn_mario_tutorial,
  title = {PyQt5 Dueling DQN Mario Tutorial},
  author = {Martin Rivera},
  year = {2025},
  url = {https://huggingface.co/TroglodyteDerivations/Interactive_Dueling_DQN_Mario_Tutorial}
}
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## Acknowledgments

- Nintendo for Super Mario Bros
- OpenAI Gym for the reinforcement learning framework
- PyTorch team for the deep learning framework
- PyQt5 team for the GUI framework
- Flux.1-krea.dev for architecture visualizations

---

**Happy Learning!**

*Master reinforcement learning by building an AI that can play Super Mario Bros!*
```

## Additional Files for Your Repository:

### requirements.txt
```txt
torch>=1.9.0
gym-super-mario-bros>=7.3.0
nes-py>=8.1.0
PyQt5>=5.15.0
numpy>=1.21.0
opencv-python>=4.5.0
matplotlib>=3.5.0
Pillow>=8.3.0
pygame>=2.0.0
```

### README.md (Simplified version)
```markdown
# PyQt5 Dueling DQN Mario Tutorial

An interactive desktop application that teaches Dueling Deep Q-Networks through Super Mario Bros implementation.

## Quick Start
```bash
pip install -r requirements.txt
python duel_dqn_tutorial.py
```

## Features
- Interactive PyQt5 GUI
- 8 comprehensive tutorial sections
- Hands-on coding exercises
- Progress tracking
- Visual learning aids

## License
MIT
```