---
license: mit
library_name: pytorch
tags:
- reinforcement-learning
- dueling-dqn
- super-mario-bros
- pytorch
- pyqt5
- tutorial
- educational
- interactive-learning
---

# PyQt5 Dueling DQN Mario Tutorial - Interactive Learning Application

## Model Overview

An interactive PyQt5 desktop application that provides a comprehensive tutorial for implementing Dueling Deep Q-Networks to play Super Mario Bros. This educational tool combines theoretical explanations with hands-on coding exercises to teach reinforcement learning concepts.

![Screenshot 2025-11-07 at 1.03.27 PM](https://cdn-uploads.huggingface.co/production/uploads/68401f649e3f451260c68974/NhMi-ZlYJr3gua4opTStP.png)

## 🎯 What is this?

This is not a traditional ML model, but an **interactive educational application** built with PyQt5 that teaches you how to implement Dueling DQN from scratch. It's designed for learners who want to understand reinforcement learning through practical implementation.

## ✨ Features

- **Interactive Tutorial Interface**: PyQt5 GUI with navigation and progress tracking
- **Comprehensive Theory**: Detailed explanations of Dueling DQN architecture and mathematics
- **Hands-on Exercises**: 8 coding exercises covering all implementation aspects
- **Progress Tracking**: Visual progress indicators and completion metrics
- **Code Validation**: Interactive code execution and solution checking
- **Visual Learning**: Architecture diagrams and training visualizations

## 🏗️ Architecture

### Dueling DQN Components Covered:

1. **Environment Setup** - Super Mario Bros environment with preprocessing
2. **Replay Memory** - Experience replay buffer implementation
3. **Neural Network** - Dueling architecture with separate value/advantage streams
4. **Training Algorithm** - DQN with target networks and epsilon-greedy exploration
5. **Reward Shaping** - Advanced reward transformation techniques
6. **Model Persistence** - Checkpoint saving and loading
7. **Hyperparameter Tuning** - Configuration management system
8. **Evaluation Metrics** - Comprehensive training analysis

### Network Architecture:

```python
DuelingDQN(
  (conv1): Conv2d(4, 32, kernel_size=8, stride=4)
  (conv2): Conv2d(32, 64, kernel_size=3, stride=1)
  (fc_adv): Linear(20736, 512)         # Advantage stream
  (fc_val): Linear(20736, 512)         # Value stream
  (advantage): Linear(512, n_actions)
  (value): Linear(512, 1)
)
```

## 🚀 Quick Start

### Installation

```bash
# Clone the repository
git clone https://github.com/TroglodyteDerivations/dueling-dqn-mario-tutorial.git
cd dueling-dqn-mario-tutorial

# Install dependencies
pip install -r requirements.txt

# Run the application
python duel_dqn_tutorial.py
```

### Requirements

```txt
torch>=1.9.0
gym-super-mario-bros>=7.3.0
nes-py>=8.1.0
PyQt5>=5.15.0
numpy>=1.21.0
opencv-python>=4.5.0
matplotlib>=3.5.0
Pillow>=8.3.0
pygame>=2.0.0
```

## 📚 Tutorial Structure

### 8 Comprehensive Sections:

1. **Introduction** - Overview and setup
2. **Dueling DQN Theory** - Mathematical foundations
3. **Environment Setup** - Super Mario Bros configuration
4. **Replay Memory** - Experience buffer implementation
5. **Neural Network** - Dueling architecture build
6. **Training Algorithm** - DQN training loop
7. **Complete Implementation** - Full system integration
8. **Exercises** - Hands-on coding challenges

### 8 Interactive Exercises:

1. Replay Memory Implementation
2. Dueling DQN Model Architecture
3. Environment Wrapper
4. Training Loop with Epsilon-Greedy
5. Reward Shaping Functions
6. Model Saving/Loading System
7. Hyperparameter Configuration
8. Evaluation Metrics System

## 🎮 Environment Details

**Game**: Super Mario Bros (NES)
**Action Space**: 12 complex movements
**Observation**: 4 stacked frames (84x84 grayscale)
**Reward Structure**: Distance, coins, enemies, level completion

### Action Space (COMPLEX_MOVEMENT):

```python
['NOOP', 'RIGHT', 'RIGHT+A', 'RIGHT+B', 'RIGHT+A+B', 'A',
 'LEFT', 'LEFT+A', 'LEFT+B', 'LEFT+A+B', 'DOWN', 'UP']
```

## 🧠 Dueling DQN Theory

### Key Innovation:

```python
Q(s,a) = V(s) + A(s,a) - mean(A(s,·))
```

**Benefits over Standard DQN**:
- Better action generalization
- More stable learning
- Faster convergence
- Separate state value and action advantage learning

## ⚙️ Training Configuration

```python
# Default Hyperparameters
learning_rate = 0.0001
gamma = 0.99
batch_size = 32
buffer_size = 10000
epsilon_start = 1.0
epsilon_end = 0.01
epsilon_decay = 0.995
target_update = 1000
```

## 📊 Performance

### Expected Learning Progress:

- **Episodes 0-1000**: Basic movement learning
- **Episodes 1000-5000**: Enemy avoidance and coin collection
- **Episodes 5000+**: Level navigation and completion

### Sample Training Output:

```
cuda | Episode: 100  | Score: 256.8 | Loss: 1.23 | Stage: 1-1
cuda | Episode: 500  | Score: 512.1 | Loss: 0.87 | Stage: 1-2
cuda | Episode: 1000 | Score: 890.4 | Loss: 0.45 | Stage: 2-1
```

## 🛠️ Usage Examples

### Running the Tutorial:

```python
import sys

from PyQt5.QtWidgets import QApplication

from duel_dqn_tutorial import DuelingDQNTutorialApp

app = QApplication(sys.argv)
window = DuelingDQNTutorialApp()
window.show()
sys.exit(app.exec_())
```

### Training a Model:

```python
from mario_dqn import MarioDQNAgent

agent = MarioDQNAgent()
scores = agent.train(episodes=10000)
agent.save_model('mario_dqn_final.pth')
```

## 🎯 Educational Value

This tutorial helps you understand:

- **Reinforcement Learning Fundamentals**: MDPs, Q-learning, policy optimization
- **Deep Q-Networks**: Value approximation with neural networks
- **Dueling Architecture**: Value/advantage decomposition theory
- **Experience Replay**: Importance of uncorrelated training samples
- **Target Networks**: Stabilizing training with delayed updates
- **Reward Engineering**: Shaping rewards for better learning
- **Hyperparameter Tuning**: Systematic configuration optimization

## 📁 Project Structure

```
dueling-dqn-mario-tutorial/
├── duel_dqn_tutorial.py   # Main PyQt5 application
├── mario_dqn.py           # DQN implementation
├── wrappers.py            # Environment wrappers
├── models/                # Saved model checkpoints
├── exercises/             # Exercise solutions
├── requirements.txt       # Dependencies
└── README.md              # This file
```

## 🤝 Contributing

We welcome contributions! Areas for improvement:

- Additional exercise variations
- More visualization tools
- Performance optimizations
- Additional game environments
- Multi-agent implementations

## 📜 Citation

If you use this tutorial in your research or teaching, please cite:

```bibtex
@software{dueling_dqn_mario_tutorial,
  title  = {PyQt5 Dueling DQN Mario Tutorial},
  author = {Martin Rivera},
  year   = {2025},
  url    = {https://huggingface.co/TroglodyteDerivations/Interactive_Dueling_DQN_Mario_Tutorial}
}
```

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
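For quick reference, the dueling aggregation from the theory section, `Q(s,a) = V(s) + A(s,a) - mean(A(s,·))`, can be illustrated framework-free. This is a minimal sketch; the function name `dueling_q` is illustrative and not part of the tutorial's code:

```python
def dueling_q(value, advantages):
    """Combine a scalar state value V(s) with per-action advantages A(s, a).

    Subtracting the mean advantage makes the decomposition identifiable:
    the advantages are forced to average to zero across actions, so V(s)
    alone carries the state's overall worth.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + a - mean_adv for a in advantages]


# The greedy action is unchanged by the mean subtraction:
q = dueling_q(2.0, [1.0, 2.0, 3.0])
# q == [1.0, 2.0, 3.0]; the argmax is still the last action
```

In the network itself the same arithmetic runs on batched tensors, with `V` and `A` produced by the separate `fc_val`/`fc_adv` streams.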
## ๐Ÿ™ Acknowledgments - Nintendo for Super Mario Bros - OpenAI Gym for the reinforcement learning framework - PyTorch team for the deep learning framework - PyQt5 team for the GUI framework - Flux.1-krea.dev for architecture visualizations --- **Happy Learning!** ๐ŸŽฎโœจ *Master reinforcement learning by building an AI that can play Super Mario Bros!* ``` ## Additional Files for Your Repository: ### requirements.txt ```txt torch>=1.9.0 gym-super-mario-bros>=7.3.0 nes-py>=8.1.0 PyQt5>=5.15.0 numpy>=1.21.0 opencv-python>=4.5.0 matplotlib>=3.5.0 Pillow>=8.3.0 pygame>=2.0.0 ``` ### README.md (Simplified version) ```markdown # PyQt5 Dueling DQN Mario Tutorial An interactive desktop application that teaches Dueling Deep Q-Networks through Super Mario Bros implementation. ## Quick Start ```bash pip install -r requirements.txt python duel_dqn_tutorial.py ``` ## Features - Interactive PyQt5 GUI - 8 comprehensive tutorial sections - Hands-on coding exercises - Progress tracking - Visual learning aids ## License MIT ``` This model card provides comprehensive documentation for your educational application and follows Hugging Face's best practices for model documentation. It clearly communicates that this is an educational tool rather than a traditional pre-trained model, while still providing all the necessary information for users to understand and use your application effectively.