Commit af63b78 (parent: 8c7f4b9) — Add training information tab to frontend

app.py CHANGED
@@ -245,47 +245,133 @@ with gr.Blocks(
 
     Watch trained DQN agents play Atari Space Invaders in real-time.
     Three variants trained from raw pixels using PyTorch.
-
-    **Pick an agent and hit Play!**
     """
 )
 
-    with gr.
-    with gr.
-    …
 )
-    play_btn = gr.Button("▶ Play Game", variant="primary", size="lg")
-    result_text = gr.Markdown("")
 
     gr.Markdown(
         """
         ---
-        **About the agents:**
-        - **Baseline DQN** — Standard architecture
-        - **Double DQN** ⭐ — Reduces Q-value overestimation
-        - **Dueling DQN** — Separates state value from action advantage
 
-
-        """
-    )
 
-
-    video_output = gr.Video(label="Gameplay", autoplay=True)
 
-
-    …
 
 if __name__ == "__main__":
     demo.launch()
 
     Watch trained DQN agents play Atari Space Invaders in real-time.
     Three variants trained from raw pixels using PyTorch.
     """
 )
 
+    with gr.Tabs():
+        with gr.TabItem("🎮 Play Game"):
+            with gr.Row():
+                with gr.Column(scale=1):
+                    variant_dropdown = gr.Dropdown(
+                        choices=list(CHECKPOINTS.keys()),
+                        value="Double DQN (avg: 650.20) ⭐ Best",
+                        label="Agent Variant",
+                    )
+                    seed_input = gr.Number(
+                        value=42,
+                        label="Random Seed",
+                        info="Change for a different game",
+                        precision=0,
+                    )
+                    play_btn = gr.Button("▶ Play Game", variant="primary", size="lg")
+                    result_text = gr.Markdown("")
+
+                    gr.Markdown(
+                        """
+                        ---
+                        **About the agents:**
+                        - **Baseline DQN** — Standard architecture
+                        - **Double DQN** ⭐ — Reduces Q-value overestimation
+                        - **Dueling DQN** — Separates state value from action advantage
+                        """
+                    )
+
+                with gr.Column(scale=2):
+                    video_output = gr.Video(label="Gameplay", autoplay=True)
+
+            play_btn.click(
+                fn=run_demo,
+                inputs=[variant_dropdown, seed_input],
+                outputs=[video_output, result_text],
 )
 
+        with gr.TabItem("📊 Training Info"):
             gr.Markdown(
                 """
+                ## Training Results
+
+                All three variants exceeded an average score of 490 over 100 consecutive episodes.
+
+                | Variant | Avg Score | Best Score | Episodes | Training Time |
+                |---------|-----------|------------|----------|---------------|
+                | Baseline DQN | 524.75 | 586.45 | 1,470 | 7,000 |
+                | Double DQN | 650.20 | 650.20 | 1,355 | 6,090 |
+                | Dueling DQN | 497.55 | 647.05 | 1,465 | 7,000 |
+
+                **Key Finding:** Double DQN achieved the highest sustained performance, with zero degradation between its best and final averages. Dueling DQN reached the highest peak (647) but exhibited catastrophic forgetting in extended training, demonstrating why model checkpointing is critical in RL.
+
                 ---
+
+                ## Network Architecture
+
+                All variants share a convolutional backbone from Mnih et al. (2015):
+
+                | Layer | Filters | Kernel | Stride | Output |
+                |-------|---------|--------|--------|--------|
+                | Conv1 | 32 | 8×8 | 4 | 20×20×32 |
+                | Conv2 | 64 | 4×4 | 2 | 9×9×64 |
+                | Conv3 | 64 | 3×3 | 1 | 7×7×64 |
+
+                **Variant-specific heads:**
+                - **Baseline DQN:** Standard single-stream fully connected head → Q-values
+                - **Double DQN:** Same architecture, but decouples action selection (local network) from evaluation (target network) to reduce overestimation bias
+                - **Dueling DQN:** Splits into Value and Advantage streams — Q(s,a) = V(s) + A(s,a) − mean(A) — allowing the network to learn state quality independently of action choice
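The dueling decomposition above can be sketched in PyTorch. The 512-unit hidden size and the 6-action output are assumptions (standard choices for Atari DQNs), not values taken from this repo:

```python
import torch
import torch.nn as nn

class DuelingHead(nn.Module):
    """Dueling head: Q(s,a) = V(s) + A(s,a) - mean(A).

    Expects flattened conv features (7*7*64 = 3136 for an 84x84 input).
    Hidden size 512 and n_actions=6 are illustrative assumptions.
    """

    def __init__(self, in_features: int = 7 * 7 * 64, n_actions: int = 6, hidden: int = 512):
        super().__init__()
        # Two separate streams: scalar state value and per-action advantage.
        self.value = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.advantage = nn.Sequential(nn.Linear(in_features, hidden), nn.ReLU(), nn.Linear(hidden, n_actions))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        v = self.value(x)        # (batch, 1)
        a = self.advantage(x)    # (batch, n_actions)
        # Subtracting the mean advantage makes the V/A split identifiable.
        return v + a - a.mean(dim=1, keepdim=True)
```

A useful sanity check: because the mean advantage is subtracted, the per-state mean of the Q-values equals V(s) exactly.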
+
+                ---
+
+                ## Preprocessing Pipeline
+
+                Raw Atari frames (210×160×3) are converted into CNN-ready tensors (4×84×84), reducing input dimensionality by 72% while preserving the gameplay-relevant information:
+                1. Convert RGB → Grayscale
+                2. Crop to game region (20:200)
+                3. Resize to 84×84
+                4. Stack 4 consecutive frames
+                5. Normalize to [0, 1]
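The steps above can be sketched with NumPy. The nearest-neighbour resize and the Rec. 601 grayscale weights are stand-ins, assumptions rather than the training code's actual choices:

```python
import numpy as np
from collections import deque

def preprocess_frame(frame: np.ndarray) -> np.ndarray:
    """Raw Atari frame (210, 160, 3) uint8 -> (84, 84) float32 in [0, 1]."""
    # 1. RGB -> grayscale (Rec. 601 luma weights, an illustrative choice)
    gray = frame.astype(np.float32) @ np.array([0.299, 0.587, 0.114], dtype=np.float32)
    # 2. Crop to the game region
    cropped = gray[20:200, :]
    # 3. Nearest-neighbour resize to 84x84 (stand-in for real interpolation)
    rows = np.arange(84) * cropped.shape[0] // 84
    cols = np.arange(84) * cropped.shape[1] // 84
    resized = cropped[rows][:, cols]
    # 5. Normalize to [0, 1]
    return resized / 255.0

class FrameStack:
    """Step 4: maintain the 4-frame stack fed to the CNN, shape (4, 84, 84)."""

    def __init__(self, k: int = 4):
        self.frames = deque(maxlen=k)

    def push(self, frame: np.ndarray) -> np.ndarray:
        processed = preprocess_frame(frame)
        if not self.frames:
            # On reset, pad the stack with copies of the first frame.
            for _ in range(self.frames.maxlen):
                self.frames.append(processed)
        else:
            self.frames.append(processed)
        return np.stack(self.frames)
```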
+
+                ---
+
+                ## Training Configuration
+
+                | Config | Baseline | Double | Dueling |
+                |--------|----------|--------|---------|
+                | Learning Rate | 1e-4 | 1.5e-4 | 1e-4 |
+                | Epsilon Decay | 0.9993 | 0.9992 | 0.9995 |
+                | Batch Size | 32 | 32 | 32 |
+                | Gamma (Discount) | 0.99 | 0.99 | 0.99 |
+
+                **Design rationale:**
+                - Double DQN uses a higher learning rate because its more conservative Q-estimates can tolerate faster learning without divergence
+                - Dueling DQN uses slower epsilon decay to benefit from extended exploration during value/advantage stream learning
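To see what the decay rates in the table imply, a small helper (illustrative, not from the repo) counts how many episodes ε takes to fall from 1.0 to its floor under per-episode multiplicative decay:

```python
def episodes_to_epsilon(decay: float, eps_min: float = 0.01) -> int:
    """Episodes needed for epsilon to fall from 1.0 to eps_min
    when epsilon is multiplied by `decay` once per episode."""
    eps, episodes = 1.0, 0
    while eps > eps_min:
        eps *= decay
        episodes += 1
    return episodes
```

Under this assumption, 0.9992 reaches the 0.01 floor in roughly 5,800 episodes and 0.9995 in roughly 9,200, which matches the intent of giving Dueling DQN a longer exploration phase.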
+
+                ---
+
+                ## Key Components
+
+                - **Experience Replay Buffer** (100K transitions): Breaks temporal correlation and enables sample reuse
+                - **Target Network** with soft updates (τ=0.001): Stabilizes Q-value targets
+                - **ε-Greedy Exploration:** Decays from 1.0 → 0.01 at variant-specific rates
+                - **Gradient Clipping** (max norm 10): Prevents exploding gradients from large TD errors
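A sketch of how the target network and the Double DQN selection/evaluation split fit together. Function names and tensor shapes are illustrative assumptions, not the repo's API:

```python
import torch

def double_dqn_targets(q_local, q_target, next_states, rewards, dones, gamma=0.99):
    """Double DQN target: the local net *selects* the action,
    the target net *evaluates* it, reducing overestimation bias."""
    with torch.no_grad():
        best_actions = q_local(next_states).argmax(dim=1, keepdim=True)    # selection
        next_q = q_target(next_states).gather(1, best_actions).squeeze(1)  # evaluation
    return rewards + gamma * next_q * (1.0 - dones)

def soft_update(local_net, target_net, tau=0.001):
    """Polyak averaging: theta_target <- tau*theta_local + (1-tau)*theta_target."""
    for t_param, l_param in zip(target_net.parameters(), local_net.parameters()):
        t_param.data.copy_(tau * l_param.data + (1.0 - tau) * t_param.data)
```

With τ=0.001 the target network trails the local network slowly, which is what keeps the bootstrapped Q-targets stable.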
+
+                ---
+
+                ## Key Findings
+
+                1. **Low learning rate is essential** — a standard 1e-3 causes divergence on Atari; 1e-4 gives stable convergence
+                2. **Double DQN is the most stable** — zero gap between best and final averages; the only variant to reach its extended target
+                3. **Longer training ≠ better** — all variants showed diminishing returns or degradation past ~6,000 episodes
+                4. **Checkpointing matters** — Dueling DQN's peak (647) was 150 points above its final average; the best policy is not always the last one
+
+                ---
+
+                ## References
+
+                - Mnih, V. et al. (2015). [Human-level control through deep reinforcement learning](https://www.nature.com/articles/nature14236). Nature, 518(7540).
+                - Van Hasselt, H. et al. (2016). [Deep Reinforcement Learning with Double Q-learning](https://arxiv.org/abs/1509.06461). AAAI.
+                - Wang, Z. et al. (2016). [Dueling Network Architectures for Deep Reinforcement Learning](https://arxiv.org/abs/1511.06581). ICML.
+
+                [GitHub Repository](https://github.com/antonisbast/DQN-SpaceInvaders)
+                """
+            )
 
 if __name__ == "__main__":
     demo.launch()