---
language: en
license: apache-2.0
library_name: jax
tags:
- deep-reinforcement-learning
- neuroevolution
- tpu
- micro-models
datasets:
- synthetic-life-sim
metrics:
- survival-time
- mass-optimization
---

## Micro-DeepRL (Gen 126k)

**Micro-DeepRL** is a high-efficiency, neuroevolutionary "Life-Sim" agent optimized for **extreme resource-constrained environments** (e.g., Google Cloud `e2-micro`, Raspberry Pi, or legacy 1 vCPU hardware). The target throughput is ~2 steps/sec on 1 vCPU with 1 GB of RAM.

## πŸ“Š Model Profile

| Feature | Specification |
| :--- | :--- |
| **Model Architecture** | 3-layer convolutional neural network |
| **Parameters** | ~2,300 (~8.6 KiB) |
| **Grid Resolution** | 32x32 cellular automaton |
| **Input Channels** | 6 (Life, Food, Lava, 3x Signaling) |
| **Training Time** | 126,000+ generations |
| **Compute** | 16x Google Cloud TPU v5e (TRC Program) |

## 🧬 Evolution & Training

The model was trained using **JAX** and **Keras 3** on a TPU pod. The agent's goal is to maximize "Life" pixels while navigating a grid containing food (rewards) and lava (punishments).

- **Generations 0-10k:** Discovery of basic movement.
- **Generations 10k-50k:** Mastery of "Lava Avoidance."
- **Generations 50k-126k:** Optimization of "Food Camping" and mass stabilization.
- **Final state:** Reached a stable equilibrium loss of approximately **-3040.00**.

## πŸš€ Deployment (Inference)

This model is distributed as a raw **NumPy (`.npy`)** array to eliminate heavy framework dependencies (TensorFlow/PyTorch). It is designed to be "plug-and-play" with a standard NumPy-based inference script.

### Hardware Target: "Potato Tier"

- **CPU:** 1 vCPU (shared core)
- **RAM:** < 1 GB
- **Latency:** Near-zero inference on standard x86/ARM hardware.

## πŸ—ΊοΈ Roadmap: The 11-Day Sprint

This model is part of a larger scaling experiment conducted over 11 days of TPU quota:

1. **Micro-DeepRL** (Gen 126k) - **Current** (32x32 Grid, 24 Filters)
2. **DeepRL-Standard** - **Active** (64x64 Grid, 64 Filters, Metabolism Decay)
3. **DeepRL-Large** - **Planned** (128x128 Grid, ResNet-style Blocks, T4 GPU Target)
4. **DeepRL-Max** - **Planned** (Transformer/Attention-based Strategist, H200 Target)

## πŸ› οΈ How to Load

```python
import numpy as np

# Load the ~8.6 KiB DNA payload.
# Note: allow_pickle=True is required because the weights are stored
# as a ragged object array of per-layer tensors.
dna = np.load("microDeepRL.npy", allow_pickle=True)

# Parameter shapes:
# [0] Conv2D_1 kernel (3, 3, 6, 24)
# [1] Conv2D_1 bias   (24,)
# [2] Conv2D_2 kernel (1, 1, 24, 24)
# [3] Conv2D_2 bias   (24,)
# [4] Output kernel   (1, 1, 24, 6)
# [5] Output bias     (6,)
```
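The card ships only the weights, so here is a minimal NumPy-only forward-pass sketch consistent with the shapes listed above. The padding mode (zero vs. toroidal wrap), the ReLU hidden activations, and the interpretation of the output as raw per-cell logits are assumptions not stated in the card; the `forward`, `conv3x3`, and `conv1x1` names are illustrative, and the random stand-in weights exist only so the snippet runs without the real `.npy` file.

```python
import numpy as np

def conv3x3(x, kernel, bias):
    # 3x3 "same" convolution on an HxWxC grid.
    # Zero padding is an assumption; a toroidal CA world would use
    # np.pad(..., mode="wrap") instead.
    xp = np.pad(x, ((1, 1), (1, 1), (0, 0)))
    # Gather 3x3 neighborhoods: shape (H, W, C_in, 3, 3).
    windows = np.lib.stride_tricks.sliding_window_view(xp, (3, 3), axis=(0, 1))
    # Contract neighborhoods against the kernel (3, 3, C_in, C_out).
    return np.einsum("hwcij,ijco->hwo", windows, kernel) + bias

def conv1x1(x, kernel, bias):
    # A 1x1 convolution is a per-pixel dense layer.
    return np.einsum("hwc,co->hwo", x, kernel[0, 0]) + bias

def forward(dna, grid):
    # ReLU hidden activations are an assumption.
    h = np.maximum(conv3x3(grid, dna[0], dna[1]), 0.0)
    h = np.maximum(conv1x1(h, dna[2], dna[3]), 0.0)
    # Raw per-cell outputs for the 6 channels (Life, Food, Lava, 3x Signaling).
    return conv1x1(h, dna[4], dna[5])

# Usage with random stand-in weights (shapes taken from the card):
rng = np.random.default_rng(0)
dna = [rng.standard_normal(s).astype(np.float32) for s in
       [(3, 3, 6, 24), (24,), (1, 1, 24, 24), (24,), (1, 1, 24, 6), (6,)]]
grid = rng.standard_normal((32, 32, 6)).astype(np.float32)
out = forward(dna, grid)
print(out.shape)  # (32, 32, 6)
```

On a 32x32x6 grid this is a few hundred thousand multiply-adds per step, which is comfortably within the ~2 steps/sec budget of the 1 vCPU "Potato Tier" target.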